Python script

Star79
Star79 asked
Last Modified: 2020-08-25
Hello, I am trying to write a Python script to mask the data in a file.
The file content is as below:
*BEG IAZZ/RUSAZZA/R
A0APLUS4730003504437  EA00015W56BPZ0118S020RAB24D0AVE ABIPV201072J    118       SJUUY210001066203
*END
My masking definition is
maskingDef = {'a':'j',
           'b':'k',
           'c':'l',
           'd':'m',
           'e':'n',
           'f':'o',
           'g':'p',
           'h':'q',
           'i':'r',
           'j':'s',
           'k':'t',
           'l':'u',
           'm':'v',
           'n':'w',
           'o':'x',
           'p':'y',
           'q':'z',
           'r':'A',
           's':'B',
           't':'C',
           'u':'D',
           'v':'E',
           'w':'F',
           'x':'G',
           'y':'H',
           'z':'I',
           'A':'J',
           'B':'K',
           'C':'L',
           'D':'M',
           'E':'N',
           'F':'O',
           'G':'P',
           'H':'Q',
           'I':'R',
           'J':'S',
           'K':'T',
           'L':'U',
           'M':'V',
           'N':'W',
           'O':'X',
           'P':'Y',
           'Q':'Z',
           'R':'a',
           'S':'b',
           'T':'c',
           'U':'d',
           'V':'e',
           'W':'f',
           'X':'g',
           'Y':'h',
           'Z':'i',
           '1':'4',
           '2':'5',
           '3':'6',
           '4':'7',
           '5':'8',
           '6':'9',
           '7':'0',
           '8':'1',
           '9':'2',
           '0':'3'}



How do I find the segment that starts with A0A and mask the characters in columns 8 to 22? In the example above, that is everything from the 4 up to the spaces after the last 7.
How can I achieve this in Python 3.8.5? I only need the piece of code that finds the segment and identifies columns 8 to 22.
Norie, Analyst Assistant
CERTIFIED EXPERT

Commented:
This should work with the data you posted, saved in a file named 'data.txt'.
with open('data.txt', 'r') as file:
    lines = file.readlines()

for idx, line in enumerate(lines):
    if line.startswith('A0A'):
        print(idx)
        # Index 7 is column 8; the field ends at the first space after it.
        pos = line.find(' ', 7)

        # Characters missing from maskingDef are passed through unchanged.
        masked = [maskingDef.get(ch, ch) for ch in line[7:pos]]

        lines[idx] = line[:7] + ''.join(masked) + line[pos:]

print(lines)

with open('masked.txt', 'w') as file:
    file.writelines(lines)


Author

Commented:
I was trying to plug your code above into mine:
import re
import os

#'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
#'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123'
  
maskingDef = {'a':'j',
           'b':'k',
           'c':'l',
           'd':'m',
           'e':'n',
           'f':'o',
           'g':'p',
           'h':'q',
           'i':'r',
           'j':'s',
           'k':'t',
           'l':'u',
           'm':'v',
           'n':'w',
           'o':'x',
           'p':'y',
           'q':'z',
           'r':'A',
           's':'B',
           't':'C',
           'u':'D',
           'v':'E',
           'w':'F',
           'x':'G',
           'y':'H',
           'z':'I',
           'A':'J',
           'B':'K',
           'C':'L',
           'D':'M',
           'E':'N',
           'F':'O',
           'G':'P',
           'H':'Q',
           'I':'R',
           'J':'S',
           'K':'T',
           'L':'U',
           'M':'V',
           'N':'W',
           'O':'X',
           'P':'Y',
           'Q':'Z',
           'R':'a',
           'S':'b',
           'T':'c',
           'U':'d',
           'V':'e',
           'W':'f',
           'X':'g',
           'Y':'h',
           'Z':'i',
           '1':'4',
           '2':'5',
           '3':'6',
           '4':'7',
           '5':'8',
           '6':'9',
           '7':'0',
           '8':'1',
           '9':'2',
           '0':'3'}

toMask = ""

def findnth(string, substring, n):
    parts = string.split(substring, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(string) - len(parts[-1]) - len(substring)

def maskString(toMask):
    maskedString = ""
    for char in toMask:
        if char in maskingDef:
            maskedString += maskingDef[char]
        else :
            maskedString += char
    return maskedString

def findString(replaceLine,separator,startOccurence,endOccurence):
    startIndex = findnth(replaceLine, separator, startOccurence) 
    endIndex = findnth(replaceLine, separator, endOccurence)   
    
    if startIndex >= 0 :
        if endIndex >=0 :
            return replaceLine[startIndex+1:endIndex]
        else :
            return replaceLine[startIndex+1:len(replaceLine)]
    else : 
        return ""

def findAndMask(replaceLine,separator,startOccurence,endOccurence):
    toMask = findString(replaceLine,separator,startOccurence,endOccurence)
    if toMask != "" :
        return replaceLine.replace(toMask, maskString(toMask))
    else : 
        return replaceLine
        
directory = "c:\\temp\\A0A"
for entry in os.scandir(directory):
    if entry.is_file():
        
        # A0A
        with open(entry.path) as f:
            # Store all lines as list
            data = f.readlines()
            masked = []

for idx,line in enumerate(data):
        if line.startswith('A0A'):
            print(idx)
            pos = data[idx].find(' ',0)

            masked = [maskingDef[chr] for chr in  data[1][7:pos]]

            data[idx] = data[idx][0:7]+''.join(masked)+data[1][pos:]

#print(data)
        else :
            masked.append(line)




with open(directory+"\\masked\\"+entry.name, 'w') as f:
    for item in masked:
        f.write(item)
I noticed it did not replace columns 8 to 22 in the line but instead wrote a new line. How do I replace the existing text in the file? Any changes to the code above would help.
Norie, Analyst Assistant
CERTIFIED EXPERT

Commented:
The code I posted shouldn't add any new lines to the file.

Then again I was basing it solely on the data you posted so there's probably more going on in the 'real' scenario

For example, you appear to be dealing with multiple files, is that right?

Author

Commented:
That's right, Norie. The folder can have multiple files.

Author

Commented:
The output of the masked file is
7063336837760*END
but it should mask only columns 8 through 22 of the A0A line and keep the rest of the file:
*BEG IAZZ/RUSAZZA/R
A0APLUS4730003504437  EA00015W56BPZ0118S020RAB24D0AVE ABIPV201072J    118       SJUUY210001066203
*END
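
A short sketch of why the script above produces 7063336837760*END: inside the A0A branch the name `masked` is re-bound to the list of masked characters, which discards the *BEG line collected so far, and the script then writes `masked` instead of `data`. The in-memory `data` list (with a shortened A0A line) and the two `written_from_*` names are assumptions for illustration only.

```python
import string

# Same mapping as maskingDef in the question: letters shift by 9, digits by 3.
maskingDef = dict(zip(string.ascii_letters,
                      string.ascii_letters[9:] + string.ascii_letters[:9]))
maskingDef.update(zip(string.digits, string.digits[3:] + string.digits[:3]))

# Hypothetical in-memory stand-in for f.readlines().
data = ['*BEG IAZZ/RUSAZZA/R\n',
        'A0APLUS4730003504437  EA00015W56\n',
        '*END\n']

masked = []
for idx, line in enumerate(data):
    if line.startswith('A0A'):
        pos = line.find(' ', 0)
        # Re-binding `masked` here throws away everything appended so far.
        masked = [maskingDef[ch] for ch in line[7:pos]]
        data[idx] = line[:7] + ''.join(masked) + line[pos:]
    else:
        masked.append(line)

written_from_masked = ''.join(masked)  # what the script actually wrote
written_from_data = ''.join(data)      # the full file with the field masked

print(repr(written_from_masked))
print(repr(written_from_data))
```

Writing `data` (for example with `f.writelines(data)`) keeps every line and only changes the masked field.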

Norie, Analyst Assistant
CERTIFIED EXPERT

Commented:
In your original post you seem to indicate that this was the part of the file that you wanted masked.

4730003504437

Is that incorrect?

Author

Commented:
Yes, you are right. It did mask the text below, and the output was 7063336837760*END

4730003504437
But how can I make it mask that text and retain the other text as well?
Norie, Analyst Assistant
CERTIFIED EXPERT

Commented:
I'm not sure I follow.

The code I posted will read from a file named 'data.txt' and will write to a file named 'masked.txt' with 4730003504437 masked, but all the other data unchanged.

If that isn't what you are looking for please supply some more information.

Author

Commented:
I am kind of new to Python. Does the code you posted take the characters in columns 8 to 22? Please let me know which part of the code does that.
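
For reference, a minimal sketch of how 1-based columns map onto Python slices: columns 8 to 22 inclusive correspond to `line[7:22]`, because indexing starts at 0 and the end of a slice is exclusive. The helper name `mask_columns`, its parameters, and the `.get()` fallback for unmapped characters are assumptions for illustration; the code posted earlier in the thread instead starts at index 7 and stops at the first space it finds.

```python
import string

# Same mapping as maskingDef in the question: letters shift by 9, digits by 3.
letters = string.ascii_letters
digits = "1234567890"
masking_def = dict(zip(letters, letters[9:] + letters[:9]))
masking_def.update(zip(digits, digits[3:] + digits[:3]))


def mask_columns(line, first_col=8, last_col=22):
    """Mask the 1-based columns first_col..last_col (inclusive) of line."""
    start = first_col - 1   # column 8 is index 7
    end = last_col          # slice end is exclusive, so column 22 -> 22
    field = line[start:end]
    # Characters without a mapping (spaces, for example) stay as they are.
    masked = "".join(masking_def.get(ch, ch) for ch in field)
    return line[:start] + masked + line[end:]


sample = ("A0APLUS4730003504437  EA00015W56BPZ0118S020RAB24D0AVE "
          "ABIPV201072J    118       SJUUY210001066203\n")
result = mask_columns(sample)
print(result[:22])
```

Only the slice between `start` and `end` is rebuilt; the text before column 8 and after column 22 is copied through unchanged, so the line keeps its length.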

Author

Commented:
I made some modifications, and the code is now:
import re
import os

#'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
#'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123'
  
maskingDef = {'a':'j',
           'b':'k',
           'c':'l',
           'd':'m',
           'e':'n',
           'f':'o',
           'g':'p',
           'h':'q',
           'i':'r',
           'j':'s',
           'k':'t',
           'l':'u',
           'm':'v',
           'n':'w',
           'o':'x',
           'p':'y',
           'q':'z',
           'r':'A',
           's':'B',
           't':'C',
           'u':'D',
           'v':'E',
           'w':'F',
           'x':'G',
           'y':'H',
           'z':'I',
           'A':'J',
           'B':'K',
           'C':'L',
           'D':'M',
           'E':'N',
           'F':'O',
           'G':'P',
           'H':'Q',
           'I':'R',
           'J':'S',
           'K':'T',
           'L':'U',
           'M':'V',
           'N':'W',
           'O':'X',
           'P':'Y',
           'Q':'Z',
           'R':'a',
           'S':'b',
           'T':'c',
           'U':'d',
           'V':'e',
           'W':'f',
           'X':'g',
           'Y':'h',
           'Z':'i',
           '1':'4',
           '2':'5',
           '3':'6',
           '4':'7',
           '5':'8',
           '6':'9',
           '7':'0',
           '8':'1',
           '9':'2',
           '0':'3'}

toMask = ""

def findnth(string, substring, n):
    parts = string.split(substring, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(string) - len(parts[-1]) - len(substring)

def maskString(toMask):
    maskedString = ""
    for char in toMask:
        if char in maskingDef:
            maskedString += maskingDef[char]
        else :
            maskedString += char
    return maskedString

def findString(replaceLine,separator,startOccurence,endOccurence):
    startIndex = findnth(replaceLine, separator, startOccurence) 
    endIndex = findnth(replaceLine, separator, endOccurence)   
    
    if startIndex >= 0 :
        if endIndex >=0 :
            return replaceLine[startIndex+1:endIndex]
        else :
            return replaceLine[startIndex+1:len(replaceLine)]
    else : 
        return ""

def findAndMask(replaceLine,separator,startOccurence,endOccurence):
    toMask = findString(replaceLine,separator,startOccurence,endOccurence)
    if toMask != "" :
        return replaceLine.replace(toMask, maskString(toMask))
    else : 
        return replaceLine
        
directory = "c:\\temp\\A0A"
for entry in os.scandir(directory):
    if entry.is_file():
        
        # A0A
        with open(entry.path) as f:
            # Store all lines as list
            data = f.readlines()
            masked = []

for idx,line in enumerate(data):
        if line.startswith('A0A'):
            print(idx)
            pos = data[idx].find(' ',0)

            masked = [maskingDef[chr] for chr in  data[1][7:pos]]

            data[idx] = data[idx][0:7]+''.join(masked)+data[1][pos:]

            print(data)
        else :
            masked.append(data)




with open(directory+"\\masked\\"+entry.name, 'w') as file:
    file.writelines(data)

Author

Commented:
How do I keep the file name the same as the source file? For example, data.txt should be masked and placed under the masked folder with the same file name, data.txt. That is the issue I am having.
Norie, Analyst Assistant
CERTIFIED EXPERT

Commented:
So you want to take the files from one directory, do the masking and write the result to another directory?

Author

Commented:
Exactly, Norie!

Author

Commented:
I changed the code:
import re
import os

#'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
#'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123'
 
maskingDef = {'a':'j',
           'b':'k',
           'c':'l',
           'd':'m',
           'e':'n',
           'f':'o',
           'g':'p',
           'h':'q',
           'i':'r',
           'j':'s',
           'k':'t',
           'l':'u',
           'm':'v',
           'n':'w',
           'o':'x',
           'p':'y',
           'q':'z',
           'r':'A',
           's':'B',
           't':'C',
           'u':'D',
           'v':'E',
           'w':'F',
           'x':'G',
           'y':'H',
           'z':'I',
           'A':'J',
           'B':'K',
           'C':'L',
           'D':'M',
           'E':'N',
           'F':'O',
           'G':'P',
           'H':'Q',
           'I':'R',
           'J':'S',
           'K':'T',
           'L':'U',
           'M':'V',
           'N':'W',
           'O':'X',
           'P':'Y',
           'Q':'Z',
           'R':'a',
           'S':'b',
           'T':'c',
           'U':'d',
           'V':'e',
           'W':'f',
           'X':'g',
           'Y':'h',
           'Z':'i',
           '1':'4',
           '2':'5',
           '3':'6',
           '4':'7',
           '5':'8',
           '6':'9',
           '7':'0',
           '8':'1',
           '9':'2',
           '0':'3'}

toMask = ""

def findnth(string, substring, n):
    parts = string.split(substring, n + 1)
    if len(parts) <= n + 1:
        return -1
    return len(string) - len(parts[-1]) - len(substring)

def maskString(toMask):
    maskedString = ""
    for char in toMask:
        if char in maskingDef:
            maskedString += maskingDef[char]
        else :
            maskedString += char
    return maskedString

def findString(replaceLine,separator,startOccurence,endOccurence):
    startIndex = findnth(replaceLine, separator, startOccurence)
    endIndex = findnth(replaceLine, separator, endOccurence)  
   
    if startIndex >= 0 :
        if endIndex >=0 :
            return replaceLine[startIndex+1:endIndex]
        else :
            return replaceLine[startIndex+1:len(replaceLine)]
    else :
        return ""

def findAndMask(replaceLine,separator,startOccurence,endOccurence):
    toMask = findString(replaceLine,separator,startOccurence,endOccurence)
    if toMask != "" :
        return replaceLine.replace(toMask, maskString(toMask))
    else :
        return replaceLine
       
directory = "c:\\temp\\A0A"
for entry in os.scandir(directory):
        if entry.is_file():
            print (entry.name)
        # A0A
            with open(entry.path) as f:
            # Store all lines as list
                data = f.readlines()
                masked = []

            for idx,line in enumerate(data):
                if line.startswith('A0A'):
                    print(idx)
                    pos = data[idx].find(' ',0)

                    masked = [maskingDef[chr] for chr in  data[1][7:pos]]

                    data[idx] = data[idx][0:7]+''.join(masked)+data[1][pos:]

                    print(data)
                else :
                    masked.append(data)




        with open(directory+"\\masked\\"+entry.name, 'w') as file:
   
            print (entry.name)
            file.writelines(data)

But it is putting 2 files under the masked folder, one named masked.txt and the other original.txt. Please let me know what change should be made.
aikimark
CERTIFIED EXPERT
Top Expert 2014

Commented:
You can shorten your code if you leverage the string library to populate the maskingDef variable.
import string

maskingDef = dict(zip(string.ascii_letters, string.ascii_letters[9:] + string.ascii_letters[:9]))
maskingDef.update(dict(zip(string.digits, string.digits[3:] + string.digits[:3])))
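
As a quick check, the generated dictionary can be compared against the two comment strings from the script in the question (the source alphabet and the shifted alphabet). This is only a sketch; the names `source`, `target`, and `expected` are illustrative.

```python
import string

maskingDef = dict(zip(string.ascii_letters,
                      string.ascii_letters[9:] + string.ascii_letters[:9]))
maskingDef.update(dict(zip(string.digits,
                           string.digits[3:] + string.digits[:3])))

# The two alphabets from the comments in the question's script.
source = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890'
target = 'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123'
expected = dict(zip(source, target))

# Dictionary equality ignores insertion order, so the digit order is irrelevant.
same_mapping = (maskingDef == expected)
masked = ''.join(maskingDef[ch] for ch in '4730003504437')
print(same_mapping, masked)  # prints: True 7063336837760
```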


aikimark
CERTIFIED EXPERT
Top Expert 2014

Commented:
Is this correct?
data[idx] = data[idx][0:7]+''.join(masked)+data[1][pos:]



I would expect this instead. I've changed the "1" to "idx":
data[idx] = data[idx][0:7]+''.join(masked)+data[idx][pos:]


aikimark
CERTIFIED EXPERT
Top Expert 2014

Commented:
Same would be true for this line.  Why "1" and not "idx"?
masked = [maskingDef[chr] for chr in  data[1][7:pos]]
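
To illustrate why the index matters: with a hard-coded `1`, the loop only works when the A0A segment happens to be the second line of the file. Below is a small sketch that uses `idx` throughout, run against a hypothetical file in which A0A is the third line. The sample `data` list, the extra `B0B` segment, and the `.get()` fallback are assumptions for illustration.

```python
import string

maskingDef = dict(zip(string.ascii_letters,
                      string.ascii_letters[9:] + string.ascii_letters[:9]))
maskingDef.update(zip(string.digits, string.digits[3:] + string.digits[:3]))

# Hypothetical file content: the A0A segment is NOT at index 1 here.
data = ['*BEG IAZZ/RUSAZZA/R\n',
        'B0BOTHER1234567  UNCHANGED\n',
        'A0APLUS4730003504437  EA00015W56\n',
        '*END\n']

for idx, line in enumerate(data):
    if line.startswith('A0A'):
        pos = data[idx].find(' ', 7)
        if pos == -1:
            # No space after the field: mask up to the end of the text.
            pos = len(data[idx].rstrip('\n'))
        masked = [maskingDef.get(ch, ch) for ch in data[idx][7:pos]]
        data[idx] = data[idx][0:7] + ''.join(masked) + data[idx][pos:]

print(data)
```

Since `line` and `data[idx]` refer to the same string inside the loop, `line[7:pos]` would work equally well.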


CERTIFIED EXPERT

Commented:
I am not sure exactly what you need. However, the string type has a built-in method for translating characters using a translation table, and there is also a method for building that table -- see

https://docs.python.org/3/library/stdtypes.html?highlight=translate#str.translate
https://docs.python.org/3/library/stdtypes.html?highlight=translate#str.maketrans

To find the space in the string after some position, you can use the .find() method -- see

https://docs.python.org/3/library/stdtypes.html?highlight=translate#bytes.find

Then the simplified code may look like this:

table = str.maketrans('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',
                      'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123' )

fname_in = 'data.txt'
fname_out = 'data.out'

with open(fname_in) as fin,\
     open(fname_out, 'w') as fout:
    for line in fin:
        if line.startswith('A0A'):
            pos1 = 7    # zero-based indexing
            pos2 = line.find(' ', pos1)   # the space is one after the last mapped
            s = line[:pos1] + line[pos1:pos2].translate(table) + line[pos2:]
            fout.write(s)



It writes only the lines that start with A0A. The fname_xx variables can contain full paths (to different directories).

Update: Aikimark was faster :)
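
Putting the pieces of the thread together, here is a hedged sketch that processes every file in a folder, keeps all lines (not only the A0A ones), and writes each result under the same file name in a target folder. The function names `mask_line` and `mask_directory` and the use of `pathlib` are assumptions for illustration, not something posted by the participants.

```python
from pathlib import Path

TABLE = str.maketrans(
    'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890',
    'jklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghi4567890123')


def mask_line(line):
    """Mask the field that starts at column 8 of an A0A line."""
    if not line.startswith('A0A'):
        return line                      # other segments pass through
    pos1 = 7                             # zero-based index of column 8
    pos2 = line.find(' ', pos1)          # first space after the field
    if pos2 == -1:                       # no space: mask to end of text
        pos2 = len(line.rstrip('\r\n'))
    return line[:pos1] + line[pos1:pos2].translate(TABLE) + line[pos2:]


def mask_directory(src_dir, dst_dir):
    """Mask every file in src_dir into dst_dir, keeping the file names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for entry in sorted(src.iterdir()):
        if not entry.is_file():          # skips sub-folders such as "masked"
            continue
        out_path = dst / entry.name      # same name, different folder
        with open(entry) as fin, open(out_path, 'w') as fout:
            for line in fin:
                fout.write(mask_line(line))
        written.append(out_path)
    return written
```

With the paths used earlier in the thread, a call might look like `mask_directory(r"c:\temp\A0A", r"c:\temp\A0A\masked")`. Opening the output file inside the loop over files, once per input file, is what keeps one output per input with the same name.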
