What is the Python code to remove characters?

I have this in a text file:

17336665552013070139      {(17336665552013070139,75),(17336665552013070139,35),(17336665552013070139,57)}
17336665592013070149      {(17336665592013070149,75),(17336665592013070149,57),(17336665592013070149,78)}
17336665792013070199      {(17336665792013070199,41)}
17349274502013070413      {(17349274502013070413,25),(17349274502013070413,54)}

I want to remove the first column, the repeated value of the first column, the braces { and }, and the parentheses ( and ).

I need it to look like this:
75,35,57
75,57,78
41
25,54

What is the Python code to do this and save as .csv file?

Thanks
Ricky Ng

pepr commented:
Try the following code for Python 2. (For Python 3, it must be slightly modified.)
#!python2

import csv

fname = 'data.txt'
fcsvname = 'data.csv'
with open(fname) as fin, open(fcsvname, 'wb') as fout:
    writer = csv.writer(fout)
    for line in fin:
        print '----------------------------------'
        print line,
        # Extract the second part of the line, replace the {} by [] and 
        # convert it to the list. It uses eval() that can be dangerous
        # if someone put a command inside the string. Do that only if you
        # are sure the lines have the structure that you think.
        line = line.split()[1]   # split by whitespace, get only the second part
        print line
        line = '[' + line.rstrip()[1:-1] + ']'  # convert to a list representation
        print line
        lst = eval(line)    # do this dangerous command only when you know your data
        print lst
        
        # The row will be formed only from second parts of the tuples in the list.
        row = [t[1] for t in lst]
        
        # Write the row to the CSV output file.
        writer.writerow(row)
        
        ## remove the debug prints

Modify the name of the input file and of the output file. Ask for details. If eval() should not be used in your case, another approach for parsing can be used.
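One way to avoid eval() while keeping the same parsing idea is ast.literal_eval from the standard library, which accepts only Python literals. The following is a minimal sketch for Python 3, not the code from the answer above; the function name second_items and the sample line are only illustrative.

```python
import ast


def second_items(line):
    """Return the second element of every tuple on one input line."""
    # Split on whitespace and keep the '{(...),(...)}' part.
    braces = line.split()[1]
    # Swap the outer {} for [] so the text parses as a list, which keeps
    # the tuples in their original order (a set literal would not).
    as_list = '[' + braces[1:-1] + ']'
    # literal_eval only accepts literals, so no command can be executed.
    tuples = ast.literal_eval(as_list)
    return [t[1] for t in tuples]


sample = ('17336665552013070139      '
          '{(17336665552013070139,75),(17336665552013070139,35),'
          '(17336665552013070139,57)}')
print(second_items(sample))
```

The returned list can be passed to writer.writerow() exactly like row in the code above.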

pepr commented:
It is not possible to remove a character from an existing string, because Python strings are immutable. However, you can create a new string without those characters. The example above uses splitting to remove the first number and slicing to remove the { }. If s is a string, then s[x:y] is the substring from zero-based index x up to, but not including, index y. A negative index counts from the end. So s[1:-1] is the substring from the second character to the one before the last, which removes the { }.
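A small illustration of the splitting and slicing described here, using one line from the question. The variable names are only for the example.

```python
line = '17336665792013070199      {(17336665792013070199,41)}\n'

# split() with no argument splits on any run of whitespace and drops
# the trailing newline, so item 1 is the part in braces.
second = line.split()[1]

# Slicing builds a new string without the first and last character.
inner = second[1:-1]

print(second)
print(inner)

# The original string is unchanged, because strings are immutable.
print(second[0], second[-1])
```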

Ricky Ng (author) commented:
Hi pepr,

I am getting an error:

  File "dropchars.py", line 7
    with open(fname) as fin, open(fcsvname, 'wb') as fout:
                           ^
SyntaxError: invalid syntax



Thanks

pepr commented:
Did you set the fname earlier?
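One possible cause of the SyntaxError, not confirmed anywhere in this thread: opening two files in a single with statement (separated by a comma) is only supported from Python 2.7 and 3.1 on, and the caret in the error message points at that comma. Assuming an older interpreter is in use, nested with statements are an equivalent form. The sketch below uses plain string operations instead of eval(), and the function name convert is hypothetical.

```python
def convert(fname, fcsvname):
    """Write the second item of every (x,y) pair as comma-separated rows."""
    # Nested form of: with open(fname) as fin, open(fcsvname, 'w') as fout:
    with open(fname) as fin:
        with open(fcsvname, 'w') as fout:
            for line in fin:
                parts = line.split()
                if len(parts) < 2:
                    continue  # skip blank or unexpected lines
                # '{(a,75),(a,35)}' -> 'a,75),(a,35' -> ['a,75', 'a,35']
                pairs = parts[1][1:-1].strip('()').split('),(')
                # Keep only the text after the last comma of each pair.
                values = [p.rsplit(',', 1)[1] for p in pairs]
                fout.write(','.join(values) + '\n')
```

It would be called as convert('data.txt', 'data.csv').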

aikimark commented:
You can also apply two regular expression patterns to do the parsing.
First apply the following pattern to all the text:
\{(.*)\}
Then iterate over the matches and apply the following pattern to each match:
,(\d\d)\)
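A sketch of how these two patterns could be applied with Python's re module, assuming the whole file has been read into one string (here the sample data from the question). Note that \d\d matches exactly two digits, so a value with one or three digits would be skipped by the second pattern.

```python
import re

text = """\
17336665552013070139      {(17336665552013070139,75),(17336665552013070139,35),(17336665552013070139,57)}
17336665592013070149      {(17336665592013070149,75),(17336665592013070149,57),(17336665592013070149,78)}
17336665792013070199      {(17336665792013070199,41)}
17349274502013070413      {(17349274502013070413,25),(17349274502013070413,54)}
"""

outer = re.compile(r'\{(.*)\}')    # everything between { and } on a line
inner = re.compile(r',(\d\d)\)')   # two digits between a comma and a ')'

# '.' does not match a newline, so the first pattern gives one match per line.
rows = [inner.findall(body) for body in outer.findall(text)]

for row in rows:
    print(','.join(row))
```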

pepr commented:
My +1 for aikimark's suggestion to use regular expressions in this case. It will actually be safer (avoiding eval()), and I would not be surprised if the solution was also faster. I would only use a different regular expression that captures a single tuple (in parentheses) and then use the findall method of the regular expression to return the list of wanted elements. However, the result is a list of strings, which the code below converts to integers before passing them to the csv module:
#!python2

import csv
import re

fname = 'data.txt'
fcsvname = 'data.csv'
rexSecondItems = re.compile(r'\(\d+,(\d+)\)')

with open(fname) as fin, open(fcsvname, 'wb') as fout:
    writer = csv.writer(fout)
    for line in fin:
        lstS = rexSecondItems.findall(line)
        row = [int(s) for s in lstS]
        writer.writerow(row)

The r'...' is a raw string, which means that escape sequences (those that start with a backslash) are not interpreted by Python. This is usual when working with regular expressions, because regular expressions use backslashes themselves and need to interpret them on their own. The \( means "one character equal to a left parenthesis". It is written with a backslash because parentheses without a backslash group a part of the regular expression, as the later part of the pattern shows. The \d means a decimal digit, and the + means one or more times. When the pattern contains one group, .findall returns the list of the strings captured by that group.
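To see these pieces in isolation, here is a short sketch that applies the same pattern to a single line from the question. The variable names are only for the example.

```python
import re

# In a raw string the backslash stays a backslash: two characters here.
print(len(r'\d'))

rex = re.compile(r'\(\d+,(\d+)\)')
line = ('17349274502013070413      '
        '{(17349274502013070413,25),(17349274502013070413,54)}')

# One group in the pattern, so findall returns the captured strings only.
matches = rex.findall(line)
print(matches)

# Convert to integers if numbers are needed.
row = [int(s) for s in matches]
print(row)
```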

If you know that the data contain only numbers, you can even avoid using the csv module and join the list on your own:
#!python2

import re

fname = 'data.txt'
fcsvname = 'data.csv'
rexSecondItems = re.compile(r'\(\d+,(\d+)\)')

with open(fname) as fin, open(fcsvname, 'w') as fout:
    for line in fin:
        lst = rexSecondItems.findall(line)
        fout.write(','.join(lst) + '\n')

In this case, the file should be opened in text mode (unlike the previous case, where the csv module in Python 2 requires binary mode).
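For Python 3 the file modes differ: there the csv module expects a file opened in text mode with newline=''. The following is a sketch of how the csv variant might look in Python 3, under that assumption. The function name convert_py3 is hypothetical, and skipping lines without any match is a choice made for this example, not something taken from the code above.

```python
import csv
import re

REX_SECOND_ITEMS = re.compile(r'\(\d+,(\d+)\)')


def convert_py3(fname, fcsvname):
    """Python 3 sketch: write the second item of each (x,y) pair as CSV."""
    # Text mode with newline='' lets csv.writer control the line endings.
    with open(fname) as fin, open(fcsvname, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for line in fin:
            row = REX_SECOND_ITEMS.findall(line)
            if row:  # skip lines that contain no (x,y) pair
                writer.writerow(row)
```

It would be called as convert_py3('data.txt', 'data.csv').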

aikimark commented:
If you read the file line-by-line you can skip the first pattern.  The second pattern will parse out the numbers.

Question has a verified solution.
