Convert the format of a file with Python


Script to convert file1.txt to file2.txt.

I have a file in the format of file1.txt and want to write an automated script that always converts that type of file into the format of file2.txt.

Please write a smart Python script to do that in an efficient way :-)  Any help will be strongly appreciated.

Thank you very much!

Please note that I also changed the order of the output IDs. The 1aho ID goes to the end of the file and is called sequence instead of structure. There will always be exactly one ID that goes to the end; its name is known in advance (in this case, 1aho).
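The reordering rule can be sketched on its own: every entry keeps its input order except the one whose ID is known in advance, which is moved to the end. This is a minimal sketch; representing entries as (id, text) pairs and the name `move_to_end` are hypothetical choices, not part of either file format.

```python
from typing import List, Tuple


def move_to_end(entries: List[Tuple[str, str]], last_id: str) -> List[Tuple[str, str]]:
    """Return entries in input order, with the entry whose ID is last_id moved to the end.

    IDs are compared case-insensitively, since a header may spell '1aho' as '1AHO'.
    """
    key = last_id.lower()
    others = [e for e in entries if e[0].lower() != key]
    last = [e for e in entries if e[0].lower() == key]
    return others + last


if __name__ == "__main__":
    entries = [("1aho", "A"), ("2abc", "B"), ("3xyz", "C")]
    print(move_to_end(entries, "1aho"))
    # [('2abc', 'B'), ('3xyz', 'C'), ('1aho', 'A')]
```

Using two list comprehensions keeps the relative order of the remaining entries stable, which matches the requirement that only the one known ID changes position.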

Could you please give a little more information about the expected file sizes and the amount of RAM available on your host?

This might influence whether you read the entire file into RAM and process it there, or work line by line.
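If the files are too large to hold in RAM, entries can be read line by line instead, keeping only the current entry in memory. This is a sketch assuming the layout asked about in the thread (an entry starts at a line beginning with '>' and ends at a line ending in '*'); `iter_entries` is a hypothetical helper name and the sample data is made up.

```python
import io
from typing import Iterable, Iterator, List


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one entry at a time as a list of right-stripped lines.

    Only the current entry is held in memory, so the input can be large.
    An entry starts at a line beginning with '>' and ends at a line
    whose last non-blank character is '*'.
    """
    current: List[str] = []
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # ignore blank separator lines
        if line.startswith('>') and current:
            # a new header arrived before a '*': flush the previous entry
            yield current
            current = []
        current.append(line)
        if line.endswith('*'):
            yield current
            current = []
    if current:
        yield current  # trailing entry without a terminating '*'


if __name__ == "__main__":
    sample = ">P1;1AHO\nsequence\nVKDGY*\n\n>P1;2ABC\nsequence\nKLMN\nPQ*\n"
    for entry in iter_entries(io.StringIO(sample)):
        print(entry[0], len(entry))  # prints ">P1;1AHO 3", then ">P1;2ABC 4"
```

With a real file, pass the open file object directly, e.g. `with open("file1.txt") as f:` and then `for entry in iter_entries(f):`, since iterating a file yields one line at a time.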

May I assume that:
- '>' occurs only at the beginning of a line, to start a new entry?
- '*' occurs only at the end of a line, to end an entry?
- a line starting with '>' always ends with "PDBID|CHAIN|SEQUENC"?
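Those assumptions can be checked mechanically before relying on a fast, fixed-position parser. A minimal sketch; `check_assumptions` is a hypothetical helper, and the expected header ending is passed in as a parameter because the exact text is whatever the real file uses.

```python
from typing import List


def check_assumptions(text: str, header_suffix: str) -> List[str]:
    """Return human-readable violations of the three assumptions (empty if all hold)."""
    problems: List[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.rstrip()
        # '>' may appear only as the very first character of a line
        if '>' in stripped[1:]:
            problems.append(f"line {lineno}: '>' not at start of line")
        # '*' may appear only as the very last character of a line
        if '*' in stripped[:-1]:
            problems.append(f"line {lineno}: '*' not at end of line")
        # header lines must end with the expected suffix
        if stripped.startswith('>') and not stripped.endswith(header_suffix):
            problems.append(f"line {lineno}: header does not end with {header_suffix!r}")
    return problems
```

An empty result means the fixed-offset approach is safe for that file; any entry in the list points at the line that breaks it.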

You can make the script more robust against line breaks or changes in identifier length, or faster if you know
that certain things (amount of whitespace, etc.) will never change.
I don't know whether your goal is robustness or speed.
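On the robustness side, the identifier can be taken from the text before the first '|' instead of a fixed slice such as `headline[4:8]`, so the parser keeps working if identifier lengths change. This sketch assumes a hypothetical header layout `>P1;1AHO:A|PDBID|CHAIN|SEQUENCE` (an optional prefix ending in ';', then `ID:CHAIN`); `parse_header` is an illustrative name, and the separators should be adjusted to the real file.

```python
from typing import Tuple


def parse_header(headline: str) -> Tuple[str, str, str]:
    """Split a header line into (prefix, lower-case id, chain) without fixed offsets.

    Assumed layout: '>PREFIX;ID:CHAIN|...', where the 'PREFIX;' part is optional
    and the ID may have any length.
    """
    first_field = headline.split('|', 1)[0]          # e.g. '>P1;1AHO:A'
    head, sep, rest = first_field.rpartition(';')    # ('>P1', ';', '1AHO:A')
    if sep:
        prefix = head + sep                          # keep '>P1;' as the prefix
    else:
        prefix, rest = '>', rest.lstrip('>')         # no ';' present: bare '>ID:CHAIN'
    id_, _, chain = rest.partition(':')
    return prefix, id_.lower(), chain
```

`str.partition` and `str.rpartition` never raise when the separator is missing, which is what makes this version tolerant of format drift; the price is a few extra string operations per header compared with plain slicing.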

My example takes a mixed approach.

Please confirm that the behavior is the expected one.

If yes, I can clean up the code a little.

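def parse_entry(entry):
    """Split one entry into (id, header line, sequence data)."""
    lines = entry.split('\n')
    headline = lines[0]
    id_ = headline[4:8].lower()
    # if the identifier length varies, slice up to the first '|' instead
    head = ''.join((headline[:4], id_, headline[9]))
    data = ''.join(('\n', lines[2], '\n', lines[3].strip(), '*\n\n'))
    return (id_, head, data)


def convert(infile_name, outfile_name, lastid):
    # read the whole file and split it into entries at the '*' terminators
    with open(infile_name) as infile:
        indata = infile.read().split('*')
    outdata = []
    lastentry = None
    for entry in indata:
        entry_s = entry.strip()
        if entry_s == "":
            continue  # skip the empty chunk after the final '*'
        parsed = parse_entry(entry_s)
        if parsed[0] == lastid:
            # the known ID becomes a 'sequence' entry and is written last
            lastentry = (parsed[1], '\nsequence:', parsed[0],
                         ':     : :     : ::: 0.00: 0.00', parsed[2])
            print("LAST", parsed[0])
        else:
            # all other IDs become 'structure' entries, kept in input order
            # ('\nstructure:' is inferred from the question; the original line was cut off)
            print("NOT LAST", parsed[0])
            outdata.append((parsed[1], '\nstructure:', parsed[0],
                            '.pdb:FIRST:@ LAST::::::', parsed[2]))
    if lastentry is not None:
        outdata.append(lastentry)
    # 'with' flushes and closes the output file when the block ends
    with open(outfile_name, 'w') as outfile:
        for entry in outdata:
            outfile.write(''.join(entry))


if __name__ == "__main__":
    convert("file1.txt", "file2.txt", "1aho")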


To gelonida: You should always close a file you opened, especially when you are writing to it.
Either call .close() explicitly or use the "with" statement.
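The two options side by side, as a minimal sketch; the function names `write_entries` and `write_entries_explicit` are hypothetical placeholders.

```python
from typing import Iterable


def write_entries(path: str, entries: Iterable[str]) -> None:
    # Option 1: the 'with' statement closes the file automatically,
    # even if an exception is raised while writing.
    with open(path, 'w') as outfile:
        for entry in entries:
            outfile.write(entry)


def write_entries_explicit(path: str, entries: Iterable[str]) -> None:
    # Option 2: explicit close(); try/finally ensures it runs even on errors.
    outfile = open(path, 'w')
    try:
        for entry in entries:
            outfile.write(entry)
    finally:
        outfile.close()
```

Both guarantee that buffered output is flushed to disk; the `with` form is shorter and harder to get wrong.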
dfernan (Author) commented:
Thank you very much! Exactly what I was looking for.