Go over a UTF-8 file with Python

Dear experts,
I am a newbie in the Python world. I would like to build a small script that goes over a specific file (UTF-8) and does the following:
For each line that starts with XXXX, copy the line, remove the XXXX, and write the result to a different file.
So eventually I will have a new file with all the lines from the original file that started with XXXX (without the XXXX).
Can you please provide an example similar to what I would like to build?
Best regards,
Boaz.
WAS_Infra Asked:
Bxoz Commented:
Same code, but with the XXX prefix removed:

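# -*- coding: iso-8859-1 -*-
import re

obFile = open('fileToRead.txt', 'r')
obFileW = open('fileToWrite.txt', 'w')

lignes = obFile.readlines()
# '^XXX' matches XXX only at the very start of a line
reg1 = re.compile('^XXX')
for i in lignes:
    if reg1.match(i):
        # Remove only the leading XXX; i.replace('XXX', '') would also
        # delete any XXX that appears later in the line
        obFileW.write(reg1.sub('', i))
obFile.close()
obFileW.close()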

Bxoz Commented:
# -*- coding: iso-8859-1 -*-
import re
obFile = open('fileToRead.txt','r')
obFileW = open('fileToWrite.txt','w')

lignes = obFile.readlines()
reg1=re.compile('^XXX')
for i in lignes:
    if reg1.findall(i):
        obFileW.write(i)
obFile.close()
obFileW.close()

pepr Commented:
Regular expressions are overkill if you do not really need them. When the line starts with a known prefix, use the .startswith() method of the built-in string type. Also, there is no need to read all the lines into a list first. It is better to process the file on the fly (the data.txt file is attached):

fin = open('data.txt')
fout = open('out.txt', 'w')

for line in fin:
    if line.startswith('XXXX'):
        fout.write(line)
    
fin.close()
fout.close()
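
The loop above copies matching lines unchanged. A minimal sketch of also dropping the prefix, as the question asks, might look like the following. The function name strip_prefix_lines and the in-memory sample are illustrative assumptions, not part of the original answer:

```python
import io

PREFIX = 'XXXX'  # placeholder prefix taken from the question


def strip_prefix_lines(lines, prefix=PREFIX):
    """Yield each line that starts with prefix, with that prefix removed."""
    for line in lines:
        if line.startswith(prefix):
            # Slice off only the leading prefix; later occurrences are kept.
            yield line[len(prefix):]


# In-memory stand-in for an opened text file (assumed sample data).
sample = io.StringIO(u'XXXXfirst\nskip me\nXXXXkeep XXXX here\n')
result = list(strip_prefix_lines(sample))
```

With real files, the same generator could be fed the open input file, and each yielded line passed to fout.write().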

pepr Commented:
UTF-8 is a separate story. It also depends on whether you use Python 2.x or Python 3. With Python 2.x, it depends on whether you want to treat the strings as sequences of bytes or as unicode strings. For the latter, use the codecs module (http://docs.python.org/library/codecs.html). The change, in how the files are opened, is only a minor one:

import codecs

fin = codecs.open('data.txt', encoding='utf-8')
fout = codecs.open('out.txt', 'w', encoding='utf-8')

for line in fin:
    if line.startswith('XXXX'):
        fout.write(line)
    
fin.close()
fout.close()

Note, however, that the line variable now holds a unicode string rather than a byte string. You can even write the output in a different encoding.
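
As a rough illustration of converting the encoding on output, here is a sketch for Python 3, where the built-in open() accepts an encoding argument directly. The helper name copy_prefixed, the temporary file names, and the choice of UTF-16 as the target encoding are assumptions made for the example only:

```python
import os
import tempfile


def copy_prefixed(src, dst, prefix='XXXX', out_encoding='utf-16'):
    """Copy lines starting with prefix from a UTF-8 file to dst,
    writing them in out_encoding."""
    with open(src, encoding='utf-8') as fin, \
            open(dst, 'w', encoding=out_encoding) as fout:
        for line in fin:
            if line.startswith(prefix):
                fout.write(line)


# Build a small UTF-8 input file in a temporary directory (sample data).
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'data.txt')
dst = os.path.join(tmpdir, 'out.txt')
with open(src, 'w', encoding='utf-8') as f:
    f.write('XXXXcaf\u00e9\nother line\n')

copy_prefixed(src, dst)

# Read the result back using the encoding it was written in.
with open(dst, encoding='utf-16') as f:
    converted = f.read()
```

The with statement closes both files even if an error occurs part-way through, so the explicit close() calls are not needed in this form.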

I recommend reading "Dive into Python 3" by Mark Pilgrim, Chapter 4, Strings: http://diveintopython3.org/strings.html
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.