Compare two files using python script

Hi All,
I want to compare two files that contain the following IDs (fields).

----------file01--------
6100100013
6110010003
6120010001
6120010002
-------------------------

----------file02---------
6120120001
6130040001
6130070001
6130070005
-------------------------


Two outputs are required:
01.) file01 IDs which are not in file02.
02.) file02 IDs which are not in file01.

BR Dushan
Dushan De SilvaTechnology ArchitectAsked:
woolmilkporcCommented:
Why use python?
If the files are sorted, use 'comm'.
01.) comm -2 -3 file01 file02
02.) comm -1 -3 file01 file02
Two-column format -
comm -3 file01 file02
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks! But it gives the following error. These files contain more than 150,000 records.
----------------------------------------------------------
Traceback (most recent call last):
  File "compare.py", line 10, in <module>
    if i != fileTwo[x]:
IndexError: list index out of range
----------------------------------------------------------
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks woolmilkporc! But I specifically need a solution as a Python script.
ghostdog74Commented:
see here(example 3) for a small example
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks ghostdog74! But that is different from what I want, and it will also definitely say "index out of range", because I have more than 150,000 records in my files.

BR Dushan
HonorGodSoftware EngineerCommented:
Is the data ordered, or are we talking about 150,000 random records?
Dushan De SilvaTechnology ArchitectAuthor Commented:
It's ordered :)
Kamaraj SubramanianApplication Support AnalystCommented:
This script will compare two files and write the difference in the third file.
f1 = open("file1.txt", "r")
f2 = open("file2.txt", "r")
 
fileOne = f1.readlines()
fileTwo = f2.readlines()
f1.close()
f2.close()
outFile = open("results.txt", "w")
x = 0
for i in fileOne:
   if i != fileTwo[x]:
      outFile.write(i+" <> "+fileTwo[x])
   x <strong class="highlight">+</strong>= 1
 
outFile.close()

Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks itkamaraj!
But it gives the following error.
--------------------------------------------------------------------------------------------------------------
  File "compare.py", line 32
    x <strong class="highlight">+</strong>= 1
                  ^
SyntaxError: invalid syntax
--------------------------------------------------------------------------------------------------------------

BR Dushan
Kamaraj SubramanianApplication Support AnalystCommented:
check this
f1 = open("file1.txt", "r")
f2 = open("file2.txt", "r")
fileOne = f1.readlines()
fileTwo = f2.readlines()
f1.close()
f2.close()
outFile = open("results.txt", "w")
x = 0
for i in fileOne:
  if i != fileTwo[x]:
     outFile.write(i+" <> "+fileTwo[x])
  x += 1
outFile.close()

Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks itkamaraj! It's working.
But it shows results as
-----------------
630260002
 <> 630260001
630260004
 <> 630260002
630260005
 <> 630260004
630260006
 <> 630260005
630260007
-------------------

Example: 630260002 is present in both files. Both files are sorted, but because some values are missing from one file or the other, matching values do not sit on the same line number in both files.
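
To illustrate the misalignment described above, here is a minimal sketch. The two short lists are made-up stand-ins for the real files, and the helper name `positional_mismatches` is hypothetical; it only mimics the idea of comparing line N of one file with line N of the other.

```python
# Hypothetical sample data: file02 has one extra ID at the top,
# so every later ID sits one line lower than in file01.
file01 = ["630260002", "630260004", "630260005"]
file02 = ["630260001", "630260002", "630260004"]


def positional_mismatches(lines_a, lines_b):
    """Pair lines by position and keep the pairs that differ."""
    return [(a, b) for a, b in zip(lines_a, lines_b) if a != b]


mismatches = positional_mismatches(file01, file02)
for left, right in mismatches:
    print(left + " <> " + right)
```

Every pair is reported as different here, even though 630260002 and 630260004 exist in both lists, because a single missing ID shifts all following lines.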

Please provide a script that outputs only the values which are in file02 but not in file01
and
the values which are in file01 but not in file02.
 
BR Dushan
Roger BaklundCommented:
How does this work:
# filecompare.py
# Walks two sorted ID files in parallel and writes the IDs unique to each.
in1 = open('file01.txt')
in2 = open('file02.txt')
out1 = open('in01notin02.txt','w')
out2 = open('in02notin01.txt','w')
f1_line = in1.readline().strip()
f2_line = in2.readline().strip()
while f1_line or f2_line:
  # Same ID in both files: skip it and advance both
  if f1_line==f2_line:
    f1_line = in1.readline().strip()
    f2_line = in2.readline().strip()
  # file01 is behind: its current ID cannot be in file02
  while f1_line and f1_line < f2_line:
    print('in01notin02 ' + f1_line)
    out1.write(f1_line+"\n")
    f1_line = in1.readline().strip()
  # file02 is behind: its current ID cannot be in file01
  while f2_line and f1_line > f2_line:
    print('in02notin01 ' + f2_line)
    out2.write(f2_line+"\n")
    f2_line = in2.readline().strip()
  # file02 is exhausted: the rest of file01 is unique
  while f1_line and not f2_line:
    print('in01notin02 ' + f1_line)
    out1.write(f1_line+"\n")
    f1_line = in1.readline().strip()
  # file01 is exhausted: the rest of file02 is unique
  while f2_line and not f1_line:
    print('in02notin01 ' + f2_line)
    out2.write(f2_line+"\n")
    f2_line = in2.readline().strip()
in1.close()
in2.close()
out1.close()
out2.close()

Roger BaklundCommented:
Sorry, you should remove the print statements, that was just for debugging:
# filecompare.py
# Walks two sorted ID files in parallel and writes the IDs unique to each.
in1 = open('file01.txt')
in2 = open('file02.txt')
out1 = open('in01notin02.txt','w')
out2 = open('in02notin01.txt','w')
f1_line = in1.readline().strip()
f2_line = in2.readline().strip()
while f1_line or f2_line:
  # Same ID in both files: skip it and advance both
  if f1_line==f2_line:
    f1_line = in1.readline().strip()
    f2_line = in2.readline().strip()
  # file01 is behind: its current ID cannot be in file02
  while f1_line and f1_line < f2_line:
    out1.write(f1_line+"\n")
    f1_line = in1.readline().strip()
  # file02 is behind: its current ID cannot be in file01
  while f2_line and f1_line > f2_line:
    out2.write(f2_line+"\n")
    f2_line = in2.readline().strip()
  # file02 is exhausted: the rest of file01 is unique
  while f1_line and not f2_line:
    out1.write(f1_line+"\n")
    f1_line = in1.readline().strip()
  # file01 is exhausted: the rest of file02 is unique
  while f2_line and not f1_line:
    out2.write(f2_line+"\n")
    f2_line = in2.readline().strip()
in1.close()
in2.close()
out1.close()
out2.close()

Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks! But this is giving greater-than and less-than values. I only want the fields which are in file01 but not in file02
and
the fields which are in file02 but not in file01.

BR Dushan
Roger BaklundCommented:
That is what my script is doing... did you try to run it?

It reads both files in parallel. While the current value from one file is smaller than the current value from the other file, it cannot exist in the other file (because the files are sorted), so it is written to the output file. Then the next row from the input is checked, and so on. Just test it. :)
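
The parallel walk described above can be sketched as a small function. This is only an illustration, not the script posted earlier: the name `sorted_diff` is hypothetical, and it assumes the IDs are numeric and both inputs are in ascending numeric order. It compares integers rather than strings, because string comparison matches numeric order only when every ID has the same number of digits.

```python
def sorted_diff(ids_a, ids_b):
    """Return (only_in_a, only_in_b) for two ascending iterables of ints."""
    only_a, only_b = [], []
    it_a, it_b = iter(ids_a), iter(ids_b)
    a = next(it_a, None)
    b = next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            # Present on both sides: skip and advance both
            a = next(it_a, None)
            b = next(it_b, None)
        elif a < b:
            # b is already past a, so a cannot appear in ids_b
            only_a.append(a)
            a = next(it_a, None)
        else:
            # a is already past b, so b cannot appear in ids_a
            only_b.append(b)
            b = next(it_b, None)
    # One side is exhausted: whatever remains on the other side is unique
    while a is not None:
        only_a.append(a)
        a = next(it_a, None)
    while b is not None:
        only_b.append(b)
        b = next(it_b, None)
    return only_a, only_b
```

To use it on files, each line would first be converted with something like `int(line)` for non-empty lines; that step is left out here to keep the sketch short.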
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks! These two files are sorted, but this script outputs the greater-than values from the files, which I don't want. I want only the missing IDs.

BR Dushan
Dushan De SilvaTechnology ArchitectAuthor Commented:
Yes, I've executed this script, and the output contains IDs which are already in both files.
Roger BaklundCommented:
>> I want only missing ids

That is what the script is supposed to do.

>> Yes I've executed this script and I got output ids which are already in both files..

That did not happen with my test files. Could you provide some example files? Preferably not with 150,000 rows, but two smaller files that fail. These are my test files and my results:
# file01.txt
6100100013
6110010003
6120010001
6120010002
6130040001
6130070005
 
# file02.txt
6120010001
6120120001
6130040001
6130070001
6130070005
 
# output files:
 
# in01notin02.txt
6100100013
6110010003
6120010002
 
# in02notin01.txt
6120120001
6130070001

Dushan De SilvaTechnology ArchitectAuthor Commented:
Hi All,
Thanks a lot for your kind help!
I found the following solution, which uses the powerful "Sets" class.

http://www.daniweb.com/code/snippet708.html
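
The linked snippet is not reproduced here. As a rough sketch of the same idea, assuming "Sets" refers to Python's set type, the two outputs can be computed with set difference. The function names `read_ids` and `compare_id_files` are made up for illustration, and the file paths are whatever the caller passes in.

```python
def read_ids(path):
    """Read one ID per line into a set, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_id_files(path_a, path_b):
    """Return (IDs only in path_a, IDs only in path_b) as sorted lists."""
    ids_a = read_ids(path_a)
    ids_b = read_ids(path_b)
    # Set difference does not depend on the order of lines in either file
    return sorted(ids_a - ids_b), sorted(ids_b - ids_a)
```

A call such as `compare_id_files('file01.txt', 'file02.txt')` returns both lists at once. Both files are held in memory, which should be fine for a few hundred thousand short IDs, and the results are sorted as strings, not as numbers.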

BR Dushan