two csv files to two dictionaries and compare

Hi Experts,
I have following three files.  
------------------------------------file01.txt---------------------------
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,Health News Daily
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
--------------------------------------------------------------------------

------------------------------------file02.txt----------------------------
000105d2,,,Health News Daily
00650260048_c,,,The Pink Sheet
000105d5,,,Health News Daily
05040390072,,,The Tan Sheet
000106d1,,,Health News Daily
000106d3,,,Health News Daily
000106d4,,,Health News Daily
000106d6,,,Health News Daily
--------------------------------------------------------------------------

-------------------------both.txt----------------------------------------
00650260048_c
05040390072
020108d4
02260160016_b
00630360023
000105d2
00650260048_c
000105d5
05040390072
000106d1
000106d3
000106d4
000106d6
--------------------------------------------------------------------------
1.) Load file01.txt into a dictionary, with the key being id (the first value before the comma) and the value being the entire line of the file.  
2.) Load file02.txt into a dictionary, with the key being the id (the first value before the comma) and the value being the entire line of the file.
3.) For all of the ids in both.txt, determine which ids(keys) have different values between file01.txt and file02.txt.

I want python script.
LVL 17
Dushan De SilvaTechnology ArchitectAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Dushan De SilvaTechnology ArchitectAuthor Commented:
import csv
import sys

f1 = open(sys.argv[1], 'rt')
f2 = open(sys.argv[2], 'rt')
f3 = open(sys.argv[3], 'rt')
#f1 = csv.reader(open(sys.argv[1]),delimiter=',')
h={}
try:
    reader1 = csv.reader(f1, delimiter=',')
    reader2 = csv.reader(f2, delimiter=',')
    reader3 = csv.reader(f3, delimiter=',')
    for row1 in reader1:
#       print row1[0]
#       print ','.join(row1[1:])
#       h[row1[0]]=','.join(row1[1:])
#       print h
            print row1[1:]
#       if row1[0]!= "":
            for row2 in reader2:
#               if row2[0]!= "":
                    if row2[0] in row1[0]:
#                       if row2[1:] in row1[1:]:
                           print row2[0]
#           print row2[1:]

finally:
    f1.close()
    f2.close()
    f3.close()
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
plz help me !
0
HonorGodSoftware EngineerCommented:
Something like this perhaps?
import csv, sys, os.path
 
def csvToDict( filename ):
  result = {};
  if os.path.exists( filename ) :
    try :
      handle = open( filename );
      for line in csv.reader( handle, delimiter=',' ) :
        inx = line[ 0 ];
        data = ','.join( line );
        result[ inx ] = data
      handle.close();
    except :
      ( kind, val ) = sys.exc_info()[ :2 ];
      ( kind, val ) = str( kind ), str( val );
      print 'Error: ' + kind + '\nValue: ' + val;
  return result;
 
def CSVcompare( file1, file2 ):
  print 'file1: "%s"\nfile2: "%s"' % ( file1, file2 )
  d1 = csvToDict( file1 );
  d2 = csvToDict( file2 );
  n1 = d1.keys();
  n1.sort();
  for n in n1[ : ] :
    if d2.has_key( n ) :
      if d1[ n ] == d2[ n ] :
        del( d1[ n ] );
        del( d2[ n ] );
      else :
        print d1[ n ];
    else :
      print d1[ n ];
  print '-' * 50
  n2 = d2.keys();
  n2.sort();
  for n in n2[ : ] :
    if d1.has_key( n ) :
      if d1[ n ] != d2[ n ] :
        print d2[ n ];
    else :
      print d2[ n ];
 
 
def Usage( cmd = 'CSVcompare' ):
  print 'Uage: %s.py file1.csv file2.csv'
  sys.exit()
 
if __name__ == '__main__' or __name__ == 'main' :
  if len( sys.argv ) == 3 :
    CSVcompare( sys.argv[ 1 ], sys.argv[ 2 ] );
  else :
    Usage();
else :
  Usage( __name__ );

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks lot HonorGod!!
That's really close to which I'm looking. And I need to compare ids in both.txt against file01.txt and file02.txt 's ids. Then if thoese ids avialable on file01.txt and file02.txt  then I should match the rest of the line part in those files. If you could help for solve that also it will be great! :)

BR Dushan
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
Can u plz explain following code part.

       inx = line[ 0 ];
        data = ','.join( line );
        result[ inx ] = data

Open in new window

0
Dushan De SilvaTechnology ArchitectAuthor Commented:
and :)
      ( kind, val ) = sys.exc_info()[ :2 ];
      ( kind, val ) = str( kind ), str( val );
      print 'Error: ' + kind + '\nValue: ' + val;

Open in new window

0
HonorGodSoftware EngineerCommented:
Certainly

inx = line[ 0 ]


assigns the value of the 1st element of the list variable named line to the local variable named inx (for "index", but you don't want to use that as a variable name, since it occurs as a built-in string, and list method).

data = ','.join( line );

Takes the list variable named line, and converts it into a string with a comma as the delimiter between entries, and assigns this string result to the local variable named data.

result[ inx ] = data;

Uses the local variable named inx as an index, or key, into the dictionary variable named result, and assigns this the value currently in the local variable named data.
Since data isn't really used anywhere else, we could combine these two statements into 1:

result[ inx ] = ','.join( line );

This statement is a little more interesting:

( kind, val ) = sys.exc_info()[ :2 ];

To understand this, we need to know that sys.exc_info() is a thread safe way to access the following 3 exception state values.

- sys.exc_type  = Exception type
- sys.exc_value = Exception value
- sys.exc_traceback = Exception stack trace

The [ :2 ] after this function call is called slice notation that is used to specify that we (only) want the values up to, but not including [ 2 ] (i.e., the stack trace).  So this:

sys.exc_info()[ :2

returns a list of two values.

An assignment statement that looks like this:

( kind, val ) = sys.exc_info()[ :2 ];

takes the sequence on the right side of the assignment statement, and using sequence unpacking, assigns the values to the variables in the tuple on the left.
So the first value (i.e., sys.exc_info()[ 0 ] ) will be assigned (bound) to the local variable named "kind", and the second value (i.e., sys.exc_info()[ 1 ] ) will be assigned (bound) to the local variable named "val".

However, the data types of these values aren't strings.  It is much easier to deal with these values if they are strings.  Hence the statement:

( kind, val ) = str( kind ), str( val );

Logically what happens is this:
- the value in "kind" is converted to a string
- the value in "val" is converted to a string
- these two string values are combined (using tuple packing) into a tuple
- Using sequence unpacking the tuple values are bound (assigned) to the values on the left side of the assignment statement, so this single statement is equivalent to:
  kind = str( kind )
  val   = str( val )

The last statement:

print 'Error: ' + kind + '\nValue: ' + val;

Is used to display the exception type and value in a string using string concatenation because two strings may be concatenated using the + operator.

Does this help your understanding?
0
HonorGodSoftware EngineerCommented:
I went back to reread the sequence of updates, and don't quite understand for what you are asking in this update: http:#a24782282

Can you clarify it, and perhaps state it a different way?
0
t0t0Commented:
I can do it using DOS batch files.

The batch file first creates it's own BOTH.TXT based on the contents of FILE01.TXT and FILE02.TXT. The resultant BOTH.TXT differs from yours in that it does not contain duplicate entries.

The first FOR loop get data from FILE01.TXT into BOTH.TXT.

The second FOR loop gets data from FILE02.TXT into BOTH.TXT providing that data does not already exist in BOTH.TXT.

The third FOR loop is the actual loop which processes BOTH.TXT. It compares each entry against FILE01.TXT and FILE02.TXT and if the entry is found in both files, it compares the whole lines for any differences. If there are differences, both lines are printed to the screen.

@echo off
(for /f "tokens=1 delims=," %%a in ('type file01.txt') do (
   echo %%a
))>both.txt

(for /f "tokens=1 delims=," %%a in ('type file02.txt') do (
   findstr /i "%%a" both.txt>nul
   if ERRORLEVEL 1 (
      echo %%a
   )
))>>both.txt

for /f %%a in ('type both.txt') do (
   for /f "tokens=*" %%b in ('findstr /i /b "%%a" file01.txt') do (
      if ERRORLEVEL 0 (
         for /f "tokens=*" %%c in ('findstr /i /b "%%a" file02.txt') do (
            if ERRORLEVEL 0 (
               if not "%%b"=="%%c" (
                  echo File01.txt: %%b
                  echo File02.txt: %%c
               )
            )
         )
      )
   )
)
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thannk you very much HonorGod!
and it's really helpful to me about the code explaination, I think I should look more python commands from documentation.
on http://www.experts-exchange.com/Programming/Languages/Scripting/Python/Q_24545046.html#a24782282
I tired to explain below point. :)
3.) For all of the ids in both.txt, determine which ids(keys) have different values between file01.txt and file02.txt.
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
Thanks t0t0! but I'm runing on this code on unix.
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
I tried below code, but it's not giving correct results :) .. sorry for bothering u HonorGod.
import csv, sys, os.path
 
def csvToDict( filename ):
  result = {};
  if os.path.exists( filename ) :
    try :
      handle = open( filename );
      for line in csv.reader( handle, delimiter=',' ) :
        inx = line[ 0 ];
        data = ','.join( line );
        result[ inx ] = data
      handle.close();
    except :
      ( kind, val ) = sys.exc_info()[ :2 ];
      ( kind, val ) = str( kind ), str( val );
      print 'Error: ' + kind + '\nValue: ' + val;
  return result;
 
def CSVcompare( file1, file2, file3 ):
  print 'file1: "%s"\nfile2: "%s"' % ( file1, file2 )
  d1 = csvToDict( file1 );
  d2 = csvToDict( file2 );
  d3 = csvToDict( file3 );
  n1 = d1.keys();
  n1.sort();
  for n in n1[ : ] :
    if d2.has_key( n ) :                        #compare keys
     if d3.has_key( n ) :
      if d2[ n ] == d3[ n ] :                   #compare values
#       print d2[n];
#        print d3[n];
        del( d2[ n ] );
        del( d3[ n ] );
      else :
        print d2[ n ];
     else :
      print d3[ n ];
    else :
      print d2[ n ];
 
#  print '-' * 50
#  n2 = d2.keys();
#  n2.sort();
#  for n in n2[ : ] :
#    if d1.has_key( n ) :
#      if d1[ n ] != d2[ n ] :
#        print d2[ n ];
#    else :
#      print d2[ n ];
 
def Usage( cmd = 'CSVcompare' ):
  print 'Uage: %s.py both.csv file1.csv file2.csv'
  sys.exit()
 
if __name__ == '__main__' or __name__ == 'main' :
  if len( sys.argv ) == 4 :
    CSVcompare( sys.argv[ 1 ], sys.argv[ 2 ], sys.argv[ 3 ]);
  else :
    Usage();
else :
  Usage( __name__ );

Open in new window

0
t0t0Commented:
No probs... it's an interesting question which is why I gqve it a shot.
0
HonorGodSoftware EngineerCommented:
What data are you using for input, and what problems are you seeing?
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
I'm using imput texgt file as input. Problem is first I should check ids in both.txt, against file01.txt and file02.txt, if that id avialable on file01.txt and file02.txt , then should compare that perticular id's perticular field's rest of the line part (that is in file01.txt and file02.txt)

Example : steps
1.) we should read values in both.txt
2.) the first value is 00650260048_c
3.) Then should check that perticular value is available on file01.txt and file02.txt ' on first field. (yes on file01.txt it's 1st line and on file02.txt it's 2nd value), if it not available we should print that ids
4.) If it avilable on file01.txt and file02.txt, then we could skip it, if it not avialable , then we should print it.
5.) If that id avialable on both file01.txt and file02.txt , then we should match that perticular line's other part, if it matching we should skip it, if it not matching we should print it.
(yes it's matching and should skip  "  ,,,The Pink Sheet "    on file01.txt 1st line and file02.txt 2nd line)
0
t0t0Commented:
FOR DOS USERS

Apologies for coming back to you on this one. Quite frankly, I'm surprised you haven't received a solution to this problem.

In DOS, this would be done as follows (modified to output just ID if lines match):


@echo off
(for /f "tokens=1 delims=," %%a in ('type file01.txt') do (
   echo %%a
))>both.txt

(for /f "tokens=1 delims=," %%a in ('type file02.txt') do (
   findstr /i "%%a" both.txt>nul
   if ERRORLEVEL 1 (
      echo %%a
   )
))>>both.txt

for /f %%a in ('type both.txt') do (
   for /f "tokens=*" %%b in ('findstr /i /b "%%a" file01.txt') do (
      if ERRORLEVEL 0 (
         for /f "tokens=*" %%c in ('findstr /i /b "%%a" file02.txt') do (
            if ERRORLEVEL 0 (
               if not "%%b"=="%%c" (
                  echo File01.txt: %%b
                  echo File02.txt: %%c
               ) else (
                  echo %%a
               )
            )
         )
      )
   )
)
0
HonorGodSoftware EngineerCommented:
ok, I went back, and re-read the question.

Please forgive me for my misunderstanding.
The code, as written will:

- read file1 into a dictionary (D1), indexed by the first csv value.  so far, so good.
- read file2 into a dictionary (D2), indexed by the first csv value.  so far, so good.
- "compare" the two dicationaries.
  For each key in dictionary #1 (D1):
  - if the key exists in both dictionaries, and the values agree, we're fine, ignore the value (i.e., by removing them)
  - if the key exists in both dictionaries, but the values differ, display the value from D1 (mismatch)
  - if the key doesn't exist in D2, display the value because it's unique to D1.
  For each key in dictionary #2 (D2):
  - if the key exists in both dictionaries, but the values differ, display the value from D2 (mismatch)
  - if the key doesn't exist in D1, display the value because it's unique to D2.
 
So, what is the problem?
- is it the fact that the comparison will display the mismatch entries from both dictionaries?
- or, it it the fact that values unique to each dictionary will be displayed?

My presumption is that you don't want the second.  In which case,  the comparison script should be written something like this:

- Sample output:
----------------------------------------------------------------------
C:\Programs\Python\csv>type file01.txt
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,Health News Daily
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
----------------------------------------------------------------------
C:\Programs\Python\csv>type file02.txt
000105d2,,,Health News Daily
00650260048_c,,,The Pink Sheet
000105d5,,,Health News Daily
05040390072,,,The Tan Sheet
000106d1,,,Health News Daily
000106d3,,,Health News Daily
000106d4,,,Health News Daily
000106d6,,,Health News Daily
----------------------------------------------------------------------
C:\Programs\Python\csv>CSVcompare.py file01.txt file02.txt
----------------------------------------------------------------------

C:\Programs\Python\csv>type file01.txt
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,Health News Daily
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
C:\Programs\Python\csv>type file03.txt
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,<Different value>
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
C:\Programs\Python\csv>CSVcompare.py file01.txt file03.txt
020108d4,,,Health News Daily
020108d4,,,<Different value>

C:\Programs\Python\csv>


Is that what you wanted?
import csv, sys, os.path
 
def csvToDict( filename ):
  result = {};
# print 'filename: ' + filename;
  if os.path.exists( filename ) :
    try :
      handle = open( filename );
      for line in csv.reader( handle, delimiter=',' ) :
        inx = line[ 0 ];
        data = ','.join( line );
#       print '["%s"] = "%s"' % ( inx, data );
        result[ inx ] = data
      handle.close();
    except :
      ( kind, val ) = sys.exc_info()[ :2 ];
      ( kind, val ) = str( kind ), str( val );
      print 'Error: ' + kind + '\nValue: ' + val;
  return result;
 
def CSVcompare( file1, file2 ):
# print 'file1: "%s"\nfile2: "%s"' % ( file1, file2 )
  d1 = csvToDict( file1 );
  d2 = csvToDict( file2 );
  n1 = d1.keys();
  n1.sort();
  for n in n1[ : ] :
    if d2.has_key( n ) :
      if d1[ n ] == d2[ n ] :
        del( d1[ n ] );
        del( d2[ n ] );
      else :
        print d1[ n ];
    else :
#      print d1[ n ];
       pass;
# print '-' * 50
  n2 = d2.keys();
  n2.sort();
  for n in n2[ : ] :
    if d1.has_key( n ) :
      if d1[ n ] != d2[ n ] :
        print d2[ n ];
    else :
#     print d2[ n ];
      pass;
 
 
def Usage( cmd = 'CSVcompare' ):
  print 'Uage: %s.py file1.csv file2.csv'
  sys.exit()
 
if __name__ == '__main__' or __name__ == 'main' :
  if len( sys.argv ) == 3 :
    CSVcompare( sys.argv[ 1 ], sys.argv[ 2 ] );
  else :
    Usage();
else :
  Usage( __name__ );

Open in new window

0
t0t0Commented:
>> "is it the fact that the comparison will display the mismatch entries from both dictionaries?"
>> "or, it it the fact that values unique to each dictionary will be displayed?"

BOTH! - Neither of these are part of the asker's requirements.
0
Dushan De SilvaTechnology ArchitectAuthor Commented:
Hi HonorGod and t0t0,

Thanks lot for your really great effort!
Finally I cameup following code with looking at your  code :)

Thanks so much!!!!!!!!!
import csv, sys, os.path
 
def csvToDict( filename ):
  result = {};                 
  if os.path.exists( filename ) :
    try :
      handle = open( filename );
      for line in csv.reader( handle, delimiter=',' ) :
 
        inx = line[ 0 ];
        data = ','.join( line );
        result[ inx ] = data
      handle.close();
    except :
      ( kind, val ) = sys.exc_info()[ :2 ];
      ( kind, val ) = str( kind ), str( val );
      print 'Error: ' + kind + '\nValue: ' + val;
  return result;
 
def CSVcompare( file1, file2, file3 ):
  print 'file1: "%s"\nfile2: "%s"' % ( file1, file2 )
  d1 = csvToDict( file1 );
  d2 = csvToDict( file2 );
  d3 = csvToDict( file3 );
  n1 = d1.keys();
  n1.sort();
  for n in n1:
    if n in d1 and n in d2:                                         
      if d2[n] != d3[n]:                                            
        print "values differ, ", n, "---", d2[n], "---", d3[n]
 
  print '-' * 50
 
 
def Usage( cmd = 'CSVcompare' ):
  print 'Uage: %s.py both.csv file1.csv file2.csv'
  sys.exit()
 
if __name__ == '__main__' or __name__ == 'main' :
  if len( sys.argv ) == 4 :
    CSVcompare( sys.argv[ 1 ], sys.argv[ 2 ], sys.argv[ 3 ]);
  else :
    Usage();
else :
  Usage( __name__ );

Open in new window

0
HonorGodSoftware EngineerCommented:
Thanks for the grade & points.

Good luck & have a great day.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Programming

From novice to tech pro — start learning today.