Dushan Silva
asked on
Read two CSV files into two dictionaries and compare them
Hi Experts,
I have the following three files.
---------- file01.txt ----------
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,Health News Daily
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
--------------------------------
---------- file02.txt ----------
000105d2,,,Health News Daily
00650260048_c,,,The Pink Sheet
000105d5,,,Health News Daily
05040390072,,,The Tan Sheet
000106d1,,,Health News Daily
000106d3,,,Health News Daily
000106d4,,,Health News Daily
000106d6,,,Health News Daily
--------------------------------
---------- both.txt ----------
00650260048_c
05040390072
020108d4
02260160016_b
00630360023
000105d2
00650260048_c
000105d5
05040390072
000106d1
000106d3
000106d4
000106d6
--------------------------------
1.) Load file01.txt into a dictionary, with the key being the id (the first value before the comma) and the value being the entire line of the file.
2.) Load file02.txt into a dictionary, with the key being the id (the first value before the comma) and the value being the entire line of the file.
3.) For all of the ids in both.txt, determine which ids (keys) have different values between file01.txt and file02.txt.
Could you kindly help me write this Python script? The final script alone may be enough; I will go through it and ask questions if I have any. :)
BR Dushan.
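A minimal sketch of steps 1 to 3, written for Python 3 (the scripts later in this thread are Python 2). The script name compare_ids.py and the helpers load_id_dict, load_ids and differing_ids are hypothetical. It assumes the id is everything before the first comma, that blank lines can be skipped, and that an id present in only one of the two files counts as having a "different" value.

```python
import sys


def load_id_dict(lines):
    """Steps 1 and 2: map each id (text before the first comma) to its whole line."""
    table = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        table[line.split(",", 1)[0]] = line
    return table


def load_ids(lines):
    """Read both.txt: one id per line, duplicates dropped, first-seen order kept."""
    stripped = (line.strip() for line in lines)
    return list(dict.fromkeys(key for key in stripped if key))


def differing_ids(ids, d1, d2):
    """Step 3: ids whose lines differ between the two dicts.

    Assumption: an id present in only one file counts as different.
    An id missing from both files compares None == None and is not reported.
    """
    return [key for key in ids if d1.get(key) != d2.get(key)]


def main(path1, path2, path_ids):
    with open(path1) as f1, open(path2) as f2, open(path_ids) as f3:
        d1, d2, ids = load_id_dict(f1), load_id_dict(f2), load_ids(f3)
    for key in differing_ids(ids, d1, d2):
        # None marks an id that is absent from that file
        print(key, d1.get(key), d2.get(key), sep=" | ")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: python compare_ids.py file01.txt file02.txt both.txt")
```

Run it as `python compare_ids.py file01.txt file02.txt both.txt`. With the sample data, only 00650260048_c and 05040390072 have identical lines in both files, so every other id listed in both.txt would be printed once.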
ASKER
I tried the following two scripts, but I couldn't get the values into a dictionary.
1.) code1.py with two dictionaries
2.) code2.py with Sets
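#------------code1.py-----------------------------
#usage : python code1.py file01.txt file02.txt
import sys
h={}
for line in open(sys.argv[1]):
    # split() with no argument splits on whitespace, not commas, so
    # "00650260048_c,,,The Pink Sheet" becomes ['00650260048_c,,,The', 'Pink', 'Sheet']
    line=line.strip().split()
    print line
    h[line[0]]=line[1]
for line in open(sys.argv[2]):
    line=line.strip()
    l=line.split()
    # raises KeyError when this line's first word was never stored in h
    print line,h[l[0]]
----------------------------------------------------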
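For comparison, a sketch of the same two-loop idea as code1.py with its two problems addressed: the key is taken from the text before the first comma, and the lookup uses dict.get, so an id missing from the first file yields None instead of a KeyError. It is written for Python 3, and parse_line and compare_lines are hypothetical names.

```python
def parse_line(line):
    """Return (id, whole_line) for one comma-separated line."""
    line = line.strip()
    # str.split() with no argument splits on whitespace, which is why
    # code1.py got '00650260048_c,,,The' as a key. Splitting on ","
    # with maxsplit=1 cuts only at the first comma.
    return line.split(",", 1)[0], line


def compare_lines(lines1, lines2):
    """code1.py's two loops, keyed on the comma-separated id.

    Returns (line_from_file2, matching_line_from_file1_or_None) pairs.
    """
    h = dict(parse_line(line) for line in lines1 if line.strip())
    pairs = []
    for line in lines2:
        if not line.strip():
            continue  # skip blank lines
        key, whole = parse_line(line)
        # dict.get returns None for ids absent from file 1 instead of raising KeyError
        pairs.append((whole, h.get(key)))
    return pairs
```

Both arguments can be open file objects or plain lists of lines, since the functions only iterate over them.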
#! /usr/bin/env python
#------------code2.py-------------------------------
#usage : python code2.py file01.txt file02.txt
import sys
# the sets module is deprecated since Python 2.6; the built-in set() can replace Set()
import sets
from sets import Set
#Open the list1 and read it into the set1
f=open(sys.argv[1], 'r')
set1 = Set(f.readlines())
f.close()
# this maps each whole line to a position number (arbitrary, since sets are unordered), not id -> line
dic1 = dict([(k, v) for v, k in enumerate(set1)])
print dic1
#Open the list2 and read it into the set2
f=open(sys.argv[2], 'r')
set2 = Set(f.readlines())
f.close()
dic2 = dict([(l, w) for w, l in enumerate(set2)])
print dic2
#Find Delta
# whole lines are compared, so an id whose line changed appears in both deltas
diff1 = set1 - set2
diff2 = set2 - set1
#set1-=set2
#Dump delta
f=open(sys.argv[1] + '_NOTIN_'+ sys.argv[2] + '_.txt', 'w')
f.writelines(diff1)
f.close()
f=open(sys.argv[2] + '_NOTIN_'+ sys.argv[1] + '_.txt', 'w')
f.writelines(diff2)
f.close()
----------------------------------------------------
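A sketch of the set-based idea from code2.py applied to ids rather than whole lines, using the built-in set type (the sets module was removed in Python 3). Comparing keys separates ids missing from one file from ids whose lines changed, which whole-line sets cannot do. Python 3; lines_to_dict and key_deltas are hypothetical names.

```python
def lines_to_dict(lines):
    """Build {id: whole_line}, where id is the text before the first comma."""
    stripped = (line.strip() for line in lines)
    return {s.split(",", 1)[0]: s for s in stripped if s}


def key_deltas(d1, d2):
    """Compare two {id: line} dicts with built-in set operations on their keys."""
    keys1, keys2 = set(d1), set(d2)
    only_in_1 = sorted(keys1 - keys2)   # ids missing from file 2
    only_in_2 = sorted(keys2 - keys1)   # ids missing from file 1
    # ids in both files whose whole lines differ
    changed = sorted(k for k in keys1 & keys2 if d1[k] != d2[k])
    return only_in_1, only_in_2, changed
```

Restricting the result to the ids in both.txt is then a matter of intersecting each list with the set of ids read from that file.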
ASKER
I'm also not sure how to get these CSV values into a dictionary using the following script with the csv module.
import csv
filename = "file"
reader = csv.reader(open(filename),delimiter=',')
writer = csv.writer( open("newfile.csv","wb") )
for row in reader:
    print row #stdout
    writer.writerow(row) #write to newfile.csv
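With the csv module, the dictionary can be built directly from the reader. This sketch (Python 3, hypothetical function name csv_to_dict) keys on the first field and stores the row rejoined with commas as the value, which reproduces the original line for unquoted data like the sample files.

```python
import csv


def csv_to_dict(f):
    """Map the first field of each CSV row to the row's fields rejoined with commas.

    Rejoining reproduces the original line only for unquoted data such as the
    sample files; a quoted field containing a comma would come back unquoted.
    """
    reader = csv.reader(f, delimiter=",")
    # csv.reader yields an empty list for a blank line, so skip those
    return {row[0]: ",".join(row) for row in reader if row}
```

Typical use: `with open("file01.txt", newline="") as f: d1 = csv_to_dict(f)`. The newline="" argument is what the Python 3 csv documentation recommends for files handed to csv.reader; in Python 2 the equivalent is opening the file in "rb" mode.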
SOLUTION
ASKER
Thanks! I already tried that by changing code1.py.
ASKER CERTIFIED SOLUTION
ASKER
import csv
import sys
f1 = open(sys.argv[1], 'rt')
f2 = open(sys.argv[2], 'rt')
f3 = open(sys.argv[3], 'rt')
#f1 = csv.reader(open(sys.argv[1]),delimiter=',')
h={}
try:
    reader1 = csv.reader(f1, delimiter=',')
    reader2 = csv.reader(f2, delimiter=',')
    # reader3 (both.txt) is opened but not used yet
    reader3 = csv.reader(f3, delimiter=',')
    for row1 in reader1:
        # print row1[0]
        # print ','.join(row1[1:])
        # h[row1[0]]=','.join(row1[1:])
        # print h
        print row1[1:]
        # if row1[0]!= "":
        # reader2 is a one-shot iterator: this loop consumes it during the
        # first row1, so for every later row1 it yields nothing
        for row2 in reader2:
            # if row2[0]!= "":
            # 'in' between two strings is a substring test, not equality
            if row2[0] in row1[0]:
                # if row2[1:] in row1[1:]:
                print row2[0]
                # print row2[1:]
finally:
    f1.close()
    f2.close()
    f3.close()
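If the `for row2` loop is nested inside `for row1`, as the comparison suggests, reader2 is exhausted after the first row of file 1, and `row2[0] in row1[0]` matches substrings rather than equal ids. A hedged sketch (Python 3, hypothetical name matching_ids) that reads the second file once into a dict and uses exact key lookups:

```python
import csv


def matching_ids(f1, f2):
    """For each row of file 1 whose id also appears in file 2, report (id, rows_equal)."""
    # Read file 2 once into a dict; a csv.reader is a one-shot iterator,
    # so looping over it inside another loop only works on the first pass.
    rows2 = {row[0]: row for row in csv.reader(f2) if row}
    result = []
    for row1 in csv.reader(f1):
        if not row1:
            continue  # skip blank lines
        row2 = rows2.get(row1[0])  # exact key lookup, not a substring test
        if row2 is not None:
            result.append((row1[0], row1 == row2))
    return result
```

Both arguments can be open file objects or lists of lines; the ids from both.txt can then be used to filter the result.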
ASKER
Thanks for your help! I came up with the following solution. :)
https://www.experts-exchange.com/questions/24545046/two-csv-files-to-two-dictionaries-and-compare.html
BR Dushan
Example 4: here is a small snippet to get you started; for the rest, try to have a go on your own.