Solved

Read two csv files to two dictionaries and compare

Posted on 2009-07-04
8
742 Views
Last Modified: 2012-08-13
Hi Experts,
I have following three files.  
------------------------------------file01.txt---------------------------
00650260048_c,,,The Pink Sheet
05040390072,,,The Tan Sheet
020108d4,,,Health News Daily
02260160016_b,,,The Rose Sheet
00630360023,relatedDocs,00630220023,The Pink Sheet
--------------------------------------------------------------------------

------------------------------------file02.txt----------------------------
000105d2,,,Health News Daily
00650260048_c,,,The Pink Sheet
000105d5,,,Health News Daily
05040390072,,,The Tan Sheet
000106d1,,,Health News Daily
000106d3,,,Health News Daily
000106d4,,,Health News Daily
000106d6,,,Health News Daily
--------------------------------------------------------------------------

-------------------------both.txt----------------------------------------
00650260048_c
05040390072
020108d4
02260160016_b
00630360023
000105d2
00650260048_c
000105d5
05040390072
000106d1
000106d3
000106d4
000106d6
--------------------------------------------------------------------------
1.) Load file01.txt into a dictionary, with the key being id (the first value before the comma) and the value being the entire line of the file.  
2.) Load file02.txt into a dictionary, with the key being the id (the first value before the comma) and the value being the entire line of the file.
3.) For all of the ids in both.txt, determine which ids(keys) have different values between file01.txt and file02.txt.

Could you kindly help me to write this python script? Final script might enough. I will go though it and will prompt questions if I will get.  :)
BR Dushan.
0
Comment
Question by:Dushan De Silva
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 9

Expert Comment

by:ghostdog74
ID: 24778923
since you know what you want to do , why don't you start writing your code? ie, you should put in some effort. read the docs on how to use dictionaries and other stuffs. Write code and if you are stuck, then come here to ask.
Example 4 here is small snippet to get you started, the rest, try to have a go on your own
0
 
LVL 17

Author Comment

by:Dushan De Silva
ID: 24778953
I tired following two codes, but I couldn't get values to a dictionary.
1.) code1.py with two dictionaries
2.) code2.pywith Sets
#------------code1.py-----------------------------
#usage : python code1.py file01.txt file02.txt
import sys
h={}
for line in open(sys.argv[1]):
    line=line.strip().split()
    print line
    h[line[0]]=line[1]
for line in open(sys.argv[2]):
    line=line.strip()
    l=line.split()
    print line,h[l[0]]
----------------------------------------------------
 
#------------code2.py-------------------------------
#usage : python code1.py file01.txt file02.txt
#! /usr/bin/env python 
import sys
import sets
from sets import Set
#Open the list1 and read it into the set1
f=open(sys.argv[1], 'r')
set1 = Set(f.readlines())
f.close() 
 
dic1 = dict([(k, v) for v, k in enumerate(set1)])
print dic1
 
 
#Open the list2 and read it into the set2
f=open(sys.argv[2], 'r')
set2 = Set(f.readlines())
f.close() 
 
dic2 = dict([(l, w) for w, l in enumerate(set2)])
print dic2
 
#Find Delta
diff1 = set1 - set2
diff2 = set2 - set1
 
#set1-=set2 
 
#Dump delta
 
f=open(sys.argv[1] + '_NOTIN_'+ sys.argv[2] + '_.txt', 'w')
f.writelines(diff1)
f.close()
 
f=open(sys.argv[2] + '_NOTIN_'+ sys.argv[1] + '_.txt', 'w')
f.writelines(diff2)
f.close()
----------------------------------------------------

Open in new window

0
 
LVL 17

Author Comment

by:Dushan De Silva
ID: 24778959
and I'm not sure how to get these csv values to dictionary using following script with csv module.
import csv
filename = "file"
reader = csv.reader(open(filename),delimiter=',')
writer = csv.writer( open("newfile.csv","wb") )
for row in reader: 
    print row #stdout
    writer.writerow(row) #write to newfile.csv

Open in new window

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 9

Assisted Solution

by:ghostdog74
ghostdog74 earned 500 total points
ID: 24778963
if you have both files's delimiters as comma, then split on comma , eg  line.split(",")
if you want to use 1st column and the key and the whole line as value

h[line[0]] = ','.join(line)
0
 
LVL 17

Author Comment

by:Dushan De Silva
ID: 24778988
Thanks! I already tried it changing code1.py
0
 
LVL 9

Accepted Solution

by:
ghostdog74 earned 500 total points
ID: 24779166
>> and I'm not sure how to get these csv values to dictionary using following script with >> csv module

i am  not sure why you have so much difficulty.



import csv
filename = "file"
reader = csv.reader(open(filename),delimiter=',')
h={}
for row in reader:
    print row[0]
    print ','.join(row[1:])
    h[row[0]]=','.join(row[1:])  
print h

Open in new window

0
 
LVL 17

Author Comment

by:Dushan De Silva
ID: 24780963

import csv
import sys
 
f1 = open(sys.argv[1], 'rt')
f2 = open(sys.argv[2], 'rt')
f3 = open(sys.argv[3], 'rt')
#f1 = csv.reader(open(sys.argv[1]),delimiter=',')
h={}
try:
    reader1 = csv.reader(f1, delimiter=',')
    reader2 = csv.reader(f2, delimiter=',')
    reader3 = csv.reader(f3, delimiter=',')
    for row1 in reader1:
#       print row1[0]
#       print ','.join(row1[1:])
#       h[row1[0]]=','.join(row1[1:])
#       print h
            print row1[1:]
#       if row1[0]!= "":
            for row2 in reader2:
#               if row2[0]!= "":
                    if row2[0] in row1[0]:
#                       if row2[1:] in row1[1:]:
                           print row2[0]
#           print row2[1:]
 
finally:
    f1.close()
    f2.close()
    f3.close()

Open in new window

0
 
LVL 17

Author Comment

by:Dushan De Silva
ID: 24811240
Thanks for your help! I cameup with following solution. :)
http://www.experts-exchange.com/Programming/Languages/Scripting/Python/Q_24545046.html

BR Dushan
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Sequence is something that used to store data in it in very simple words. Let us just create a list first. To create a list first of all we need to give a name to our list which I have taken as “COURSE” followed by equals sign and finally enclosed …
The purpose of this article is to demonstrate how we can upgrade Python from version 2.7.6 to Python 2.7.10 on the Linux Mint operating system. I am using an Oracle Virtual Box where I have installed Linux Mint operating system version 17.2. Once yo…
Learn the basics of while and for loops in Python.  while loops are used for testing while, or until, a condition is met: The structure of a while loop is as follows:     while <condition>:         do something         repeate: The break statement m…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
Suggested Courses

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question