mtnseeker
asked on
compare csv on exact columns
this is a fairly common question I know from searching; however, none of what I found really seemed to be quite what I wanted. I've looked at trying powershell, but have never messed with it; looking for the best and quickest solution. This is something I'd like to automate.
csv1: need to look at column 3
csv2: need to look at column 2
columns 3 and 2 should match; if there are differences, such as an extra I need it outputted to a file
csv1: need to look at column 3
csv2: need to look at column 2
columns 3 and 2 should match; if there are differences, such as an extra I need it outputted to a file
ASKER
csv1:
data, data, AccountNumber, data, data
csv2:
data, AccountNumber, data, data, data, data
output:
Just the AccoutNumber that does not exist in one of the two CSV's. They should both be in there, but there may be an account number that is found in one and not the other.
data, data, AccountNumber, data, data
csv2:
data, AccountNumber, data, data, data, data
output:
Just the AccoutNumber that does not exist in one of the two CSV's. They should both be in there, but there may be an account number that is found in one and not the other.
Check this:
$csv1 = Import-Csv C:\ee\CSV1.csv -UseCulture
$csv2 = Import-Csv C:\ee\CSV2.csv -UseCulture
Compare-Object $csv1 $csv2 -Property Accountnumber | Select-Object -Property accountnumber
ASKER
@aikimark: yes... unfortunately I didn't tag enough zones. My question was meant to be more general than specific to powershell (I was guessing it was the best solution). I am also looking at any utilities (freeware) that may do the job as well.
my approach would be to run two queries against an outer-joined view of the two tables. Let the database engine do the heavy lifting.
* left join where the right table value is null
* right join where the left table value is null
====
If the column data is unique, you could do the following:
In a VBScript environment, I would populate two "scripting.dictionary" objects with the contents of the two columns. Iterate through each of the objects once, invoking the Exists() method on the other dictionary object.
If the column data isn't unique, you could invoke that dictionary object's Exists() method before adding the column data.
If this is a simple CSV file, your parsing of the data should be straight forward.
* left join where the right table value is null
* right join where the left table value is null
====
If the column data is unique, you could do the following:
In a VBScript environment, I would populate two "scripting.dictionary" objects with the contents of the two columns. Iterate through each of the objects once, invoking the Exists() method on the other dictionary object.
If the column data isn't unique, you could invoke that dictionary object's Exists() method before adding the column data.
If this is a simple CSV file, your parsing of the data should be straight forward.
Will the values in the columns to compare be in the same order, or do we essentially have to search the whole file looking for a row from the first file?
~bp
~bp
@bp
That's the advantage of using a hashed lookup (dictionary)
That's the advantage of using a hashed lookup (dictionary)
==> aikimark
Yes, after i posted my question I saw your post and we are thinking along similar lines...
~bp
Yes, after i posted my question I saw your post and we are thinking along similar lines...
~bp
Before you guys invent the wheel, please check my PowerShell solution, please.
>>check my PowerShell solution
@mtnseeker
* Have you tried the posted PS code?
* Do the CSV files have a header row?
@mtnseeker
* Have you tried the posted PS code?
* Do the CSV files have a header row?
ASKER
the csv files do not have a header row so that code didn't run...
also with this powershell solution how would I output the results to a third csv or other text file?
also with this powershell solution how would I output the results to a third csv or other text file?
@mtnseeker
please post some sample rows from your two CSV files. You should obscure the actual data if it is sensitive. However, it should be an accurate representation of the data that is in the actual CSV files.
please post some sample rows from your two CSV files. You should obscure the actual data if it is sensitive. However, it should be an accurate representation of the data that is in the actual CSV files.
ASKER
see the attachment... I am inclined to go with powershell on this since it seems lightweight and I don't want to have to build an entire application...just haven't used it before
example1.csv
example2.csv
example1.csv
example2.csv
@soostibi
1. you need to use the -Header parameter when doing your CSV-Import function
2. you need to add something, perhaps a -differenceobject parameter, that will compare the imported data for differences. Even then, I think you will need two of such comparisons, swapping the left/right order of the CSV objects.
1. you need to use the -Header parameter when doing your CSV-Import function
2. you need to add something, perhaps a -differenceobject parameter, that will compare the imported data for differences. Even then, I think you will need two of such comparisons, swapping the left/right order of the CSV objects.
Yes, I need the header, but I do not need any other thing.
The output:
accountnumber
-------------
70002300
40006700
10011400
10022800
The output:
accountnumber
-------------
70002300
40006700
10011400
10022800
PS C:\Windows\system32> $csv1= Import-Csv C:\ee\example1.csv -Header (("c1","c2","accountnumber") + (1..6 | %{"d$_"}))
$csv2= Import-Csv C:\ee\example2.csv -Header (("e1","accountnumber") + (1..11 | %{"f$_"}))
Compare-Object $csv1 $csv2 -Property accountnumber | Select-Object -Property accountnumber
And you can also export the result by this:
Compare-Object $csv1 $csv2 -Property accountnumber | Select-Object -Property accountnumber | Export-Csv -notypeinformation -path c:\result.csv
@soostibi
the problem, as stated, is to find the differences between the two columns of accountnumber values
the problem, as stated, is to find the differences between the two columns of accountnumber values
here's a VBS solution that should do the job. Save as a VBS and adjust the file paths at the top as needed.
Option Explicit
' Define Needed constants
Const cForReading = 1
Const cForWriting = 2
Const cInFile1 = "X:\EE\EE26910135\example1.csv"
Const cInFile2 = "X:\EE\EE26910135\example2.csv"
Const cOutFile = "X:\EE\EE26910135\missing.csv"
Const cFieldNum1 = 3
Const cFieldNum2 = 2
' Define needed variables
Dim oFSO, oOutFile, dCount, sLine
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
' Create dictionary to accumulate counts in
Set dCount = CreateObject("Scripting.Dictionary")
' Process two incoming CSV files
LoadFile cInFile1, cFieldNum1
LoadFile cInFile2, cFieldNum2
' Open output file, print header lines
Set oOutFile = oFSO.OpenTextFile(cOutFile, cForWriting, True)
oOutFile.WriteLine "AccountNumber"
oOutFile.WriteLine "-------------"
' Process each entry in counts dictionary, and output count
For Each sLine In dCount
If dCount.Item(sLine) = 1 oOutFile.WriteLine sLine
Next
' Close output file
oOutFile.Close
Sub LoadFile (sFile, iField)
Dim oFile, sLine, aLine, sKey
' Open file for reading
Set oFile = oFSO.OpenTextFile(sFile, cForReading)
' Read input lines, accumulate counts in dictionary for this account
do Until oFile.AtEndOfStream
sLine = oFile.Readline
aLine = Split(sLine, ",")
sKey = aLine(iField-1)
If dCount.Exists(sKey) Then
dCount.Item(sKey) = dCount.Item(sKey) + 1
Else
dCount.Add sKey, 1
End If
Loop
oFile.Close
End Sub
~bp
I do not exactly see what output you exactly expect, but here is another try:
accountnumber file
------------- ----
70002300 example2.csv
40006700 example2.csv
10011400 example1.csv
10022800 example1.csv
accountnumber file
------------- ----
70002300 example2.csv
40006700 example2.csv
10011400 example1.csv
10022800 example1.csv
$file1 = "C:\ee\example1.csv"
$file2 = "C:\ee\example2.csv"
$csv1= Import-Csv -Path $file1 -Header (("c1","c2","accountnumber") + (1..6 | %{"d$_"}))
$csv2= Import-Csv -path $file2 -Header (("e1","accountnumber") + (1..11 | %{"f$_"}))
Compare-Object $csv1 $csv2 -Property accountnumber | select-object -Property accountnumber, @{n="file";e={
if($_.sideindicator -eq "=>"){split-path $file2 -Leaf}else{split-path $file1 -Leaf}
}}
@soostibi
edit one or both CSV files, adding values from the other list and reordering the list values. What mtnseeker needs is to ONLY see those values that are not both lists.
edit one or both CSV files, adding values from the other list and reordering the list values. What mtnseeker needs is to ONLY see those values that are not both lists.
This exactly what my list is: those values that are not in both list.
My comment at 'Date:25/03/11 12:16 PM ' exactly resulted in 'ONLY those values that are not both lists'. What was the problem with that then? I just do not understand...
My comment at 'Date:25/03/11 12:16 PM ' exactly resulted in 'ONLY those values that are not both lists'. What was the problem with that then? I just do not understand...
In my test of you PS script and my reordered and partially-same values, the same values were output.
ASKER
Looking at the VBS Script from @billprew..
This doesn't seem to compile, receiving a "Expected Then" statement. Suggestions on that script?
This doesn't seem to compile, receiving a "Expected Then" statement. Suggestions on that script?
accountnumbers of CSV1:
-------------
10011400
10022800
10041600
accountnumbers of CSV2:
-------------
70002300
40006700
10041600
Only the last numbers in both CSV are the same, so - if I understand correctly the request - I have to output the first 2 numbers from both CSVs, because they do not match. And this is exactly what I did. These numbers are the different ones.
-------------
10011400
10022800
10041600
accountnumbers of CSV2:
-------------
70002300
40006700
10041600
Only the last numbers in both CSV are the same, so - if I understand correctly the request - I have to output the first 2 numbers from both CSVs, because they do not match. And this is exactly what I did. These numbers are the different ones.
then I must have done something wrong with my tests
So consider this again:
$csv1= Import-Csv C:\ee\example1.csv -Header (("c1","c2","accountnumber") + (1..6 | %{"d$_"}))
$csv2= Import-Csv C:\ee\example2.csv -Header (("e1","accountnumber") + (1..11 | %{"f$_"}))
Compare-Object $csv1 $csv2 -Property accountnumber | Select-Object -Property accountnumber |
Export-Csv -notypeinformation -path c:\result.csv
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
@billprew
how would I concatenate the AccountNumber with the respective csv, so if it is csv1 it specifies accountnumber, example1 on the output?
I've been trying to add it in the for each, but I don't think I've got it quite right
how would I concatenate the AccountNumber with the respective csv, so if it is csv1 it specifies accountnumber, example1 on the output?
I've been trying to add it in the for each, but I don't think I've got it quite right
How about this?
Option Explicit
' Define Needed constants
Const cForReading = 1
Const cForWriting = 2
Const cInFile1 = "X:\EE\EE26910135\example1.csv"
Const cInFile2 = "X:\EE\EE26910135\example2.csv"
Const cOutFile = "X:\EE\EE26910135\missing.csv"
Const cFieldNum1 = 3
Const cFieldNum2 = 2
' Define needed variables
Dim oFSO, oOutFile, dCount, sLine
' Create file system object
Set oFSO = CreateObject("Scripting.FileSystemObject")
' Create dictionary to accumulate counts in
Set dCount = CreateObject("Scripting.Dictionary")
' Process two incoming CSV files
LoadFile cInFile1, cFieldNum1
LoadFile cInFile2, cFieldNum2
' Open output file, print header lines
Set oOutFile = oFSO.OpenTextFile(cOutFile, cForWriting, True)
oOutFile.WriteLine "AccountNumber"
oOutFile.WriteLine "-------------"
' Process each entry in counts dictionary, and output count
For Each sLine In dCount
If dCount.Item(sLine) <> "BOTH" Then oOutFile.WriteLine sLine & " " & dCount.Item(sLine)
Next
' Close output file
oOutFile.Close
Sub LoadFile (sFile, iField)
Dim oFile, oGetFile, sLine, aLine, sKey
' Open file for reading
Set oFile = oFSO.OpenTextFile(sFile, cForReading)
set oGetFile = oFSO.GetFile(sFile)
' Read input lines, accumulate counts in dictionary for this account
do Until oFile.AtEndOfStream
sLine = oFile.Readline
aLine = Split(sLine, ",")
sKey = aLine(iField-1)
If dCount.Exists(sKey) Then
dCount.Item(sKey) = "BOTH"
Else
dCount.Add sKey, oGetFile.Name
End If
Loop
oFile.Close
End Sub
~bp
Dear mtnseeker,
Why did you totally ignored my solutions? You requested PowerShell, and I gave you two solutions, one with the all the different AccountNumbers indicating which CSV they ceme from, and a second one with only the non matching Account numbers...
Why did you totally ignored my solutions? You requested PowerShell, and I gave you two solutions, one with the all the different AccountNumbers indicating which CSV they ceme from, and a second one with only the non matching Account numbers...
ASKER
I understood the other solution a bit better and it was just easier for me to make modifications I needed such as adding a email send feature. Originally I thought powershell would be the way to go but I needed something famailar to me and the vbs script worked great. Thank you for your solution and I probably should start using powershell a bit more.
thanks again all,
Kevin
thanks again all,
Kevin
CSV1:
Col1, Col2, Col3
a,a,a
b,b,b
c,c,c
CSV2:
Col1, Col2
x,a
y,b
z,d
What would you like to see in the output?
c,c,c?
z,d?
Both?
Do I get the problem right?