bndit
asked on
Compare-Object giving two different results...why?
I have two CSV files and each of them has multiple columns. Here's an example
# File 1
#
"EMPLOYEE_NUMBER","FIRST_N AME","LAST _NAME","ZI P_CODE"
1,"Sarah","Peters",92856
2,"Karina","Enriquez",9872 3
3,"Megan","Ryan",23455
4,"Tom","Penn",23562
5,"Leticia","Sanchez",3566 2
# File 2
"EMPLOYEE_NUMBER","FIRST_N AME","LAST _NAME","DE PT","Title "
1,"Sarah","Peters","Accoun ting","Rep "
2,"Karina","Enriquez","Fin ance","Sup ervisor"
3,"Megan","Ryan","IT","Man ager"
4,"Tom","Penn","Ops","Gene ral Mgr"
6,"John","Chen","Finance", "Rep"
Here are my problems:
1. I use Compare-Object $a $b but get no output back even though each file contains a record that the other does not
2. I use Compare-Object $a $b -IncludeEqual and I get an output...however, the record that only exists in the file shows as being equal between the two objects...doesnt matter if I use $a $b or $b $a...same result.....could this be because each file has multiple columns?? what key is Compare-Object doing the comparison if I dont specify using -Property?
3. However, when I use Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEquals...I get what the output that I expect less the additional columns..but nonetheless I get what I should be getting back.
What am I missing here?
file1-file2.png
file2-file1.png
# File 1
#
"EMPLOYEE_NUMBER","FIRST_N
1,"Sarah","Peters",92856
2,"Karina","Enriquez",9872
3,"Megan","Ryan",23455
4,"Tom","Penn",23562
5,"Leticia","Sanchez",3566
# File 2
"EMPLOYEE_NUMBER","FIRST_N
1,"Sarah","Peters","Accoun
2,"Karina","Enriquez","Fin
3,"Megan","Ryan","IT","Man
4,"Tom","Penn","Ops","Gene
6,"John","Chen","Finance",
Here are my problems:
1. I use Compare-Object $a $b but get no output back even though each file contains a record that the other does not
2. I use Compare-Object $a $b -IncludeEqual and I get an output...however, the record that only exists in the file shows as being equal between the two objects...doesnt matter if I use $a $b or $b $a...same result.....could this be because each file has multiple columns?? what key is Compare-Object doing the comparison if I dont specify using -Property?
3. However, when I use Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEquals...I get what the output that I expect less the additional columns..but nonetheless I get what I should be getting back.
What am I missing here?
file1-file2.png
file2-file1.png
ASKER
@soostibi
Thanks for the reply...I'm still trying to wrap my head about your comment regarding 'objects' and 'collections'...kinda makes sense. Ok, so if it's better to specify the properties to do the comparison on...that creates a totally different problem for me because I cannot do this:
Compare-Object $a $b -Property EMPLOYEE_NUMBER, FIRST_NAME, LAST_NAME -IncludeEqual
The reason is because the *only* key that I want to use is EMPLOYEE_NUMBER because FIRST_NAME and/or LAST_NAME may have different values in both files for the same person (William Jones same as Bill Jones...however, both of those records have the same EMPLOYEE_NUMBER). The problem with using EMPLOYEE_NUMBER is that all the other columns disappear...for example, if I use this:
Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual
My output is a list of EMPLOYEE_NUMBERS...without the FIRST_NAME, LAST_NAME, DEPT, TITLE, etc.
Also, I noticed you're doing the assignment for each file...and obviously these are sample files...in reality my files are large files and I'm guessing this technique wouldnt work in that case....what other suggestions do you have for me to do the following:
1) Import the CSV file so that they can be compared as "collections" vs "objects"
2) Compare both CSV files using the EMPLOYEE_NUMBER column
3) Output a complete CSV file including all column headers
Thanks for the reply...I'm still trying to wrap my head about your comment regarding 'objects' and 'collections'...kinda makes sense. Ok, so if it's better to specify the properties to do the comparison on...that creates a totally different problem for me because I cannot do this:
Compare-Object $a $b -Property EMPLOYEE_NUMBER, FIRST_NAME, LAST_NAME -IncludeEqual
The reason is because the *only* key that I want to use is EMPLOYEE_NUMBER because FIRST_NAME and/or LAST_NAME may have different values in both files for the same person (William Jones same as Bill Jones...however, both of those records have the same EMPLOYEE_NUMBER). The problem with using EMPLOYEE_NUMBER is that all the other columns disappear...for example, if I use this:
Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual
My output is a list of EMPLOYEE_NUMBERS...without
Also, I noticed you're doing the assignment for each file...and obviously these are sample files...in reality my files are large files and I'm guessing this technique wouldnt work in that case....what other suggestions do you have for me to do the following:
1) Import the CSV file so that they can be compared as "collections" vs "objects"
2) Compare both CSV files using the EMPLOYEE_NUMBER column
3) Output a complete CSV file including all column headers
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
@soostibi
Sorry for the delay in getting back to you...You were right, the Compare-Object was starting to be a pain to accomplish what I wanted to do...for a moment I thought I had figured out how to use it like this
Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual -SyncWindow 1000 -PassThru
The above line gave me the entire obj ($a) along with the SideIndicator...from there I still had to do a lot of cleaning...
I tested your solution and I like it much better...much more flexible. With that said, could you help me understand what your code is doing so that I can wrap my head around this concept of passing the entire objects, customized objects...etc? Thanks!!
Sorry for the delay in getting back to you...You were right, the Compare-Object was starting to be a pain to accomplish what I wanted to do...for a moment I thought I had figured out how to use it like this
Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual -SyncWindow 1000 -PassThru
The above line gave me the entire obj ($a) along with the SideIndicator...from there I still had to do a lot of cleaning...
I tested your solution and I like it much better...much more flexible. With that said, could you help me understand what your code is doing so that I can wrap my head around this concept of passing the entire objects, customized objects...etc? Thanks!!
$a = import-csv c:\firstcsv.csv # import CSVs as Collection of Objects, that means all the rows become objects with properties of the columns of the CSV
$b = import-csv c:\secondcsv.csv
$h = @{} # initialize an empty hashtable. This is an ideal data type to search and compare, it has label (key) and value pairs
$a | %{$h.($_.employee_number) = @($_,"")} # fill into the hashtable all the objects of the first CSV, the label (key) will be the employee_number, the value will be an array with two element, the first element is the whole object, the second is an empty string, this will be a placeholder for the matching object from the second CSV if it exists
$b | %{ # do the same with the second CSV
if(!$h.($_.employee_number )){$h.($_. employee_n umber) = @("",$_)} # the difference is, if this is a new account_number that is not already in the hashtable, then create it, with a two-element array, but now the first is empty string, the second is the object
else{$h.($_.employee_numbe r)[1] = $_} # if it already exits, put the object as the second element (index 1) into the array.
}
$h.keys | Sort-Object | %{ # sort all the labels (keys), actually these are the account_numbers
New-Object -TypeName PSObject -Property @{ # create a new object for every account_number with all the neccessaryproperties
Employee_Number = $_ # grab the values for these properties from the properties of the hashtable elements
FIRST_NAME_file1 = ($h.$_)[0].First_Name
LAST_NAME_File1 = ($h.$_)[0].Last_Name
ZIP_CODE = ($h.$_)[0].Zip_Code
FIRST_NAME_File2 = ($h.$_)[1].First_Name
LAST_NAME_File2 = ($h.$_)[1].Last_Name
DEPT = ($h.$_)[1].Dept
Title = ($h.$_)[1].Title
}
} | Select-Object -Property Employee_Number, FIRST_NAME_file1, LAST_NAME_File1, FIRST_NAME_File2, LAST_NAME_File2, ZIP_CODE, DEPT, Title | # the ordering of the columns of the output of the new-object in unpredictable, so I re-select the columns to a sensible order
export-csv c:\totalcsv.csv -notypeinformation # and export this into a new CSV
$b = import-csv c:\secondcsv.csv
$h = @{} # initialize an empty hashtable. This is an ideal data type to search and compare, it has label (key) and value pairs
$a | %{$h.($_.employee_number) = @($_,"")} # fill into the hashtable all the objects of the first CSV, the label (key) will be the employee_number, the value will be an array with two element, the first element is the whole object, the second is an empty string, this will be a placeholder for the matching object from the second CSV if it exists
$b | %{ # do the same with the second CSV
if(!$h.($_.employee_number
else{$h.($_.employee_numbe
}
$h.keys | Sort-Object | %{ # sort all the labels (keys), actually these are the account_numbers
New-Object -TypeName PSObject -Property @{ # create a new object for every account_number with all the neccessaryproperties
Employee_Number = $_ # grab the values for these properties from the properties of the hashtable elements
FIRST_NAME_file1 = ($h.$_)[0].First_Name
LAST_NAME_File1 = ($h.$_)[0].Last_Name
ZIP_CODE = ($h.$_)[0].Zip_Code
FIRST_NAME_File2 = ($h.$_)[1].First_Name
LAST_NAME_File2 = ($h.$_)[1].Last_Name
DEPT = ($h.$_)[1].Dept
Title = ($h.$_)[1].Title
}
} | Select-Object -Property Employee_Number, FIRST_NAME_file1, LAST_NAME_File1, FIRST_NAME_File2, LAST_NAME_File2, ZIP_CODE, DEPT, Title | # the ordering of the columns of the output of the new-object in unpredictable, so I re-select the columns to a sensible order
export-csv c:\totalcsv.csv -notypeinformation # and export this into a new CSV
ASKER
@soostibi
Great explanation...just one thing...if my CSV files have a lot of columns...say 20+...I'd have to list them all (assuming that I wanted all of them and not just a select few) when I create the new object for every account_number, right?
New-Object -TypeName PSObject -Property @{ # create a new object for every account_number with all the neccessaryproperties
Employee_Number = $_ # grab the values for these properties from the properties of the hashtable elements
FIRST_NAME_file1 = ($h.$_)[0].First_Name
LAST_NAME_File1 = ($h.$_)[0].Last_Name
ZIP_CODE = ($h.$_)[0].Zip_Code
FIRST_NAME_File2 = ($h.$_)[1].First_Name
LAST_NAME_File2 = ($h.$_)[1].Last_Name
DEPT = ($h.$_)[1].Dept
Title = ($h.$_)[1].Title
}
Great explanation...just one thing...if my CSV files have a lot of columns...say 20+...I'd have to list them all (assuming that I wanted all of them and not just a select few) when I create the new object for every account_number, right?
New-Object -TypeName PSObject -Property @{ # create a new object for every account_number with all the neccessaryproperties
Employee_Number = $_ # grab the values for these properties from the properties of the hashtable elements
FIRST_NAME_file1 = ($h.$_)[0].First_Name
LAST_NAME_File1 = ($h.$_)[0].Last_Name
ZIP_CODE = ($h.$_)[0].Zip_Code
FIRST_NAME_File2 = ($h.$_)[1].First_Name
LAST_NAME_File2 = ($h.$_)[1].Last_Name
DEPT = ($h.$_)[1].Dept
Title = ($h.$_)[1].Title
}
Here is a generalized version, with any number of columns. The only criteria that there must be one common column: employee_number.
But please, do not ask for an explanation ! :-)
But please, do not ask for an explanation ! :-)
$a = import-csv c:\ee\firstcsv.csv
$b = import-csv c:\ee\secondcsv.csv
$h = @{}
$props = @()
$a | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{if($_ -eq "employee_number"){$props += "$_" }else{$props += "$_-File1"}}
$b | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{if($_ -ne "employee_number"){$props += "$_-File2"}}
$a | %{
$o = @{}
$r = $_
$r | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{
if($_ -eq "employee_number"){$o.$_ = $r.$_}
else{$o.("$_-File1") = $r.$_}
}
$h.($r.employee_number) = $o
}
$b | %{
$o = @{}
$r = $_
$r | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{
if($_ -eq "employee_number"){$o.$_ = $r.$_}
else{$o.("$_-File2") = $r.$_}
}
if(!$h.($r.employee_number)){$h.($r.employee_number) = $o}
else{
$o.keys | %{
if($_ -ne "employee_number"){
$h.($r.employee_number).$_ = $o.$_
}
}
}
}
$h.keys | Sort-Object | %{
New-Object -TypeName PSObject -Property $h.$_
} | Select-Object -Property ("employee_number", ($props | ?{$_ -ne "employee_number"} | Sort-Object) | %{$_} | %{"$_"}) |
export-csv c:\ee\totalcsv.csv -notypeinformation -Encoding utf8 -UseCulture
ASKER
@soostibi
haha...fair enough...You've helped me a lot and I increased the points to 500...thank you for your help.
haha...fair enough...You've helped me a lot and I increased the points to 500...thank you for your help.
ASKER
Excellent answer!
If you convert one element from a collection made from a csv you'll get nothing. That is why it can not detect the differences correctly.
So you should specify the properies.
Of course it can compare collections of strings and numbers without problem, without specifying properties (numbers do not have properties). It can also compare object that are converted to some descriptive string, without specifying the properties to compare, but this is not true for custom objects, such as a CSV.
This case you have to specify the common properties: employee_number, first_name, Last_name
Open in new window