Link to home
Start Free TrialLog in
Avatar of bndit
bndit

asked on

Compare-Object giving two different results...why?

I have two CSV files and each of them has multiple columns. Here's an example

# File 1
#
"EMPLOYEE_NUMBER","FIRST_NAME","LAST_NAME","ZIP_CODE"
1,"Sarah","Peters",92856
2,"Karina","Enriquez",98723
3,"Megan","Ryan",23455
4,"Tom","Penn",23562
5,"Leticia","Sanchez",35662

# File 2
"EMPLOYEE_NUMBER","FIRST_NAME","LAST_NAME","DEPT","Title"
1,"Sarah","Peters","Accounting","Rep"
2,"Karina","Enriquez","Finance","Supervisor"
3,"Megan","Ryan","IT","Manager"
4,"Tom","Penn","Ops","General Mgr"
6,"John","Chen","Finance","Rep"

Here are my problems:

1. I use Compare-Object $a $b but get no output back even though each file contains a record that the other does not

2. I use Compare-Object $a $b -IncludeEqual and I get an output...however, the record that only exists in the file shows as being equal between the two objects...doesnt matter if I use $a $b or $b $a...same result.....could this be because each file has multiple columns?? what key is Compare-Object doing the comparison if I dont specify using -Property?

3. However, when I use Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEquals...I get what the output that I expect less the additional columns..but nonetheless I get what I should be getting back.

What am I missing here?


file1-file2.png
file2-file1.png
Avatar of soostibi
soostibi
Flag of Hungary image

Actually Compare-Object compares object (it would be better to call collections) by the properties you specify. If you do not specify any property, it convert the elements in both collections to string, and makes the comparison this way.
If you convert one element from a collection made from a csv you'll get nothing. That is why it can  not detect the differences correctly.

So you should specify the properies.
Of course it can compare collections of strings and numbers without problem, without specifying properties (numbers do not have properties). It can also compare object that are converted to some descriptive string, without specifying the properties to compare, but this is not true for custom objects, such as a CSV.

This case you have to specify the common properties: employee_number, first_name, Last_name
$a = @"
"EMPLOYEE_NUMBER","FIRST_NAME","LAST_NAME","ZIP_CODE"
1,"Sarah","Peters",92856
2,"Karina","Enriquez",98723
3,"Megan","Ryan",23455
4,"Tom","Penn",23562
5,"Leticia","Sanchez",35662
"@ | ConvertFrom-Csv

$b = @"
"EMPLOYEE_NUMBER","FIRST_NAME","LAST_NAME","DEPT","Title"
1,"Sarah","Peters","Accounting","Rep"
2,"Karina","Enriquez","Finance","Supervisor"
3,"Megan","Ryan","IT","Manager"
4,"Tom","Penn","Ops","General Mgr"
6,"John","Chen","Finance","Rep"
"@ | ConvertFrom-Csv

Compare-Object $a $b -Property employee_number, first_name, Last_name

Open in new window

Avatar of bndit
bndit

ASKER

@soostibi
Thanks for the reply...I'm still trying to wrap my head about your comment regarding 'objects' and 'collections'...kinda makes sense. Ok, so if it's better to specify the properties to do the comparison on...that creates a totally different problem for me because I cannot do this:

Compare-Object $a $b -Property EMPLOYEE_NUMBER, FIRST_NAME, LAST_NAME -IncludeEqual

The reason is because the *only* key that I want to use is EMPLOYEE_NUMBER because FIRST_NAME and/or LAST_NAME may have different values in both files for the same person (William Jones same as Bill Jones...however, both of those records have the same EMPLOYEE_NUMBER). The problem with using EMPLOYEE_NUMBER is that all the other columns disappear...for example, if I use this:

Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual

My output is a list of EMPLOYEE_NUMBERS...without the FIRST_NAME, LAST_NAME, DEPT, TITLE, etc.

Also, I noticed you're doing the assignment for each file...and obviously these are sample files...in reality my files are large files and I'm guessing this technique wouldnt work in that case....what other suggestions do you have for me to do the following:

1) Import the CSV file so that they can be compared as "collections" vs "objects"
2) Compare both CSV files using the EMPLOYEE_NUMBER column
3) Output a complete CSV file including all column headers
ASKER CERTIFIED SOLUTION
Avatar of soostibi
soostibi
Flag of Hungary image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of bndit

ASKER

@soostibi
Sorry for the delay in getting back to you...You were right, the Compare-Object was starting to be a pain to accomplish what I wanted to do...for a moment I thought I had figured out how to use it like this

Compare-Object $a $b -Property EMPLOYEE_NUMBER -IncludeEqual -SyncWindow 1000 -PassThru

The above line gave me the entire obj ($a) along with the SideIndicator...from there I still had to do a lot of cleaning...

I tested your solution and I like it much better...much more flexible. With that said, could you help me understand what your code is doing so that I can wrap my head around this concept of passing the entire objects, customized objects...etc? Thanks!!
$a = import-csv c:\firstcsv.csv  # import CSVs as Collection of Objects, that means all the rows become objects with properties of the columns of the CSV
$b = import-csv c:\secondcsv.csv  
 
$h = @{}  # initialize an empty hashtable. This is an ideal data type to search and compare, it has label (key) and value pairs
$a | %{$h.($_.employee_number) = @($_,"")}  # fill into the hashtable all the objects of the first CSV, the label (key) will be the employee_number, the value will be an array with two element, the first element is the whole object, the second is an empty string, this will be a placeholder for the matching object from the second CSV if it exists
$b | %{  # do the same with the second CSV
    if(!$h.($_.employee_number)){$h.($_.employee_number) = @("",$_)}  # the difference is, if this is a new account_number that is not already in the hashtable, then create it, with a two-element array, but now the first is empty string, the second is the object
    else{$h.($_.employee_number)[1] = $_}  # if it already exits, put the object as the second element (index 1) into the array.
}  
$h.keys | Sort-Object | %{  # sort all the labels (keys), actually these are the account_numbers
    New-Object -TypeName PSObject -Property @{  # create a new object for every account_number with all the neccessaryproperties
        Employee_Number = $_   # grab the values for these properties from the properties of the hashtable elements
        FIRST_NAME_file1 = ($h.$_)[0].First_Name  
        LAST_NAME_File1  = ($h.$_)[0].Last_Name  
        ZIP_CODE  = ($h.$_)[0].Zip_Code  
        FIRST_NAME_File2 = ($h.$_)[1].First_Name  
        LAST_NAME_File2  = ($h.$_)[1].Last_Name  
        DEPT =  ($h.$_)[1].Dept  
        Title = ($h.$_)[1].Title  
    }  
} | Select-Object -Property Employee_Number, FIRST_NAME_file1, LAST_NAME_File1, FIRST_NAME_File2, LAST_NAME_File2, ZIP_CODE, DEPT, Title |  # the ordering of the columns of the output of the new-object in unpredictable, so I re-select the columns to a sensible order
 export-csv c:\totalcsv.csv -notypeinformation # and export this into a new CSV
Avatar of bndit

ASKER

@soostibi
Great explanation...just one thing...if my CSV files have a lot of columns...say 20+...I'd have to list them all (assuming that I wanted all of them and not just a select few) when I create the new object for every account_number, right?

    New-Object -TypeName PSObject -Property @{  # create a new object for every account_number with all the neccessaryproperties
        Employee_Number = $_   # grab the values for these properties from the properties of the hashtable elements
        FIRST_NAME_file1 = ($h.$_)[0].First_Name  
        LAST_NAME_File1  = ($h.$_)[0].Last_Name  
        ZIP_CODE  = ($h.$_)[0].Zip_Code  
        FIRST_NAME_File2 = ($h.$_)[1].First_Name  
        LAST_NAME_File2  = ($h.$_)[1].Last_Name  
        DEPT =  ($h.$_)[1].Dept  
        Title = ($h.$_)[1].Title  
    }  
Here is a generalized version, with any number of columns. The only criteria that there must be one common column: employee_number.

But please, do not ask for an explanation ! :-)
$a = import-csv c:\ee\firstcsv.csv  
$b = import-csv c:\ee\secondcsv.csv  
  
$h = @{}
$props = @()
$a | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{if($_ -eq "employee_number"){$props += "$_" }else{$props += "$_-File1"}}
$b | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{if($_ -ne "employee_number"){$props += "$_-File2"}}
$a | %{  
    $o = @{}
    $r = $_
    $r | Get-Member -MemberType properties | Select-Object -ExpandProperty name | %{
        if($_ -eq "employee_number"){$o.$_ = $r.$_}
        else{$o.("$_-File1") = $r.$_}
    }
    $h.($r.employee_number) = $o
}  
$b | %{  
    $o = @{}
    $r = $_
    $r | Get-Member  -MemberType properties | Select-Object -ExpandProperty name | %{
        if($_ -eq "employee_number"){$o.$_ = $r.$_}
        else{$o.("$_-File2") = $r.$_}
    }
    if(!$h.($r.employee_number)){$h.($r.employee_number) = $o}  
    else{
        $o.keys | %{
            if($_ -ne "employee_number"){
                $h.($r.employee_number).$_ = $o.$_
            }
        }
    }  
}  
$h.keys | Sort-Object | %{  
    New-Object -TypeName PSObject -Property $h.$_  
} | Select-Object -Property ("employee_number", ($props | ?{$_ -ne "employee_number"} | Sort-Object) | %{$_} | %{"$_"}) | 
    export-csv c:\ee\totalcsv.csv -notypeinformation -Encoding utf8 -UseCulture

Open in new window

Avatar of bndit

ASKER

@soostibi
haha...fair enough...You've helped me a lot and I increased the points to 500...thank you for your help.
Avatar of bndit

ASKER

Excellent answer!