[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

Use PowerShell to merge info from two different CSV files into a new CSV file

Posted on 2010-08-27
13
Medium Priority
?
4,776 Views
Last Modified: 2012-05-10
I have a CSV file that contains this information (approx 10k entries)
EMPLOYEE_NUMBER, FIRST_NAME,MIDDLE_NAME,LAST_NAME,TITLE (some other columns..)
12345,John,K.,Doe,Manager
23425,Jane,,Doe,Director
23434,James,L.,Doe,Clerk
44444,David,,Smith,Sales Rep
,Robert,Jones,Sales Rep

I have a second CSV file that contains this information (approx 50 entries)
Name,NTAccountName,employeeNumber
John K. Doe,domain\John.Doe,12345
Jane Doe, domain\Jane.Doe,23425,

Goal: to merge the employee's title information from the first CSV file with the second file. So the output should be this:

Name,NTAccountName,employeeNumber,Title,
John K. Doe,domain\John.Doe,12345,Manager
Jane Doe, domain\Jane.Doe,23425,Director

I found on the Internet, but I'm having a hard time tweaking it to do what I want. Here's the post that I found. Since I have a common field in both CSV, I'd think I could use the employeeNumber to do the match.

http://powergui.org/thread.jspa?threadID=6773
# Author: Shay Levy
# http://powergui.org/thread.jspa?threadID=6773

$csv1 = import-csv D:\Scripts\temp\logonInfo1.csv
$csv2 = import-csv D:\Scripts\temp\logonInfo2.csv

$result1 = $(compare-object $csv1 $csv2 -IncludeEqual | where {$_.SideIndicator -eq "=="} | foreach {$_.InputObject})
$result2 = $(compare-object $csv2 $csv1 -IncludeEqual | where {$_.SideIndicator -eq "=="} | foreach {$_.InputObject})

$members1 = $csv1 | gm -mem  NoteProperty |  foreach {$_.name}
$members2 = $csv2 | gm -mem  NoteProperty |  foreach {$_.name}

$newMembers = $(compare-object $members1 $members2 | where {$_.SideIndicator -eq "=>"}  | foreach {$_.InputObject})

for($i=0; $i -lt $newMembers.length; $i++){
 $prop = $newMembers[$i]
 $result1 | add-member noteproperty $prop $null
 
 for($x=0; $x -lt $result1.length; $x++){
  $newMembers | foreach {
   $result1[$x].$prop = $result2[$x].$prop
  }
 }

}

$result1 | export-csv D:\Scripts\temp\new.csv -NoTypeInformation
get-content D:\Scripts\temp\new.csv

Open in new window

0
Comment
Question by:bndit
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 5
  • 2
13 Comments
 
LVL 2

Author Comment

by:bndit
ID: 33547607
Here's another possible solution (again from Shay), but it's not working for me

http://www.powergui.org/thread.jspa?threadID=10104

0
 
LVL 71

Accepted Solution

by:
Chris Dent earned 500 total points
ID: 33548599
This should work.

Chris
$CSV1 = Import-Csv "First.csv"
$CSV2 = Import-Csv "Second.csv"

# Since the second is smaller we will loop through the first once and use Where-Object
# to find entries in the second

$CSV1 | ForEach-Object {

  # Store the EmployeeNumber from the first CSV
  $EmployeeNumber = $_.EMPLOYEE_NUMBER

  # And pull out the title field
  $Title = $_.TITLE

  # Find the corresponding entry in CSV 2 and prepare some output
  # Making a new field for Title and filling it with the title from CSV1.

  $CSV2 | 
    Where-Object { $_.employeeNumber -eq $EmployeeNumber } |
    Select-Object Name, NTAccountName, EmployeeNumber, 
      @{n='Title';e={ $Title }}

# Export everything to a new CSV file
} | Export-Csv "NewFile.csv"

Open in new window

0
 
LVL 13

Expert Comment

by:soostibi
ID: 33549086
In my solution I make use of the hashtable data structure to find pairs. (Don't give me points is Chris' solution works for you, this is just a brain excercise for me).
(It only works if the second csv only include every employeenumber once.)
$secondinfo = Import-Csv c:\ee\second.csv | Group-Object -Property employeenumber -AsHashTable
Import-Csv c:\ee\first.csv | %{
	$title = $_.title
	$secondinfo.($_.employee_number) | Select-Object name, ntaccountname, employeenumber, 
		@{n="Title";e={$Title}}
} | Export-Csv c:\ee\new.csv -NoTypeInformation

Open in new window

0
Are your AD admin tools letting you down?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

 
LVL 71

Expert Comment

by:Chris Dent
ID: 33549918

I think the HashTable is well worth introducing, I'd have gone down the same path if both files were large. My version doesn't scale anywhere near as well as a comparison against a Hashtable will.

Chris
0
 
LVL 2

Author Comment

by:bndit
ID: 33551272
@Chris
Your script works great. However, in the second CSV file there are some entries that do not have an employee number value. I noticed that the script ignores these accounts and are not included in the output file. Is there any way to include them in the merged output file? I'm thinking it'd be some type of if-statement..."If the employee number is empty don't bother looking in the large file, just pass the record to the output file". That'd solve my problem!

@soostibi
I tried your script, but I came across some problems. I'm attaching my script as I ran it and the error that I got.
HT-script.png
HT-error.png
0
 
LVL 13

Expert Comment

by:soostibi
ID: 33552518
In your line 8 you put import-csv twice.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 33552709
If we can not use the employeenumber as a join between the two CSVs, but we can use the full name, then the script can be:

$secondinfo = Import-Csv c:\ee\second.csv | Group-Object -Property name -AsHashTable
Import-Csv c:\ee\first.csv | %{
	$name = $_.first_name + " " + @(if($_.middle_name){$_.middle_name+" "}) + $_.last_name
	$Title = $_.title
	$emplnum = $_.employee_number
	$secondinfo.($name) | Select-Object name, ntaccountname, @{n="employeenumber"; e={$emplnum}}, 
		@{n="Title";e={$Title}}
} | Export-Csv c:\ee\new.csv -NoTypeInformation

Open in new window

0
 
LVL 2

Author Comment

by:bndit
ID: 33554447
@soostibi
Good catch; the issue was the double Import-Csv. The script is now working but I get the same result as with Chris' script; the records without employee number in the second CSV file are ignored and don't show up in the merged file. At this point, the script helps me with the records with employee numbers, but I'd have to manually merge the records without employee number in the second file. Let me tell you more about these files:

File1: This file is an extract directly from an HR database. Thus, all records will match an individual, and this individual will have an employee number.

File2: This file is an extract directly from AD. Thus, not all records will match an individual. In other words, I have generic accounts such as service accounts, etc., which do not have an employee number. This is why the full name is not a good join value for both files. Again, if there was a way for the script to instead of ignoring the records without employee numbers in the second file to just pass them to the merged file, that'd be great.
0
 
LVL 13

Assisted Solution

by:soostibi
soostibi earned 500 total points
ID: 33554543
What about this then?
$firstinfo = Import-Csv c:\ee\first.csv | Group-Object -Property employee_number -AsHashTable
Import-Csv c:\ee\second.csv | 
	Select-Object name, ntaccountname, employeenumber, 
		@{n="title"; e={$firstinfo.($_.employeenumber).item(0).title}} | 
			Export-Csv c:\ee\new.csv -NoTypeInformation

Open in new window

0
 
LVL 2

Author Comment

by:bndit
ID: 33554580
@soostibi
That works! Thanks. Now, could comment the script so that I know the logic a bit better. I kinda get parts of it but I'd like to fully understand it.  Thanks again.
0
 
LVL 13

Expert Comment

by:soostibi
ID: 33554653
1: I import the first csv and convert it to a hashtable at once by the employeenumber attribute. Actually this is a Group-object cmdlet, but as the employeenumber is unique, it actually makes a hashtable that has an element of the csv in each 'row' referenced by the employeenumber. I use this as there is no faster way to do it.
2: I import the second csv and pipe it into a
3: select-object cmdlet, to select the relevant properties and to create a new one, that is
4: the 'title' property, that come from addressing the item from the hashtable by the employeenumber of the actual element of the second csv that comes through the pipeline. As I mentioned, the hashtable is not a simple hashtable because of the group-object, so I have to grab the real information that is wrapped into a parametrized property named 'item'.
5: what borns after this is exported to a new csv.
0
 
LVL 2

Author Comment

by:bndit
ID: 33554660
Awesome, thanks both of you for your help. I think the fair thing to do is for me to split the points as you both helped me solve this problem.
0
 
LVL 2

Author Closing Comment

by:bndit
ID: 33554666
Excellent assistance.
0

Featured Post

Has Powershell sent you back into the Stone Age?

If managing Active Directory using Windows Powershell® is making you feel like you stepped back in time, you are not alone.  For nearly 20 years, AD admins around the world have used one tool for day-to-day AD management: Hyena. Discover why.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this post we will be converting StringData saved within a text file into a hash table. This can be further used in a PowerShell script for replacing settings that are dynamic in nature from environment to environment.
In the absence of a fully-fledged GPO Management product like AGPM, the script in this article will provide you with a simple way to watch the domain (or a select OU) for GPOs changes and automatically take backups when policies are added, removed o…
Learn the basics of strings in Python: declaration, operations, indices, and slicing. Strings are declared with quotations; for example: s = "string": Strings are immutable.: Strings may be concatenated or multiplied using the addition and multiplic…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Suggested Courses

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question