Powershell, comparing data in 2 files and creating a 3rd with the matches

tilbard
tilbard used Ask the Experts™
on
I have 2 files in .CSV format. The first is a list of locations as entered into our HR Database, and the second is a list of locations entered into Active Directory. I'd absolutely love it if I could compare these 2 lists and export the matches to a 3rd csv, essentially creating a relationship map. The problem I'm having is that only a small portion of the names match between the 2 lists (see below for a small portion for example). I've tried reading both lists into variables and recursing through each field and doing a -like comparison, but they're too dissimilair to trigger a match.

In the example below, the first line from each list should map, and both fields be exported to a new CSV. The lists aren't terribly big, so speed optimization is not a big deal.

List 1:
DD-Thayer St, Millville
DD-Thoreau Rd, Northboro
DD-Underwood Ave, Framingham
DD-Warren Ave, Marlboro
DD-Warren Ave, Marlboro
DD-Warren Rd, Framingham
DD-Westhill Park, Natick
DD-Westminister St, Bellingham
DD-William J Heights
DD-Wilson St, Natick
DD-Winthrop St, Framingham
DS-Bring to Light Candles
DS-Choices Program
DS-Day Supports Division
DS-DayHab Wrap
DS-Eliot St #250, Ashland DAY HAB
DS-Eliot St #290, Ashland DAY HAB
DS-Employment & Transportation
DS-MRC 280
DS-MRC NE Voc Rehab
DS-SES/LUSA
MH-16 Lincoln St, Hudson
MH-16R Gordon St, Framingham
MH-19 West St, Newton
MH-190 Milk St
MH-207 Lincoln St
MH-76 Clark St.

List 2:
57 Thayer St- Millville
Prospect-st
High Rock Framingham
6 - 8 Claflin St - Framingham
6 Warren Ave
Dedham
655BostonPost
675 Main St - Waltham Office
6-8 Claflin St - Framingham
693 Concord St Outreach
7 Oakland St - Lexington - Douglas House
7 Wellington Ave
Sterling-pl
76_Clark_Street_Framingham
8 Irving St - Hudson
85 Alexander Street- Framingham
Grace Circle
93 Cherry St - Framingham
A & F
A and F Supervisors
A&F Budget
ACC Brandon
ACC_Clinical_Supervi
ACC Community Justice
ACC Deaf Services
ACC External Access Group
ACC Framingham_Outpatien
ACC Leadership
ACC Marlboro
ACC Newsletter
BilingualTeam
ACC_Outpatient
ACC Preschool MH
ACC Senior Support Services
Southside Clinic
ACC Utilization Review
ACC-Waltham Outpatient
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
PowerShell Developer
Top Expert 2010
Commented:

I can make it match the first one after I rip out all the punctuation from each list.

It should be possible to tweak it to capture Warren Ave above, just having a bit of bother getting the RegEx to strip off the opening number from $List2.

Still, it's really very limited.

Chris
$List1 = Get-Content "Test1.txt" # List 1 above
$List2 = Get-Content "Test2.txt" # List 2 above
 
# Drop the value prior to the hyphen and all punctuation except ampersand
$List1 = $List1 | %{ ($_ -Replace ".*-|[^\w^\s^\s^/&]").Trim() }
 
# Drop any preceeding house / building number and all punctuation except ampersand
$List2 = $List2 | %{ ($_ -Replace "[^\w^\s^\s^/&]").Trim() }
 
# First match run $List1 vs $List2
ForEach ($Entry1 in $List1) {
  ForEach ($Entry2 in $List2) {
    If ($Entry1.Contains($Entry2) -Or $Entry2.Contains($Entry1)) {
      "Matched $Entry1 to $Entry2"
    }
  }
}

Open in new window

Author

Commented:
Yeah I was kind of afraid it was going to be more work than it was really worth. It would be one thing if the list changed frequently, but seeing as it's fairly static, I think I might be best served by just doing it manually.

Thanks for the attempt!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial