Matching Disparate Data Rows - Excel? Access? - Fuzzy Logic?
Posted on 2006-04-21
I have been given two spreadsheets -- customer lists for two companies. My task is to come up with a list of customers that match.
1. One might say "ACME, Inc." and the other might say "acme incorporated"
2. I have the same "upper/lower/punctuation/abbreviation" issues for the address as well like:
ACME, Inc. - 123 Privat Drive
Acme - SE 123 Privat Dr.
ACME Incorporated - 17 Center St
Acme Inc - 17 CENTER STREET
ACME - P.O.Box 77
ACME - POB 77
ACME - PO box 77
ACME - p.o. Box 77
I tried using a vlookup, but I couldn't figure out how to use fuzzy logic, like with a LIKE operator.
So I imported the spreadsheets into Access, and ran a query that says:
SELECT * FROM sheet1 s1, sheet2 s2
WHERE s1.name LIKE s2.name
AND s1.addr LIKE s2.addr
And it came up with 16 records out of 60,000. I have GOT to figure out a fuzzier way to get better results.
WHERE instr(s2.addr, left(s1.addr,8)) > 0
But it matched all of the P.O. BOX numbers with each other, and the ATTN: ACCOUNTS PAYABLE with each other.
So I came to Experts Exchange to see if I can find anybody with ingenuity and creativity that might spark more than 16 records out of 60,000. In the mean time, I'll just keep trying things.