Matching Disparate Data Rows - Excel? Access? - Fuzzy Logic?

I have been given two spreadsheets -- customer lists for two companies.  My task is to come up with a list of customers that match.

1. One might say "ACME, Inc." and the other might say "acme incorporated"
2. I have the same "upper/lower/punctuation/abbreviation" issues for the address as well like:

ACME, Inc. - 123 Privat Drive
Acme - SE 123 Privat Dr.
ACME Incorporated - 17 Center St
Acme Inc - 17 CENTER STREET
ACME - P.O.Box 77
ACME - POB 77
ACME - PO box 77
ACME - p.o. Box  77

I tried using a vlookup, but I couldn't figure out how to use fuzzy logic, like with a LIKE operator.

So I imported the spreadsheets into Access, and ran a query that says:

SELECT * FROM sheet1 s1, sheet2 s2
WHERE s1.name LIKE s2.name
AND s1.addr LIKE s2.addr

And it came up with 16 records out of 60,000.  I have GOT to figure out a fuzzier way to get better results.  

I tried:
WHERE instr(s2.addr, left(s1.addr,8)) > 0
But it matched all of the P.O. BOX numbers with each other, and the ATTN: ACCOUNTS PAYABLE with each other.

So I came to Experts Exchange to see if I can find anybody with ingenuity and creativity that might spark more than 16 records out of 60,000.  In the mean time, I'll just keep trying things.
AptDevAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

dmitryz6Commented:
Access, try this

SELECT a.*,b.*
FROM (select sheet1.*,left([name],4) as connectname from sheet1) a INNER JOIN (select sheet2.*,left([name],4) as connectname from sheet2)  b ON a.connectname = b.connectname
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Arthur_WoodCommented:
a LIKE query (LIKE in the Where clause) uses the syntax

Field1 LIKE "*" & SomeCharacters & "*"

NOT  

LIKE Field1 LIKE field2

A LIKE query uses 'wildcards' (* and _) to look for 'close' matches.  The * will match on ANY number of characters, the _ matches on a SINGLE character at that position in the text.

So, if you wanted to find all the companies whose names started with acme

Select * from Companies where CompanyName like "acme*"

or those that had acme any where in the company name:


Select * from Companies where CompanyName like "*acme*"

or those that ENDED with acme (??)

Select * from Companies where CompanyName like "*acme"

or those that started with A..E

Select * from Companies where CompanyName like "a__e*"

now if you wanted to JOIN to tables with a 'fuzzy' join:

Select * from Companies JOIN MyList on Companies.CompanyName LIKE Left(MyList.ComanyName,4) & "*"

which would join on a MATCH on the first 4 characters in the CompanyName field in the MyList Table.

AW
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Databases

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.