?
Solved

Matching Disparate Data Rows - Excel? Access? - Fuzzy Logic?

Posted on 2006-04-21
2
Medium Priority
?
652 Views
Last Modified: 2012-06-27
I have been given two spreadsheets -- customer lists for two companies.  My task is to come up with a list of customers that match.

1. One might say "ACME, Inc." and the other might say "acme incorporated"
2. I have the same "upper/lower/punctuation/abbreviation" issues for the address as well like:

ACME, Inc. - 123 Privat Drive
Acme - SE 123 Privat Dr.
ACME Incorporated - 17 Center St
Acme Inc - 17 CENTER STREET
ACME - P.O.Box 77
ACME - POB 77
ACME - PO box 77
ACME - p.o. Box  77

I tried using a vlookup, but I couldn't figure out how to use fuzzy logic, like with a LIKE operator.

So I imported the spreadsheets into Access, and ran a query that says:

SELECT * FROM sheet1 s1, sheet2 s2
WHERE s1.name LIKE s2.name
AND s1.addr LIKE s2.addr

And it came up with 16 records out of 60,000.  I have GOT to figure out a fuzzier way to get better results.  

I tried:
WHERE instr(s2.addr, left(s1.addr,8)) > 0
But it matched all of the P.O. BOX numbers with each other, and the ATTN: ACCOUNTS PAYABLE with each other.

So I came to Experts Exchange to see if I can find anybody with ingenuity and creativity that might spark more than 16 records out of 60,000.  In the mean time, I'll just keep trying things.
0
Comment
Question by:AptDev
2 Comments
 
LVL 19

Accepted Solution

by:
dmitryz6 earned 1600 total points
ID: 16509450
Access, try this

SELECT a.*,b.*
FROM (select sheet1.*,left([name],4) as connectname from sheet1) a INNER JOIN (select sheet2.*,left([name],4) as connectname from sheet2)  b ON a.connectname = b.connectname
0
 
LVL 44

Assisted Solution

by:Arthur_Wood
Arthur_Wood earned 400 total points
ID: 16509675
a LIKE query (LIKE in the Where clause) uses the syntax

Field1 LIKE "*" & SomeCharacters & "*"

NOT  

LIKE Field1 LIKE field2

A LIKE query uses 'wildcards' (* and _) to look for 'close' matches.  The * will match on ANY number of characters, the _ matches on a SINGLE character at that position in the text.

So, if you wanted to find all the companies whose names started with acme

Select * from Companies where CompanyName like "acme*"

or those that had acme any where in the company name:


Select * from Companies where CompanyName like "*acme*"

or those that ENDED with acme (??)

Select * from Companies where CompanyName like "*acme"

or those that started with A..E

Select * from Companies where CompanyName like "a__e*"

now if you wanted to JOIN to tables with a 'fuzzy' join:

Select * from Companies JOIN MyList on Companies.CompanyName LIKE Left(MyList.ComanyName,4) & "*"

which would join on a MATCH on the first 4 characters in the CompanyName field in the MyList Table.

AW
0

Featured Post

Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
Creating a Cordova application which allow user to save to/load from his Dropbox account the application database.
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question