
Identifying (similar) Names and Addresses

Posted on 2014-02-10
Last Modified: 2014-02-10
Hi,

I have 50,000 names and addresses from multiple sources.

There will be at least 30% duplication.

That is, one specific name and address may appear more than once, but the entries may NOT be 100% identical.

E.g.
John Smith, 1 High Street, London
John Smith, 1 The High Street London

Can anyone point me to a utility that would identify names/addresses that match closely but not 100%?

Any thoughts out there?
Question by: Patrick O'Dea

Accepted Solution

by: Scott McDaniel (Microsoft Access MVP - EE MVE)
ID: 39847423
That's going to be difficult to do.

You could use something like a Soundex algorithm to "rank" each entry against the others. This would surface the most likely duplicates for each entry, and you could then decide what to do with them.

Here's Wikipedia's take on Soundex: http://en.wikipedia.org/wiki/Soundex

Essentially, it involves replacing the characters in a string with numeric codes and then comparing the results. There are many variants of these algorithms, for various purposes. Here is one example:

Consider the word "Cranston"

You keep the first letter ("C"), then remove all vowels and any occurrences of the letters y, h and w from the rest of the word, which leaves:

Crnstn
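A minimal Python sketch of just this first step, as described above (the function name `drop_letters` is hypothetical, chosen for illustration). It keeps the first letter and strips vowels and y, h, w from the rest, reproducing "Crnstn". It does not yet apply Soundex's rule about collapsing repeated codes.

```python
# Letters the walkthrough removes after the first one: vowels plus y, h and w.
DROPPED = set("aeiouyhw")


def drop_letters(word: str) -> str:
    """Keep the first letter, then strip vowels and y, h, w (case-insensitive)."""
    if not word:
        return ""
    return word[0] + "".join(ch for ch in word[1:] if ch.lower() not in DROPPED)


print(drop_letters("Cranston"))  # Crnstn
```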

You then replace the next three letters with their digit codes. Using the Wikipedia method, that gives:

C652

The letter "r" maps to 6, "n" maps to 5, and "s" maps to 2.
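Putting the steps together, here is a sketch of a complete Soundex encoder in Python, using the consonant-to-digit table from the Wikipedia article. The name `soundex` and its `length` parameter are assumptions for illustration; this is not Allen Browne's VBA routine. Beyond the steps above, it applies the common American Soundex rule that adjacent letters with the same digit (including the first letter) are coded once, with h and w not separating them. Published variants differ in such details, so treat this as one reasonable reading rather than the definitive version.

```python
# Digit codes for consonants, as listed in the Wikipedia Soundex article.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str, length: int = 4) -> str:
    """Hypothetical helper: first letter plus (length - 1) digits, zero-padded."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits = []
    # The first letter's own code counts when collapsing repeats (e.g. "Pf...").
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            if code != prev:  # adjacent letters with the same digit count once
                digits.append(code)
            prev = code
        elif ch not in "hw":
            prev = ""  # a vowel (or y) separates repeats; h and w do not
    return (letters[0].upper() + "".join(digits) + "0" * length)[:length]


print(soundex("Cranston"))     # C652
print(soundex("Cranston", 6))  # C65235
```

Passing a larger `length` is one way to "go further than three letters": `soundex("Cranston", 6)` keeps the trailing t and n as well.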

You'd do the same for all the strings (and you could encode more than three letters if you prefer) and store the resulting code in a column in the table. When you then sort by that column, entries with identical codes sit next to each other, so you can quickly see which strings are most closely related.
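A sketch of that store, sort and review workflow in Python, using hypothetical sample rows modelled on the question's example. Here the "column" is a key built from the Soundex code of each word in the name; that choice, the helper `name_key`, and ignoring the address are all assumptions (you might key on surname or postcode instead). The compact `soundex` helper is included so the block runs on its own.

```python
from itertools import groupby

# Consonant-to-digit table (American Soundex).
CODES = {ch: d for letters, d in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                  ("l", "4"), ("mn", "5"), ("r", "6")) for ch in letters}


def soundex(word: str) -> str:
    """Compact Soundex: first letter plus three digits, zero-padded."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    digits, prev = [], CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        if code or ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (letters[0].upper() + "".join(digits) + "000")[:4]


def name_key(name: str) -> tuple:
    """Hypothetical matching key: the Soundex code of each word in the name."""
    return tuple(soundex(word) for word in name.split())


# Hypothetical sample rows: (name, address)
records = [
    ("John Smith", "1 High Street, London"),
    ("Mary Cranston", "5 Park Road, Leeds"),
    ("Jon Smyth", "1 The High Street London"),
    ("John Smith", "1 The High Street London"),
]

# Store the code alongside each row (like an extra column), then sort on it.
keyed = sorted((name_key(name), name, address) for name, address in records)

# Rows sharing a key are candidate duplicates to review by hand.
candidate_groups = [
    [(name, address) for _, name, address in group]
    for _, group in groupby(keyed, key=lambda row: row[0])
]
candidate_groups = [g for g in candidate_groups if len(g) > 1]
for group in candidate_groups:
    print(group)
```

Exact-code grouping is coarse: unrelated names can share a code, and spellings that differ in the first letter are missed (Catherine is C365, Katherine is K365), so each group still needs a human check.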

Allen Browne has one here: http://allenbrowne.com/vba-Soundex.html. It uses a setup very much like the one described in the Wikipedia link.

Author Closing Comment

by: Patrick O'Dea
ID: 39848501
Thanks, I will experiment. (I heard about Soundex years ago.)