Checking for duplicate records

If I have an import file that does not have a unique identifier, is there a way to check for duplicates that is pretty bullet-proof?  In other words if an existing record is:

First Name = Bob
Last Name = Jones
Street Address = 123 N 47th Avenue

and the import record is:

First Name = Robert
Last Name = Jones
Street Address = 123 North 47th Avenue

Can we identify the fact that they may be one and the same person?

Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jim Dettman (Microsoft MVP/ EE MVE)President / OwnerCommented:

 No easy way to do that, but normally, you scrub data by:

1. Trim() the data
2. Concatenate fields.
3. Collapse it (remove all white space, tabs, punctuation, etc.)
4. Compare.
5. replace tokens (like "Robert") with each possible synonym and check again.


Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Dale FyeOwner, Developing Solutions LLCCommented:
Agree with Jim that there is no easy to do it, especially with non-matching first names and formats for addresses.  I had a function a while back which would accept two arguments and compare them by first using the Split() function to parse each string into its constituent parts, and would then loop through the two string arrays and do a comparison of the individual parts.  In your example, you would pass it the two values of:

123 N 47th Avenue
123 North 47th Avenue

Since each of those has 4 components, it would make 16 comparisons and return the value 3/4 to indicate that three of the elements in the 1st argument matched with parts of the 2nd argument.  That gives you enough information that you could run a query that looks for last name = last name and this function > .5.

Unfortunately, I don't have access to that function at the moment.  It still would not be enough to ignore a record by itself, but would give you enough information for a form that would let you browse the potential matches and determine how to process those.
There is no bullet-proof process (using your example data) because it is impossible to define rules that always apply. For example, your Bob vs. Robert example could be a father and son, or even 2 different people with similar names. And if one name has Sr. (or Jr.) and another does not, they could still be the same person, except that one account was created differently than the other. Or the 2 different, but similar, accounts could be for different people.

The only bullet-proof way is to use some field that uniquely identifies a person, such as Social Security Number, Driver's License Number, State ID #, etc.

Another piece of data that would help is DOB (Date of Birth). For example, Bob and Robert could be the same person if they have the same DOB. Obviously, any 2 names with different DOBs will be 2 different people, even if they have the exact same name.
SteveL13Author Commented:
Thank to you each of you for your thoughts.  I really appreciate it.  To sum it up... I guess there really is no bullet-proof way like Thomas said.
SteveL13Author Commented:
Thanks to each of you for your input.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.