Checking for duplicate records

If I have an import file that does not have a unique identifier, is there a way to check for duplicates that is pretty bullet-proof?  In other words if an existing record is:

First Name = Bob
Last Name = Jones
Street Address = 123 N 47th Avenue

and the import record is:

First Name = Robert
Last Name = Jones
Street Address = 123 North 47th Avenue

Can we identify the fact that they may be one and the same person?

Who is Participating?
Jim Dettman (Microsoft MVP/ EE MVE)President / OwnerCommented:

 No easy way to do that, but normally, you scrub data by:

1. Trim() the data
2. Concatenate fields.
3. Collapse it (remove all white space, tabs, punctuation, etc.)
4. Compare.
5. replace tokens (like "Robert") with each possible synonym and check again.

Dale FyeCommented:
Agree with Jim that there is no easy to do it, especially with non-matching first names and formats for addresses.  I had a function a while back which would accept two arguments and compare them by first using the Split() function to parse each string into its constituent parts, and would then loop through the two string arrays and do a comparison of the individual parts.  In your example, you would pass it the two values of:

123 N 47th Avenue
123 North 47th Avenue

Since each of those has 4 components, it would make 16 comparisons and return the value 3/4 to indicate that three of the elements in the 1st argument matched with parts of the 2nd argument.  That gives you enough information that you could run a query that looks for last name = last name and this function > .5.

Unfortunately, I don't have access to that function at the moment.  It still would not be enough to ignore a record by itself, but would give you enough information for a form that would let you browse the potential matches and determine how to process those.
There is no bullet-proof process (using your example data) because it is impossible to define rules that always apply. For example, your Bob vs. Robert example could be a father and son, or even 2 different people with similar names. And if one name has Sr. (or Jr.) and another does not, they could still be the same person, except that one account was created differently than the other. Or the 2 different, but similar, accounts could be for different people.

The only bullet-proof way is to use some field that uniquely identifies a person, such as Social Security Number, Driver's License Number, State ID #, etc.

Another piece of data that would help is DOB (Date of Birth). For example, Bob and Robert could be the same person if they have the same DOB. Obviously, any 2 names with different DOBs will be 2 different people, even if they have the exact same name.
SteveL13Author Commented:
Thank to you each of you for your thoughts.  I really appreciate it.  To sum it up... I guess there really is no bullet-proof way like Thomas said.
SteveL13Author Commented:
Thanks to each of you for your input.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.