Posted on 2011-09-14
What is the best way to identify similar data when comparing 2 columns.
I have one sheet of data that has been matched together from 2 seperate databases on postal/zip code of businesses. So where the zip code matches I have address 1 and name from two databases joined on where zip code matched, and then input into excel.
There are two columns per database I need to match to get a true match but there are some typos in one database that means I am not getting true matches.
I have been doing IF statements where if (column A - which was the output of DB A businesses addresses matches exactly to column C - which was the output of DB B business addressed, then type "match" in column E).
Also in column F I do an IF statement for columns B and D for exact matches, this time it is business address line 1 - so if they match exact then "match", if not then "not".
However when reviewing the nots I see some minor typos in database B which means they are matches, but a character is slightly wrong, or theres a comma added by mistake.
How can I identify "likely to be matched with typos"?
Is there anyway to say "1 character differs", or "2 characters differ" or any other ideas?