Pau Lo
asked on
similar data
What is the best way to identify similar data when comparing 2 columns.
I have one sheet of data that has been matched together from 2 seperate databases on postal/zip code of businesses. So where the zip code matches I have address 1 and name from two databases joined on where zip code matched, and then input into excel.
There are two columns per database I need to match to get a true match but there are some typos in one database that means I am not getting true matches.
I have been doing IF statements where if (column A - which was the output of DB A businesses addresses matches exactly to column C - which was the output of DB B business addressed, then type "match" in column E).
Also in column F I do an IF statement for columns B and D for exact matches, this time it is business address line 1 - so if they match exact then "match", if not then "not".
However when reviewing the nots I see some minor typos in database B which means they are matches, but a character is slightly wrong, or theres a comma added by mistake.
How can I identify "likely to be matched with typos"?
Is there anyway to say "1 character differs", or "2 characters differ" or any other ideas?
I have one sheet of data that has been matched together from 2 seperate databases on postal/zip code of businesses. So where the zip code matches I have address 1 and name from two databases joined on where zip code matched, and then input into excel.
There are two columns per database I need to match to get a true match but there are some typos in one database that means I am not getting true matches.
I have been doing IF statements where if (column A - which was the output of DB A businesses addresses matches exactly to column C - which was the output of DB B business addressed, then type "match" in column E).
Also in column F I do an IF statement for columns B and D for exact matches, this time it is business address line 1 - so if they match exact then "match", if not then "not".
However when reviewing the nots I see some minor typos in database B which means they are matches, but a character is slightly wrong, or theres a comma added by mistake.
How can I identify "likely to be matched with typos"?
Is there anyway to say "1 character differs", or "2 characters differ" or any other ideas?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
can you send me some example data
if you can give some examples i can create formulae
ASKER
Sorry its commercially sensitive
ASKER