Solved

similar data

Posted on 2011-09-14
5
173 Views
Last Modified: 2012-08-14
What is the best way to identify similar data when comparing 2 columns.

I have one sheet of data that has been matched together from 2 seperate databases on postal/zip code of businesses. So where the zip code matches I have address 1 and name from two databases joined on where zip code matched, and then input into excel.

There are two columns per database I need to match to get a true match but there are some typos in one database that means I am not getting true matches.

I have been doing IF statements where if (column A - which was the output of DB A businesses addresses matches exactly to column C - which was the output of DB B business addressed, then type "match" in column E).

Also in column F I do an IF statement for columns B and D for exact matches, this time it is business address line 1 - so if they match exact then "match", if not then "not".

However when reviewing the nots I see some minor typos in database B which means they are matches, but a character is slightly wrong, or theres a comma added by mistake.

How can I identify "likely to be matched with typos"?

Is there anyway to say "1 character differs", or "2 characters differ" or any other ideas?
0
Comment
Question by:pma111
  • 3
  • 2
5 Comments
 
LVL 10

Accepted Solution

by:
ukerandi earned 500 total points
ID: 36535244
first you need to write following kind of Formula to remove ","
=IF(FIND(",",A1)>=1,REPLACE(A1,FIND(",",A1),1," "))
then after that write match if condition
0
 
LVL 3

Author Comment

by:pma111
ID: 36535258
Its not just commas thought that's just one of many cases other times it can be one or two characters wrong
0
 
LVL 10

Expert Comment

by:ukerandi
ID: 36536911
can you send me some example data
0
 
LVL 10

Expert Comment

by:ukerandi
ID: 36536921
if you can give some examples i can create formulae
0
 
LVL 3

Author Comment

by:pma111
ID: 36548019
Sorry its commercially sensitive
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

A little background as to how I came to I design this code: Around 5 years ago I designed an add-in that formatted Excel files to a corporate standard, applying different cell colours and font type depending on whether the cells contained inputs,…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now