mmj1
asked on
Excel 2010 - compare two columns of data with names not consistently entered
I am trying to find the quickest way to complete the following task:
I am using Excel 2010 - the spreadsheet I am working on has two columns of information (names).
Column A has 200 names listed and Column B has 1500 names listed.
I am trying to find out what names listed in Column A are in Column B -- the issue is the names are not consistently entered in both columns.
So - Column A may show ABC Computer Company and column B may list the name as ABC Computer Co Inc.
Any input would be appreciated.
Thanks
I am using Excel 2010 - the spreadsheet I am working on has two columns of information (names).
Column A has 200 names listed and Column B has 1500 names listed.
I am trying to find out what names listed in Column A are in Column B -- the issue is the names are not consistently entered in both columns.
So - Column A may show ABC Computer Company and column B may list the name as ABC Computer Co Inc.
Any input would be appreciated.
Thanks
ASKER
Unfortunately I am unable to provide a sample of the list. I suppose I was looking for comparison more on the "key" words for example - if the company name is "Sampson Brothers Lumber Co" in Column A and "Sampson Brothers Lumber Co. Inc." in column B - I was more looking to partially match on the key word "Sampson" or "Sampson Brothers" - I know this may not be possible.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
I really appreciate your assistance in trying to assist me with my question. As I am working on some other projects at this time I am going to have to put this one on the back burner and didn't want to leave the question abandoned. Thank you again for trying to assist me.
Two methods of interrogation immediately come to mind that approach the same "fuzzy logic" task in different ways:
Soundex (a Phonetic Algorithm) primarily used just for surnames (& tailored towards the region due to the occurrence of similar surnames in a set of data),
...and...
Levenshtein distance (or, "Edit distance") recording the number of individual characters changes (replacements, removals, or insertions) required to change one value into another value.
Alternatively, certain frequent sub-strings ("Company"/"Co.", "Limited"/"Ltd.", "Incorporated"/"Inc.") could be removed OR replaced for an alias (or aliases) from one or both columns during the comparison.
You will need to determine how similar one value can be to another value before any reasonable attempt can be made to address your requirement.
A sample workbook attached to this thread may also help any contributing Expert, but please obscure any details that may uniquely identify a third party.