Excel: Find closest match

I get very large excel files (i.e., over 500K records) with a list of ID's from vendor1 in column A and a list of ID's from vendor2 in column B. Sometimes these values are different between the vendors, but they are usually very similar.

I would like to find the value in column A that is the closest match in column B. The matched value can go in column C.

In other words what is the fastest formula/VBA that can find a value in column A with all the same characters. (ie., E21R = R2E1 = 2 1RE etc.,) If it can't do that, it finds the value in column A with the most similar characters. (ie., E21R = E21 etc.,)


Example Below:

Column A        Column B        Column C (Closest Match)    
E22,2 4TR         T,GRTYE2          TRE GTY 2,
TRE GTY 2,        E4TR22            E22,2 4TR
^3RTY V>
YYTI.OO T

Notice in row 1 that column B&C have the exact same characters just in different order.
Notice in row 2 that there was not an exact match, but column C contains the value with the most similar characters.
ouestqueAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Martin LissOlder than dirtCommented:
I doubt if there's a formula that can do what you want, but the attached workbook has a macro that you may be able to use. With the small sample you gave it finds both of them but it may not with a larger sample. Try it with a thousand rows and see what happens. I suspect that it will be slow if you scale it up to 500,000 rows but I may be able to make it faster. Run the FindMatches macro. Note that the macro creates temporary "helper" columns in F and G.
28711244.xlsm

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
[ fanpages ]IT Services ConsultantCommented:
As I mentioned in a recent thread on the same subject...

[ http://www.experts-exchange.com/questions/28691761/Excel-2010-compare-two-columns-of-data-with-names-not-consistently-entered.html ]

What kind of tolerance of difference between a name in one list, compared to a name in the other list, is to be acceptable?

Two methods of interrogation immediately come to mind that approach the same "fuzzy logic" task in different ways:

Soundex (a Phonetic Algorithm) primarily used just for surnames (& tailored towards the region due to the occurrence of similar surnames in a set of data),

...and...

Levenshtein distance (or, "Edit distance") recording the number of individual characters changes (replacements, removals, or insertions) required to change one value into another value.


Alternatively, certain frequent sub-strings ("Company"/"Co.", "Limited"/"Ltd.", "Incorporated"/"Inc.") could be removed OR replaced for an alias (or aliases) from one or both columns during the comparison.

You will need to determine how similar one value can be to another value before any reasonable attempt can be made to address your requirement.

A sample workbook attached to this thread may also help any contributing Expert, but please obscure any details that may uniquely identify a third party.
Martin LissOlder than dirtCommented:
I'm glad I was able to help.

In my profile you'll find links to some articles I've written that may interest you.
Marty - MVP 2009 to 2015, Experts-Exchange Top Expert Visual Basic Classic 2012 to 2014
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.