• Status: Solved

# Percent matching of string 1 against string 2 in Excel

I have two columns in Excel, and I'm trying to figure out how closely they match each other.

Let's say I have a column of strings (column A) and another column (column B) that I want to match against. I want to take each cell in column A and get a "percent match" against every cell in column B, so I can find the closest match in column B.

Also, the strings would be something like "Cell reconfiguration engineering AA145-78XR" against "Setup AA145-78XR", which should be a close match.

Any ideas?
k1ng87
1 Solution

Commented:
This could get messy... First of all you need to split the cell contents in column A to form an array, and then check for the presence of each array item in the cell contents in column B. This assumes that your comparison process will treat blocks of characters (delimited by a space or another character, e.g. a hyphen) as the units to match. So in your example above, the cell from column A would have 5 strings in its array. When searching column B you would need to check for the presence of each of these within each column B cell. If you find a match you would increment a counter, for example.

This approach is at a very basic level and could be made more complex by looking for strings in a particular sequence, or for strings that are adjacent to each other. Again taking your example, finding 'reconfiguration' and then 'engineering' in the same cell in column B might increment your counter by 1 on each occasion, but if the two strings are adjacent to each other the count might be higher (stronger match), and if the two strings appear in the same order as in the cell from column A the count might be higher again (an even stronger match). For example...

A1='Cell reconfiguration engineering AA145-78XR'
B1='Setup AA145-78XR' (score 1 for finding AA145, score 1 for finding 78XR, score 1 because they appear in the same order as in A1, and score 1 because they are adjacent to each other). Total 'matching score' = 4.

Conversely, if B1 contained '78XR whatever AA145' the score might be only 2: 1 point for finding AA145 and 1 point for finding 78XR. No points for them being adjacent and no points for them appearing in the same order.

Now for the hard bit....you need to code this up in VBA! I'm a bit rusty in that department but no doubt someone else can rattle off that code in a few mins.
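To make the scoring idea concrete, here is a minimal sketch of it. It is written in Python rather than VBA purely to keep it short, so treat it as pseudocode to port into a VBA function. The names `tokenize` and `match_score` are made up for illustration, and a few details are assumptions on my part: matching is case-insensitive, tokens are split on spaces and hyphens only, and the order/adjacency bonuses are checked only for tokens that are neighbours in the column A cell.

```python
import re


def tokenize(text):
    """Split a cell's text on spaces and hyphens, lower-cased, dropping empties."""
    return [tok for tok in re.split(r"[\s\-]+", text.lower()) if tok]


def match_score(a_text, b_text):
    """Score how well the tokens of a_text are represented in b_text.

    +1 for each token of A found in B,
    +1 when two neighbouring tokens of A appear in the same order in B,
    +1 when two neighbouring tokens of A are adjacent in B.
    """
    a_tokens = tokenize(a_text)
    b_tokens = tokenize(b_text)

    # Remember the first position of every token in B.
    b_pos = {}
    for index, tok in enumerate(b_tokens):
        b_pos.setdefault(tok, index)

    # Base score: one point per A token that is present in B.
    score = sum(1 for tok in a_tokens if tok in b_pos)

    # Bonus points for neighbouring A tokens that are both present in B.
    for first, second in zip(a_tokens, a_tokens[1:]):
        if first in b_pos and second in b_pos:
            gap = b_pos[second] - b_pos[first]
            if gap > 0:
                score += 1  # same order as in A
            if abs(gap) == 1:
                score += 1  # next to each other in B
    return score


a1 = "Cell reconfiguration engineering AA145-78XR"
print(match_score(a1, "Setup AA145-78XR"))     # 4
print(match_score(a1, "78XR whatever AA145"))  # 2
```

To find the closest match you would run this for one column A cell against every column B cell and keep the highest score. Dividing by the maximum score possible for that A cell would turn it into a rough percentage, though that normalisation is my own suggestion and not part of the scheme above.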

Commented:
What you are requesting is called "fuzzy matching". If you have Excel 2010 or later, Microsoft has an add-in for that purpose. http://www.microsoft.com/en-us/download/details.aspx?id=15011 "Fuzzy Lookup Add-In for Excel"

If you have an earlier version of Excel, then consider the code suggested by al_b_cnu in http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html#post955137
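If you just want to see what a "percent match" looks like before installing anything, here is a rough sketch using Python's standard `difflib.SequenceMatcher`. To be clear, this is not the algorithm used by the Fuzzy Lookup Add-In or by the linked VBA code; it is a generic similarity ratio, and the helper names `percent_match` and `closest_match` and the sample column B values are invented for illustration.

```python
from difflib import SequenceMatcher


def percent_match(a, b):
    """Similarity of two strings as a percentage (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def closest_match(target, candidates):
    """Return the candidate with the highest percent match to target."""
    return max(candidates, key=lambda cand: percent_match(target, cand))


# Hypothetical data standing in for one column A cell and column B.
column_a_cell = "Cell reconfiguration engineering AA145-78XR"
column_b = ["Valve kit", "Pump BB200-11QZ", "Setup AA145-78XR"]

best = closest_match(column_a_cell, column_b)
print(best, round(percent_match(column_a_cell, best), 1))
```

Because the ratio is computed over whole strings, a long description in column A will never score very high against a short one in column B even when the part number matches exactly, so it is the ranking that matters here more than the absolute percentage.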
Question has a verified solution.
