Solved

percent matching of string 1 against string 2 in excel

Posted on 2013-11-11
2
3,383 Views
Last Modified: 2013-11-16
I have two columns in excel that I'm trying to figure out how closely the two columns match with each other.

Lets say I have a column of strings (column A), and another column (column B) that I want to match against. I want to take each cell in column A, and get a "percent match" in all the cells in B so I can find the closest match in column B.

Also, the stings would be something "Cell reconfiguration engineering AA145-78XR" against "Setup AA145-78XR" which should be a close match.

Any ideas?
0
Comment
Question by:k1ng87
2 Comments
 
LVL 9

Expert Comment

by:guswebb
ID: 39640323
This could get messy....first of all you need to split the cell contents in column A to form an array and then check for the presence of each array item in the cell contents in column B. This assumes that your comparison process will treat blocks of characters (delimited by a space or other character e.g. a hyphen). So in your example above, the cell from column A would have 5 strings in it's array. When searching column B you would need to check for the presence of each of these within each column B cell. If you find a match you would increment a counter, for example. This approach is at a very basic level and could be made more complex by looking for strings in a particular sequence, or for strings that are adjacent to each other. Again taking your example, finding 'reconciliation' and then 'engineering' in the same cell in column B might increment your counter by 1 on each occasion, but if the two strings are adjacent to each other that count might be higher (stronger match) or if the two strings appear in the same order as per the cell from column A, again the counter might be higher (again a stronger match). For example...

A1='Cell reconfiguration engineering AA145-78XR'
B1='Setup AA145-78XR' (score 1 for finding AA145, score 1 for finding 78XR, score 1 because they appear in the same order as A1, and score 1 because they are adjacent to each other. Total 'matching score' = 4.

Conversely...if B1 contained '78XR whatever AA145' the score might be only 2, 1 point
for finding AA145 and 1 point for finding 78XR. No points for them being adjacent and no points for them appearing in the same order.

Now for the hard bit....you need to code this up in VBA! I'm a bit rusty in that department but no doubt someone else can rattle off that code in a few mins.
0
 
LVL 81

Accepted Solution

by:
byundt earned 500 total points
ID: 39640515
What you are requesting is called "fuzzy matching". If you have Excel 2010 or later, Microsoft has an add-in for that purpose. http://www.microsoft.com/en-us/download/details.aspx?id=15011 "Fuzzy Lookup Add-In for Excel"

If you have an earlier version of Excel, then consider the code suggested by al_b_cnu in http://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html#post955137
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Convert between Excel file formats (.XLS, .XLSX, .XLSM) with/without macro option David Miller (dlmille) Intro Over this past Fall, I've had the opportunity to see several similar requests and have developed a couple related solutions associate…
Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
The viewer will learn how to simulate a series of sales calls dependent on a single skill level and learn how to simulate a series of sales calls dependent on two skill levels. Simulating Independent Sales Calls: Enter .75 into cell C2 – “skill leve…
Graphs within dashboards are meant to be dynamic, representing data from a period of time that will change each time the dashboard is updated with new data. Rather than update each graph to point to a different set within a static set of data, t…

912 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

25 Experts available now in Live!

Get 1:1 Help Now