Solved

Find a match and return a value from 2 similar text strings

Posted on 2014-01-31
5
1,682 Views
Last Modified: 2014-02-04
I am looking for a method to match 2 similar text strings (company names) from two lists and return a value (an ID number).  I've tried using the Lookup function but it returned too many errors when the "False" (close match) attribute was applied and too few results when the "True" (exact match) attribute was applied.  

Does anyone know a good way to accomplish a match like this?

Attached are sample lists.  For example purposes, I am trying to look up the Customer name in COL A of Sheet1 from the list in COL A of Sheet (2).  I want to return the MemberID from COL B in Sheet2 and put it in COL C of Sheet1.

Thanks in advance for your help.
Lookup-MemberID-from-Customer-Na.xlsx
0
Comment
Question by:thutchinson
5 Comments
 
LVL 35

Expert Comment

by:mvidas
Comment Utility
thutchinson,

Insert a module into your workbook, and paste in the following code. It uses a form of a fuzzy match by al_b_cnu, source cited at the top of the module. I got the best results for you using algorithm=2. To use, put the following in C2:
=FuzzyVLookup($A2,Sheet2!$A$2:$B$127,2,,,2)

It had the following differences from your desired code:
It matched "MODESTO JANITORIAL SUPPLY CTR" with "ProClean Supply/Modesto Janitorial". You had a blank match.
"LANSING SANITARY SUPPLY" returned with 402, instead of 408 (your error)
"PIKE SYSTEMS INC" and "PITT CHEMICAL & SANITARY" were switched on your desired results, the function returns the correct ones.

The only bad match is that it matched "CALIFORNIA JANITORIAL SUPPLY" with "Abe Janitorial Supply & Equipment" instead of blank as you had. I tried fine tuning the percentage variable to return a blank one, but the threshold I had to do caused too many other issues that it was not worth it.

You could spot check the results using
=FuzzyVLookup($A2,Sheet2!$A$2:$B$127,1,,,2)
to ensure a good match was made.

Option Explicit
Type RankInfo
    Offset As Integer
    Percentage As Single
End Type
'Found at http://www.mrexcel.com/forum/excel-questions/64987-comparing-two-columns.html#post307060
Function FuzzyPercent(ByVal String1 As String, _
                      ByVal String2 As String, _
                      Optional Algoritm As Integer = 3) As Single
'*************************************
'** Return a % match on two strings **
'*************************************
Dim intLen1 As Integer
Dim intCurLen As Integer
Dim intTo As Integer
Dim intPos As Integer
Dim intPtr As Integer
Dim intScore As Integer
Dim intTotScore As Integer
Dim intStartPos As Integer
Dim StrWork As String
Dim Str1 As String
Dim Str2 As String

'---------------------------------------------------
'-- Remove surrounding spaces & ensure lower case --
'---------------------------------------------------
Str1 = LCase$(Trim$(String1))
Str2 = LCase$(Trim$(String2))

'----------------------------------------------
'-- Give 100% match if strings exactly equal --
'----------------------------------------------
If Str1 = Str2 Then
    FuzzyPercent = 1
    Exit Function
End If

intLen1 = Len(Str1)

'----------------------------------------
'-- Give 0% match if string length < 2 --
'----------------------------------------
If intLen1 < 2 Then
    FuzzyPercent = 0
    Exit Function
End If

intTotScore = 0                   'initialise total possible score
intScore = 0                      'initialise current score

'--------------------------------------------------------
'-- If Algoritm = 1 or 3, Search for single characters --
'--------------------------------------------------------
If (Algoritm And 1) <> 0 Then
    intTotScore = intLen1                   'initialise total possible score
    intPos = 0
    For intPtr = 1 To intLen1
        intStartPos = intPos + 1
        intPos = InStr(intStartPos, Str2, Mid$(Str1, intPtr, 1))
        If intPos > 0 Then
            If intPos > intStartPos + 3 Then     'No match if char is > 3 bytes away
                intPos = intStartPos
            Else
                intScore = intScore + 1          'Update current score
            End If
        Else
            intPos = intStartPos
        End If
    Next intPtr
End If

'-----------------------------------------------------------
'-- If Algoritm = 2 or 3, Search for pairs, triplets etc. --
'-----------------------------------------------------------
If (Algoritm And 2) <> 0 Then
    For intCurLen = 2 To intLen1
        StrWork = Str2                          'Get a copy of String2
        intTo = intLen1 - intCurLen + 1
        intTotScore = intTotScore + Int(intLen1 / intCurLen)  'Update total possible score
        For intPtr = 1 To intTo Step intCurLen
            intPos = InStr(StrWork, Mid$(Str1, intPtr, intCurLen))
            If intPos > 0 Then
                Mid$(StrWork, intPos, intCurLen) = String$(intCurLen, &H0) 'corrupt found string
                intScore = intScore + 1     'Update current score
            End If
        Next intPtr
    Next intCurLen
End If

FuzzyPercent = intScore / intTotScore

End Function
Function FuzzyVLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algoritm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** column 1 of table specified by TableArray.                                 **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in column 1       **
'** is found.                                                                  **
'** For each entry in column 1 of the table, FuzzyPercent is called to match   **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the column number of the entry in TableArray required to be    **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the column entry indicated by IndexNum is     **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset row (starting at 1) is returned.   **
'**                 This value can be used directly in the 'Index' function.   **
'**                                                                            **
'** Algoritm can take one of the following values:                             **
'** Algoritm = 1:                                                              **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algoritm = 2:                                                              **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algoritm = 3: Both Algoritms 1 and 2 are performed.                        **
'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String
Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyVLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyVLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(intPtr, 1)) <> vbEmpty
    If VarType(TableArray.Cells(intPtr, 1)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(intPtr, 1)))
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algoritm:=Algoritm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyVLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return column entry specified --
        '-----------------------------------
        FuzzyVLookup = TableArray.Cells(intBestMatchPtr, IndexNum)
    Else
        '-----------------------
        '-- Return offset row --
        '-----------------------
        FuzzyVLookup = intBestMatchPtr
    End If
End If
End Function
Function FuzzyHLookup(ByVal LookupValue As String, _
                      ByVal TableArray As Range, _
                      ByVal IndexNum As Integer, _
                      Optional NFPercent As Single = 0.05, _
                      Optional Rank As Integer = 1, _
                      Optional Algoritm As Integer = 3) As Variant
'********************************************************************************
'** Function to Fuzzy match LookupValue with entries in                        **
'** row 1 of table specified by TableArray.                                    **
'** TableArray must specify the top left cell of the range to be searched      **
'** The function stops scanning the table when an empty cell in row 1          **
'** is found.                                                                  **
'** For each entry in row 1 of the table, FuzzyPercent is called to match      **
'** LookupValue with the Table entry.                                          **
'** 'Rank' is an optional parameter which may take any value > 0               **
'**        (default 1) and causes the function to return the 'nth' best        **
'**         match (where 'n' is defined by 'Rank' parameter)                   **
'** If the 'Rank' match percentage < NFPercent (Default 5%), #N/A is returned. **
'** IndexNum is the row number of the entry in TableArray required to be       **
'** returned, as follows:                                                      **
'** If IndexNum > 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the row entry indicated by IndexNum is        **
'**                 returned.                                                  **
'** if IndexNum = 0 and the 'Rank' percentage match is >= NFPercent            **
'**                 (Default 5%) the offset col (starting at 0) is returned.   **
'**                 This value can be used directly in the 'OffSet' function.  **
'**                                                                            **
'** Algoritm can take one of the following values:                             **
'** Algoritm = 1:                                                              **
'**     For each character in 'String1', a search is performed on 'String2'.   **
'**     The search is deemed successful if a character is found in 'String2'   **
'**     within 3 characters of the current position.                           **
'**     A score is kept of matching characters which is returned as a          **
'**     percentage of the total possible score.                                **
'** Algoritm = 2:                                                              **
'**     A count of matching pairs, triplets, quadruplets etc. in 'String1' and **
'**     'String2' is returned as a percentage of the total possible.           **
'** Algoritm = 3: Both Algoritms 1 and 2 are performed.                        **

'********************************************************************************
Dim strLookupValue As String
Dim strListString As String
Dim StrWork As String

Dim sngMinPercent As Single
Dim sngWork As Single
Dim sngCurPercent As Single
Dim intBestMatchPtr As Integer
Dim intPtr As Integer
Dim intRankPtr As Integer
Dim intRankPtr1 As Integer

Dim udRankData() As RankInfo
'--------------------------------------------------------------
'--    Validation                                            --
'--------------------------------------------------------------
strLookupValue = Trim$(LCase$(LookupValue))

If IsMissing(NFPercent) Then
    sngMinPercent = 0.05
Else
    If (NFPercent <= 0) Or (NFPercent > 1) Then
        FuzzyHLookup = "*** 'NFPercent' must be a percentage > zero ***"
        Exit Function
    End If
    sngMinPercent = NFPercent
End If

If Rank < 1 Then
    FuzzyHLookup = "*** 'Rank' must be an integer > 0 ***"
    Exit Function
End If

ReDim udRankData(1 To Rank)

'---------------
'-- Main loop --
'---------------
intPtr = 1
Do While VarType(TableArray.Cells(1, intPtr)) <> vbEmpty
    If VarType(TableArray.Cells(1, intPtr)) = vbString Then
        strListString = Trim$(LCase$(TableArray.Cells(1, intPtr)))
        
'        If Rank = 1 Then
'            If strLookupValue = strListString Then
'                '-- 100% match ! --
'                If IndexNum > 0 Then
'                    FuzzyHLookup = TableArray.Cells(IndexNum, intPtr)
'                Else
'                    FuzzyHLookup = intPtr
'                End If
'                Exit Function
'            End If
'        End If
        
        '------------------------------------------------
        '-- Fuzzy match strings & get percentage match --
        '------------------------------------------------
        sngCurPercent = FuzzyPercent(String1:=strLookupValue, _
                                     String2:=strListString, _
                                     Algoritm:=Algoritm)
        
        If sngCurPercent >= sngMinPercent Then
            '---------------------------
            '-- Store in ranked array --
            '---------------------------
            For intRankPtr = 1 To Rank
                If sngCurPercent > udRankData(intRankPtr).Percentage Then
                    For intRankPtr1 = Rank To intRankPtr + 1 Step -1
                        With udRankData(intRankPtr1)
                            .Offset = udRankData(intRankPtr1 - 1).Offset
                            .Percentage = udRankData(intRankPtr1 - 1).Percentage
                        End With
                    Next intRankPtr1
                    With udRankData(intRankPtr)
                        .Offset = intPtr
                        .Percentage = sngCurPercent
                    End With
                    Exit For
                End If
            Next intRankPtr
        End If
        
    End If
    intPtr = intPtr + 1
Loop

If udRankData(Rank).Percentage < sngMinPercent Then
    '--------------------------------------
    '-- Return '#N/A' if below NFPercent --
    '--------------------------------------
    FuzzyHLookup = CVErr(xlErrNA)
Else
    intBestMatchPtr = udRankData(Rank).Offset
    If IndexNum > 0 Then
        '-----------------------------------
        '-- Return row entry specified --
        '-----------------------------------
        FuzzyHLookup = TableArray.Cells(IndexNum, intBestMatchPtr)
    Else
        '-----------------------
        '-- Return offset col --
        '-----------------------
        FuzzyHLookup = intBestMatchPtr
    End If
End If
End Function

Open in new window

Matt
0
 
LVL 45

Expert Comment

by:Martin Liss
Comment Utility
Here's a start. Run the CompareValues macro. It calls the SoundEx function written by a guy named Allen Brown to return a 4-character string based on how the company name is pronounced. As written, my CompareVales macro will display a list of exact and/or close matches in the Immediate window. If you think it might be useful let me know and perhaps we can come up with a way to refine it for your use.

Sub CompareValues()

    Dim lngRow1 As Long
    Dim lngRow2 As Long
    Dim strSoundEx As String
    
    For lngRow1 = 2 To Worksheets("Sheet1").UsedRange.Rows.Count
        strSoundEx = Soundex(Worksheets("Sheet1").Cells(lngRow1, 1))
        For lngRow2 = 2 To Worksheets("Sheet2").UsedRange.Rows.Count
            If strSoundEx = Soundex(Worksheets("Sheet2").Cells(lngRow2, 1)) Then
                Debug.Print Worksheets("Sheet1").Cells(lngRow1, 1) & " matches " & Worksheets("Sheet2").Cells(lngRow2, 1)
            End If
        Next
    Next
End Sub
Public Function Soundex(varText As Variant) As Variant
On Error GoTo Err_Handler
    'Purpose:   Return Soundex value for the text passed in.
    'Return:    Soundex code, or Null for Error, Null or zero-length string.
    'Argument:  The value to generate the Soundex for.
    'Author:    Allen Browne (allen@allenbrowne.com), November 2007.
    'Algorithm: Based on http://en.wikipedia.org/wiki/Soundex
    Dim strSource As String     'varText as a string.
    Dim strOut As String        'Output string to build up.
    Dim strValue As String      'Value for current character.
    Dim strPriorValue As String 'Value for previous character.
    Dim lngPos As Long          'Position in source string
    
    'Do not process Error, Null, or zero-length strings.
    If Not IsError(varText) Then
        'strSource = Trim$(Nz(varText, vbNullString))
        strSource = UCase(Trim$(varText))
        If strSource <> vbNullString Then
            'Retain the initial character, and process from 2nd.
            strOut = Left$(strSource, 1&)
            strPriorValue = SoundexValue(strOut)
            lngPos = 2&
            
            'Examine a character at a time, until we output 4 characters.
            Do
                strValue = SoundexValue(Mid$(strSource, lngPos, 1&))
                'Omit repeating values (except the zero for padding.)
                If ((strValue <> strPriorValue) And (strValue <> vbNullString)) Or (strValue = "0") Then
                    strOut = strOut & strValue
                    strPriorValue = strValue
                End If
                lngPos = lngPos + 1&
            Loop Until Len(strOut) >= 4&
        End If
    End If
    
    'Return the output string, or Null if nothing generated.
    If strOut <> vbNullString Then
        Soundex = strOut
    Else
        Soundex = Null
    End If
    
Exit_Handler:
    Exit Function
    
Err_Handler:
    MsgBox "Error " & Err.Number & ": " & Err.Description, vbExclamation, "Soundex()"
    'Call LogError(Err.Number, Err.Description, conMod & ".Soundex")
    Resume Exit_Handler
End Function
Private Function SoundexValue(strChar As String) As String
    Select Case strChar
    Case "B", "F", "P", "V"
        SoundexValue = "1"
    Case "C", "G", "J", "K", "Q", "S", "X", "Z"
        SoundexValue = "2"
    Case "D", "T"
        SoundexValue = "3"
    Case "L"
        SoundexValue = "4"
    Case "M", "N"
        SoundexValue = "5"
    Case "R"
        SoundexValue = "6"
    Case vbNullString
        'Pad trailing zeros if no more characters.
        SoundexValue = "0"
    Case Else
        'Return nothing for "A", "E", "H", "I", "O", "U", "W", "Y", non-alpha.
    End Select
End Function

Open in new window

0
 
LVL 23

Assisted Solution

by:NBVC
NBVC earned 250 total points
Comment Utility
Another option could be to look at maybe just the left 10 characters for match...

use formula:

=IFERROR(INDEX(Sheet2!$B$2:$B$127,MATCH(LEFT(A2,10)&"*",member,0)),IFERROR(LOOKUP(10^10,SEARCH(LEFT(member,10),LEFT(A2,10)),Sheet2!$B$2:$B$127),""))

With this formula it missed the ACCOMODATION MOLLEN INC  A0076 because of difference in spelling on the 2 sheets and it missed NORCO, INC (GASES PLUS)  because the first 10 chars didn't match... but I guess these can be spot checked.  

It would be difficult with any method to get 100% matching...
0
 
LVL 80

Accepted Solution

by:
byundt earned 250 total points
Comment Utility
Depending on the version of Excel that you use, Microsoft has a very powerful fuzzy lookup add-in available. It requires Excel 2010 or later.
http://www.microsoft.com/en-us/download/details.aspx?id=15011 "Fuzzy Lookup Add-In for Excel"

The add-in requires that your data be in Tables. Microsoft Excel MVP Debra Dalgleish describes how to use the add-in in http://blog.contextures.com/archives/2011/05/06/fuzzy-lookup-add-in-for-excel-2010/ "Fuzzy lookup add-in for Excel 2010"
0
 

Author Comment

by:thutchinson
Comment Utility
Hi experts,

Thanks for your responses.  There's great stuff here and I want to test all the solutions.  Unfortunately, I only had time so far to test NBVC's Index/Match function and byundt's fuzzy logic add-in for Excel.  These both worked as well as I could expect.

Thanks all.
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

Introduction This Article is a follow-up to my Mappit! Addin Article (http://www.experts-exchange.com/A_2613.html), it was inspired by an email posting I made to EUSPRIG (http://www.eusprig.org/index.htm), I will briefly cover: 1) An overvie…
This code takes an Excel list of URL’s and adds a header titled “URL List”. It then searches through all URL’s in column “A”, looking for duplicates. When a duplicate is found, it is moved to the top of the list. The duplicate URL’s are then highlig…
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
This Micro Tutorial will demonstrate how to use longer labels with horizontal bar charts instead of the vertical column chart.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now