• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1368
  • Last Modified:

Regex Match Name with List Names-Excel VB

Hi,

I'm just trying to solve how to use a regex pattern in vb to match a list of names in sheet 1 (Range A1:A5) with another list of names in sheet 2 (Range A1:A8) and if there is a matching name found from sheet 2 and enter this name to the corresponding matching name in sheet 1.

In sheet 1 I have:
Ford Hill
Jeremy Camp
Alyssa Cornell
Joe Smith
Andy Carmell

Then in sheet 2 I have:
Ford/Miranda Hill
Jeremy Kamp
Alyssa Cornel
Joe Smith
Andy Carmel
Anita Chow
John Goldstein
Rob Kim

The spelling is a bit different in sheet 2 by some characters or one.

I'm attaching an excel sheet. Any help is very appreciated.

Thank you,
vkimura
regex-match-name-with-list-names.xls
0
Victor Kimura
Asked:
Victor Kimura
  • 3
  • 2
  • 2
2 Solutions
 
RichardSchollarCommented:
Hi

I don't think regex will help you much here.  What you want is either some kind of Soundex algorithm (which you could program into VBA) or even better is the following "fuzzy Vlookup" function which was written by a member of the MrExcel.com forum.  Check out his "fuzzy vlookup" function here:

http://www.mrexcel.com/forum/showthread.php?t=195635

Richard
0
 
Patrick MatthewsCommented:
Richard,

Thanks so much for posting that link!  I actually tried a Soundex approach, but I was dissatisfied with the
result so I didn't post it.  I'll have to play with the "fuzzy" functions :)

Regards,

Patrick
0
 
RichardSchollarCommented:
Hey Patrick

You're welcome!  it is resource-hungry though :)

Richard
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
Victor KimuraAuthor Commented:
RichardSchollar,

How about if I had something more simpler such as a variance of one letter at the end of the first name and/or last name. For instance,
Jack Lemmon - Jacks Lemmons
or
Jack Lemmon - Jacks Lemmon
or
Jack Lemmons - Jack Lemmons

where left - right become a true.

I think the fuzzy logic is too much and for the ones that don't match then I can simply view them and edit them myself since the lists will be fairly short. Most names are not significantly different

I have a dual purpose to view the code: one so I can learn vb and the other is to use it for the office. =)

Thank you for the reference to the fuzzy logic.

Could you provide me with some simple code as in the above?

Cordially,
vkimura
0
 
Patrick MatthewsCommented:
vkimura2007,

The attached workbook uses the fuzzy logic search that Richard linked to, plus my own Soundex-based
approaches (which on reflection did better than I thought :)

For each list, I tried to find matches using the following:

1) Plain vanilla VLOOKUP
2) Soundex based on whole string
3) "Soundex 2", based on concatenated Soundex values of first word and last word (uses Regular Expressions
to find the words)
4) Three different "fuzzy VLOOKUP" searches, using algorithms 1, 2, and 3 for the FuzzyVLookup function in
that mrexcel.com post

Each worksheet shows the results of the search; I shaded red the cells where the searches either failed to find
a match that was there, or returned a match that was false.

The code snippet contains my code; the workbook includes that code plus the code from Richard's link.

Regards,

Patrick
Option Explicit 
Function Soundex(InputStr As String) As String 
' Found at http://j-walk.com/ss/excel/tips/tip77.htm
' Based on function developed by Richard J. Yanco
' This function follows the Soundex rules given at
' http://home.utah-inter.net/kinsearch/Soundex.html 
    Dim Result As String, c As String * 1
    Dim Location As Long 
    InputStr = UCase(InputStr) 
'   First character must be a letter
    If Asc(Left(InputStr, 1)) < 65 Or Asc(Left(InputStr, 1)) > 90 Then
        Soundex = ""
        Exit Function
    Else 
'       Convert to Soundex: letters to their appropriate digit,
'                     A,E,I,O,U,Y ("slash letters") to slashes
'                     H,W, and everything else to zero-length string 
        Result = Left(InputStr, 1)
        For Location = 2 To Len(InputStr)
            Result = Result & Category(Mid(InputStr, Location, 1))
        Next Location
      
'       Remove double letters
        Location = 2
        Do While Location < Len(Result)
            If Mid(Result, Location, 1) = Mid(Result, Location + 1, 1) Then
                Result = Left(Result, Location) & Mid(Result, Location + 2)
            Else
                Location = Location + 1
            End If
        Loop
    
'       If category of 1st letter equals 2nd character, remove 2nd character
        If Category(Left(Result, 1)) = Mid(Result, 2, 1) Then
            Result = Left(Result, 1) & Mid(Result, 3)
        End If
    
'       Remove slashes
        For Location = 2 To Len(Result)
            If Mid(Result, Location, 1) = "/" Then
                Result = Left(Result, Location - 1) & Mid(Result, Location + 1)
            End If
        Next
    
'       Trim or pad with zeroes as necessary
        Soundex = Left(Result & "0000", 4)
    End If
End Function 
Private Function Category(c) As String
'   Returns a Soundex code for a letter
    Select Case True
        Case c Like "[AEIOUY]"
            Category = "/"
        Case c Like "[BPFV]"
            Category = "1"
        Case c Like "[CSKGJQXZ]"
            Category = "2"
        Case c Like "[DT]"
            Category = "3"
        Case c = "L"
            Category = "4"
        Case c Like "[MN]"
            Category = "5"
        Case c = "R"
            Category = "6"
        Case Else 'This includes vowels, Y, H and W, spaces, punctuation, etc.
            Category = ""
    End Select
End Function 
Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True)
    
    ' Function by Patrick Matthews
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 0                   : the last match
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' If Pos is greater than the number of matches, is negative, or is non-numeric, the function
    ' returns an empty string.  If no match is found, the function returns an empty string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    Dim RegX As Object
    Dim TheMatches As Object
    Dim Answer() As String
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Create instance of RegExp object
    Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
    End With
        
    ' Test to see if there are any matches
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1) As String
            For Counter = 0 To UBound(Answer)
                Answer(Counter) = TheMatches(Counter)
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        Else
            Select Case Pos
                Case 0                          ' Last match
                    RegExpFind = TheMatches(TheMatches.Count - 1)
                Case 1 To TheMatches.Count      ' Nth match
                    RegExpFind = TheMatches(Pos - 1)
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    Else
        RegExpFind = ""
    End If
    
    ' Release object variables
    Set RegX = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

Q-24329564.xls
0
 
Patrick MatthewsCommented:
Richard,

No idea if I am using that FuzzyVLookup function optimally :)

Regards,

Patrick
0
 
Victor KimuraAuthor Commented:
Hi RichardSchollar/matthewspatrick:

Thanks for you help. I haven't fully tested it but I will shortly. I've been just swamped at work. Thanks for the code, matthewspatrick.
0

Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 3
  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now