VBA to Cleanup Excel Field

I have the following text and need to extract the Supplier Name from the field. If multiple, they should be separated by a comma. Everything else can be removed.

<?xml version="1.0" encoding="utf-8"?><RepeaterData><Version /><Items><Item><SupplierName type="System.String">HP Inc.</SupplierName><Indicator type="System.String">Purple</ClientIndicator><Selection type="System.Boolean">False</Selection></Item><Item><SupplierName type="System.String">Ricoh</SupplierName><Indicator type="System.String">Purple</Indicator ><Selection type="System.Boolean">False</Selection></Item><Item><SupplierName type="System.String">Xerox</SupplierName><Indicator type="System.String">Purple</ClientIndicator><Selection type="System.Boolean">False</Selection></Item></Items></RepeaterData>

Open in new window


Any ideas?
kwarden13Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Patrick MatthewsCommented:
This is working for me:

1. Add this code to a regular VBA module:

Option Explicit

Function SupplierNames(BigString As String, Optional Delimiter As String = ", ") As String
    
    Dim arr As Variant
    Dim Counter As Long
    Dim MaxMatches As Long
    
    arr = RegExpFindSubmatch(BigString, "(<SupplierName[^>]+>)([^<]+)(</SupplierName>)", , 2)
    
    If IsArray(arr) Then
        MaxMatches = UBound(arr, 1)
        For Counter = 0 To MaxMatches
            SupplierNames = SupplierNames & Delimiter & arr(Counter, 0)
        Next
        SupplierNames = Mid(SupplierNames, 2)
    Else
        SupplierNames = ""
    End If
    
End Function

Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
    Optional SubmatchPos, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
    ' from the various matches to a pattern (PatternStr).  In RegExp, submatches within a pattern
    ' are defined by grouping portions of the pattern within parentheses.
    
    ' Use MatchPos to indicate which match you want:
    ' MatchPos omitted               : function returns results for all matches
    ' MatchPos = 1                   : the first match
    ' MatchPos = 2                   : the second match
    ' MatchPos = <positive integer>  : the Nth match
    ' MatchPos = 0                   : the last match
    ' MatchPos = -1                  : the last match
    ' MatchPos = -2                  : the 2nd to last match
    ' MatchPos = <negative integer>  : the Nth to last match
    
    ' Use SubmatchPos to indicate which match you want:
    ' SubmatchPos omitted               : function returns results for all submatches
    ' SubmatchPos = 1                   : the first submatch
    ' SubmatchPos = 2                   : the second submatch
    ' SubmatchPos = <positive integer>  : the Nth submatch
    ' SubmatchPos = 0                   : the last submatch
    ' SubmatchPos = -1                  : the last submatch
    ' SubmatchPos = -2                  : the 2nd to last submatch
    ' SubmatchPos = <negative integer>  : the Nth to last submatch
    
    ' The return type for this function depends on whether your choice for MatchPos is looking for
    ' a single value or for potentially many.  All arrays returned by this function are zero-based.
    ' When the function returns a 2-D array, the first dimension is for the matches and the second
    ' dimension is for the submatches
    ' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match.  First dimension
    '                                        based on number of matches (0 to N-1), second dimension
    '                                        based on number of submatches (0 to N-1)
    ' MatchPos omitted, SubmatchPos used   : 2-D array (0 to N-1, 0 to 0) of the specified submatch
    '                                        from each match
    ' MatchPos used, SubmatchPos omitted   : 2-D array (0 to 0, 0 to N-1) of the submatches from the
    '                                        specified match
    ' MatchPos used, SubmatchPos used      : String with specified submatch from specified match
    
    ' For any submatch that is not found, the function treats the result as a zero-length string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Mat As Object
    Dim Answer() As String
    Dim Counter As Long
    Dim SubCounter As Long
    
    ' Evaluate MatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(MatchPos) Then
        If Not IsNumeric(MatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            MatchPos = CLng(MatchPos)
        End If
    End If
    
    ' Evaluate SubmatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(SubmatchPos) Then
        If Not IsNumeric(SubmatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            SubmatchPos = CLng(SubmatchPos)
        End If
    End If
    
    ' Create instance of RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If MatchPos is missing, user either wants array of all the submatches for each match, or an
        ' array of all the specified submatches for each match.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(MatchPos) Then
            
            ' Return value is a 2-D array of all the submatches for each match
            
            If IsMissing(SubmatchPos) Then
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.Submatches.Count - 1) As String
                    End If
                    
                    ' Loop through the submatches and populate the array.  If the Nth submatch is not
                    ' found, RegExp returns a zero-length string
                    
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(Counter, SubCounter) = Mat.Submatches(SubCounter)
                    Next
                Next
            
            ' Return value is a 2-D array of the specified submatch for each match.
            
            Else
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop.  If SubmatchPos = 0, then we want
                    ' the last submatch.  In that case reset SubmatchPos so it equals the submatch count.
                    ' Negative number indicates Nth to last; convert that to applicable "positive" position
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
                        Select Case SubmatchPos
                            Case Is > 0: 'no adjustment needed
                            Case 0, -1: SubmatchPos = Mat.Submatches.Count
                            Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                            Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                        End Select
                    End If
                    
                    ' Populate array with the submatch value.  If the submatch value is not found, or if
                    ' SubmatchPos > the count of submatches, populate with a zero-length string
                    
                    If SubmatchPos <= Mat.Submatches.Count Then
                        Answer(Counter, 0) = Mat.Submatches(SubmatchPos - 1)
                    Else
                        Answer(Counter, 0) = ""
                    End If
                Next
            End If
            RegExpFindSubmatch = Answer
            
        ' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
        
        Else
            
            ' If MatchPos = 0 then make MatchPos equal the match count.  If negative (indicates Nth
            ' to last), convert to equivalent position.
            
            Select Case MatchPos
                Case Is > 0: 'no adjustment needed
                Case 0, -1: MatchPos = TheMatches.Count
                Case Is < -TheMatches.Count: MatchPos = -MatchPos
                Case Else: MatchPos = TheMatches.Count + MatchPos + 1
            End Select
            
            ' As long as MatchPos does not exceed the match count, process the Nth match.  If the
            ' match count is exceeded, return a zero-length string
            
            If MatchPos <= TheMatches.Count Then
                Set Mat = TheMatches(MatchPos - 1)
                
                ' User wants a 2-D array of all submatches for the specified match; populate array.  If
                ' a particular submatch is not found, RegExp treats it as a zero-length string
                
                If IsMissing(SubmatchPos) Then
                    ReDim Answer(0 To 0, 0 To Mat.Submatches.Count - 1)
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(0, SubCounter) = Mat.Submatches(SubCounter)
                    Next
                    RegExpFindSubmatch = Answer
                
                ' User wants a single value
                
                Else
                    
                    ' If SubmatchPos = 0 then make it equal count of submatches.  If negative, this
                    ' indicates Nth to last; convert to equivalent positive position
                    
                    Select Case SubmatchPos
                        Case Is > 0: 'no adjustment needed
                        Case 0, -1: SubmatchPos = Mat.Submatches.Count
                        Case Is < -Mat.Submatches.Count: SubmatchPos = -SubmatchPos
                        Case Else: SubmatchPos = Mat.Submatches.Count + SubmatchPos + 1
                    End Select
                    
                    ' If SubmatchPos <= count of submatches, then get that submatch for the specified
                    ' match.  If the submatch value is not found, or if SubmathPos exceeds count of
                    ' submatches, return a zero-length string.  In testing, it appeared necessary to
                    ' use CStr to coerce the return to be a zero-length string instead of zero
                    
                    If SubmatchPos <= Mat.Submatches.Count Then
                        RegExpFindSubmatch = CStr(Mat.Submatches(SubmatchPos - 1))
                    Else
                        RegExpFindSubmatch = ""
                    End If
                End If
            Else
                RegExpFindSubmatch = ""
            End If
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFindSubmatch = ""
    End If
    
Cleanup:
    ' Release object variables
    Set Mat = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window


2. Use a formula like this to extract the supplier names:

=SupplierNames(A1)

For more info on the Regular Expressions approach I took above, please see my article here.



BTW, that function includes an optional second argument to specify the delimiter.  If you wanted to switch the delimiter to, say semicolon+space, just rewrite the formula as:

=SupplierNames(A1,"; ")
1

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Patrick MatthewsCommented:
Appears to answer the question
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VBA

From novice to tech pro — start learning today.