Link to home
Start Free TrialLog in
Avatar of doolinn
doolinnFlag for United Kingdom of Great Britain and Northern Ireland

asked on

Excel: Split Text/Number String

I need to split a cell into three different cells - at the moment it is in the format AAAA111AA and I need to break it out into three cells showing (1) the string before the numeric piece starts - ie AAAA (2) the numeric string - ie 111 and (3) the piece after the numeric string - ie AA.

Note each of the three parts of the split string can be any length and "part (3)" will only have a value if "part (2)" exists.

ie I could have

"AAAAAA"  is split to (1) AAAAAA (2) [blank] and (3) [blank]
"AA2314434" is split to (1) AA (2) 2314434 and (3) [blank]
"AABZEEH346678XXA" is split (1) AABZEEH (2) 346678 and (3) XXA

What is the formula for each of the cells (1),(2) and (3)?

Thanks

Avatar of Dave
Dave
Flag of Australia image

Is a formula a must or are you ok with automating it with code?

Cheers

Dave
Avatar of doolinn

ASKER

I do not have a preference between a formula vs code.
This code runs over the active selection (single column) and splits the valid cells in-situ (see example)

Non matching cells are skipped
Cheers
Dave

Sub Slice()
    Dim rng1 As Range
    Dim RegEx, RegMC
    Dim lngRow As Long
    Dim X

    Set rng1 = Selection
    X = rng1.Resize(rng1.Rows.Count, 3)
    Set RegEx = CreateObject("vbscript.regexp")
    RegEx.Pattern = "([A-Z]+)(\d+)?([A-Z]+)?"

    For lngRow = 1 To UBound(X, 1)

        If RegEx.test(X(lngRow, 1)) Then
            Set RegMC = RegEx.Execute(X(lngRow, 1))
            X(lngRow, 1) = RegMC(0).submatches(0)
            X(lngRow, 2) = RegMC(0).submatches(1)
            X(lngRow, 3) = RegMC(0).submatches(2)
        End If
    Next lngRow
    rng1.Resize(rng1.Rows.Count, 3) = X
End Sub

Open in new window

split.xls
Avatar of doolinn

ASKER

This looks great - one small point but I still need to keep the original string in the original cell - so in your example column B keeps the original data and column C,D and E show the split. Can you advise on the changes to the code needed to effect this. Thanks
SOLUTION
Avatar of barry houdini
barry houdini
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
doolin,

If the data are always as depicted above, then barry and Dave have you well-covered. However, is it possible that you will ever get entries such as:

AABZEEH346678XXA42 (note the extra digit portion at the end)
1234 (note the absence of letters)

If entries such as these are possible, then you may want to have a look at my approach as demonstrated on the "matthewspatrick" worksheet in the attached workbook. It produces the results as shown in the attached JPG.

Like Dave, I used a Regular Expressions-based approach, relying on the RegExpFindSubmatch function described in my article:
http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

The formulae are:

B2: =RegExpFindSubmatch(A2,"(^[a-z]+)",1,1,FALSE)
C2: =RegExpFindSubmatch(A2,"(^[a-z]+)(\d+)",1,2,FALSE)
D2: =RegExpFindSubmatch(A2,"(^[a-z]+)(\d+)([a-z]+)",1,3,FALSE)


Patrick


Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
    Optional SubmatchPos, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
    ' from the various matches to a pattern (PatternStr).  In RegExp, submatches within a pattern
    ' are defined by grouping portions of the pattern within parentheses.
    
    ' Use MatchPos to indicate which match you want:
    ' MatchPos omitted               : function returns results for all matches
    ' MatchPos = 1                   : the first match
    ' MatchPos = 2                   : the second match
    ' MatchPos = <positive integer>  : the Nth match
    ' MatchPos = 0                   : the last match
    ' MatchPos = -1                  : the last match
    ' MatchPos = -2                  : the 2nd to last match
    ' MatchPos = <negative integer>  : the Nth to last match
    
    ' Use SubmatchPos to indicate which match you want:
    ' SubmatchPos omitted               : function returns results for all submatches
    ' SubmatchPos = 1                   : the first submatch
    ' SubmatchPos = 2                   : the second submatch
    ' SubmatchPos = <positive integer>  : the Nth submatch
    ' SubmatchPos = 0                   : the last submatch
    ' SubmatchPos = -1                  : the last submatch
    ' SubmatchPos = -2                  : the 2nd to last submatch
    ' SubmatchPos = <negative integer>  : the Nth to last submatch
    
    ' The return type for this function depends on whether your choice for MatchPos is looking for
    ' a single value or for potentially many.  All arrays returned by this function are zero-based.
    ' When the function returns a 2-D array, the first dimension is for the matches and the second
    ' dimension is for the submatches
    ' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match.  First dimension
    '                                        based on number of matches (0 to N-1), second dimension
    '                                        based on number of submatches (0 to N-1)
    ' MatchPos omitted, SubmatchPos used   : 2-D array (0 to N-1, 0 to 0) of the specified submatch
    '                                        from each match
    ' MatchPos used, SubmatchPos omitted   : 2-D array (0 to 0, 0 to N-1) of the submatches from the
    '                                        specified match
    ' MatchPos used, SubmatchPos used      : String with specified submatch from specified match
    
    ' For any submatch that is not found, the function treats the result as a zero-length string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Mat As Object
    Dim Answer() As String
    Dim Counter As Long
    Dim SubCounter As Long
    
    ' Evaluate MatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(MatchPos) Then
        If Not IsNumeric(MatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            MatchPos = CLng(MatchPos)
        End If
    End If
    
    ' Evaluate SubmatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(SubmatchPos) Then
        If Not IsNumeric(SubmatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            SubmatchPos = CLng(SubmatchPos)
        End If
    End If
    
    ' Create instance of RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If MatchPos is missing, user either wants array of all the submatches for each match, or an
        ' array of all the specified submatches for each match.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(MatchPos) Then
            
            ' Return value is a 2-D array of all the submatches for each match
            
            If IsMissing(SubmatchPos) Then
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.submatches.Count - 1) As String
                    End If
                    
                    ' Loop through the submatches and populate the array.  If the Nth submatch is not
                    ' found, RegExp returns a zero-length string
                    
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(Counter, SubCounter) = Mat.submatches(SubCounter)
                    Next
                Next
            
            ' Return value is a 2-D array of the specified submatch for each match.
            
            Else
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop.  If SubmatchPos = 0, then we want
                    ' the last submatch.  In that case reset SubmatchPos so it equals the submatch count.
                    ' Negative number indicates Nth to last; convert that to applicable "positive" position
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
                        Select Case SubmatchPos
                            Case Is > 0: 'no adjustment needed
                            Case 0, -1: SubmatchPos = Mat.submatches.Count
                            Case Is < -Mat.submatches.Count: SubmatchPos = -SubmatchPos
                            Case Else: SubmatchPos = Mat.submatches.Count + SubmatchPos + 1
                        End Select
                    End If
                    
                    ' Populate array with the submatch value.  If the submatch value is not found, or if
                    ' SubmatchPos > the count of submatches, populate with a zero-length string
                    
                    If SubmatchPos <= Mat.submatches.Count Then
                        Answer(Counter, 0) = Mat.submatches(SubmatchPos - 1)
                    Else
                        Answer(Counter, 0) = ""
                    End If
                Next
            End If
            RegExpFindSubmatch = Answer
            
        ' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
        
        Else
            
            ' If MatchPos = 0 then make MatchPos equal the match count.  If negative (indicates Nth
            ' to last), convert to equivalent position.
            
            Select Case MatchPos
                Case Is > 0: 'no adjustment needed
                Case 0, -1: MatchPos = TheMatches.Count
                Case Is < -TheMatches.Count: MatchPos = -MatchPos
                Case Else: MatchPos = TheMatches.Count + MatchPos + 1
            End Select
            
            ' As long as MatchPos does not exceed the match count, process the Nth match.  If the
            ' match count is exceeded, return a zero-length string
            
            If MatchPos <= TheMatches.Count Then
                Set Mat = TheMatches(MatchPos - 1)
                
                ' User wants a 2-D array of all submatches for the specified match; populate array.  If
                ' a particular submatch is not found, RegExp treats it as a zero-length string
                
                If IsMissing(SubmatchPos) Then
                    ReDim Answer(0 To 0, 0 To Mat.submatches.Count - 1)
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(0, SubCounter) = Mat.submatches(SubCounter)
                    Next
                    RegExpFindSubmatch = Answer
                
                ' User wants a single value
                
                Else
                    
                    ' If SubmatchPos = 0 then make it equal count of submatches.  If negative, this
                    ' indicates Nth to last; convert to equivalent positive position
                    
                    Select Case SubmatchPos
                        Case Is > 0: 'no adjustment needed
                        Case 0, -1: SubmatchPos = Mat.submatches.Count
                        Case Is < -Mat.submatches.Count: SubmatchPos = -SubmatchPos
                        Case Else: SubmatchPos = Mat.submatches.Count + SubmatchPos + 1
                    End Select
                    
                    ' If SubmatchPos <= count of submatches, then get that submatch for the specified
                    ' match.  If the submatch value is not found, or if SubmathPos exceeds count of
                    ' submatches, return a zero-length string.  In testing, it appeared necessary to
                    ' use CStr to coerce the return to be a zero-length string instead of zero
                    
                    If SubmatchPos <= Mat.submatches.Count Then
                        RegExpFindSubmatch = CStr(Mat.submatches(SubmatchPos - 1))
                    Else
                        RegExpFindSubmatch = ""
                    End If
                End If
            Else
                RegExpFindSubmatch = ""
            End If
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFindSubmatch = ""
    End If
    
Cleanup:
    ' Release object variables
    Set Mat = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

Q-25602215.xls
Q-25602215.JPG
BTW, if "1234" is possible, and you would want that captured into Field 2, then you can amend my formula for C2 to be:

 =RegExpFindSubmatch(A2,"(\d+)",1,1,FALSE)

Patrick