Link to home
Start Free TrialLog in
Avatar of karinos57
karinos57Flag for Afghanistan

asked on

How to split string

Hi,
I've looked at several threads on the forum relating to splitting a string, however there not quite answering the problem I have.  I have something like this:
SEVOFLURANE   INH  6X250 ML                                  
What i want to do is to copy 250 ML in one column and the 6 in onother column    
For example:
                     Column A                          Column B        Column C
SEVOFLURANE   INH  6X250 ML             250 ML             6                                    
I have provided more details in the attached file pls. see attached for more details.  thanks
Book1.xlsx
Avatar of Paul MacDonald
Paul MacDonald
Flag of United States of America image

Is "X" always the delimiter?  Is there any chance the Column B data will have an "X" in it as well?
Thats also my question.
What about this?
strText = "SEVOFLURANE   INH  6X250 ML"

strTemp = Right(strText,8)

strColumnB = Split(strTemp,"X")(1)
strColumnC = Split(strTemp,"X")(0)

wscript.echo strColumnB
wscript.echo strColumnC

Open in new window

Are you looking for VBA code to split the string?
SOLUTION
Avatar of jimmym715
jimmym715

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
a mix of find and mid
need time to think here..
Here is a user defined function that will accept ml, mg, g and dl as units, with the option of spaces. Install it in a regular module sheet, then use it with an array formula by selection both target cells, then array entering:
=PharmaParser(A2)

To array enter a formula, hold the Control and Shift keys down, then hit Enter. Excel will respond by adding curly braces { } surrounding the formula.

Brad
Function PharmaParser(s As String)
Dim RegEx As Object, oMatches As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.IgnoreCase = True
RegEx.Pattern = "(\d+)(\s*)(x)(\s*)(\d+)(\s*)(ml|mg|g|dl)"
Set oMatches = RegEx.Execute(s)
If oMatches.Count > 0 Then
    PharmaParser = Array(oMatches(0).submatches(4) & " " & oMatches(0).submatches(6), oMatches(0).submatches(0))
Else
    PharmaParser = Array("", "")
End If
Set oMatches = Nothing
Set RegEx = Nothing
End Function

Open in new window

PharmaParserQ25770462.xls
This version captures the decimal point
Function PharmaParser(s As String)
Dim RegEx As Object, oMatches As Object
Set RegEx = CreateObject("VBScript.RegExp")
RegEx.IgnoreCase = True
RegEx.Pattern = "(\d+)(\s*)(x)(\s*)(\d*\.?\d*)(\s*)(ml|mg|g|dl)"
Set oMatches = RegEx.Execute(s)
If oMatches.Count > 0 Then
    PharmaParser = Array(oMatches(0).submatches(4) & " " & oMatches(0).submatches(6), oMatches(0).submatches(0))
Else
    PharmaParser = Array("", "")
End If
Set oMatches = Nothing
Set RegEx = Nothing
End Function

Open in new window

I've got a formulaic answer shortly
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
here's the file I used to develop the above formulae

Put together like the abve the formulae are difficult to read - the attached file shows how they were developed - results being pasted in to the final single cell results.

Anthony
StockQuantities01.xls
if you wanted the package Q to be a value then I'd need to add a VALUE in to the formula.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Just for giggles, here is another, RegExp-based approach.  For more info on RegExp, you can see my article: https://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html

1) Add the code below to a regular VBA module

2) To get the package size, use this formula:

=RegExpFind(TRIM(B3),"\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg)$",1,FALSE)

and to get quantity:

=RegExpFind(TRIM(B3),"\d+(?=X\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))",1,FALSE)

Please see the attached file for an example.

Note to Brad: I don't think your pattern is picking up decimals.  See the entries on Row 4 :)

Note that if you want the quantity to be numeric, and if you are in Excel 2007 or 2010, you can use this formula instead:

=IFERROR(RegExpFind(TRIM(B3),"\d+(?=X\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))",1,FALSE),"")

Patrick



Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function

Open in new window

Q-25770462.xls
I wish that regex worked without vba..    (and what does "dim" mean?)

anyway, on reviewing my formulaic offering above, I can confirm that at least for me, I cannot understand why it works without reading the working file used to create it. This is because there are occasional multiple instances of triple spaces, double spaces and single spaces as well as double instances of "ML" and my formulae use only FIND and MID, which would suggest that with multiple instances of say ML, they should not work, and yet they do... until peer review.. (!)

Anthony
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
matthewspatrick,
The second version of my code catered to the decimal point possibility.

Brad


Dim may be thought of as short for "Dimension" if that helps you to remember its function. That said, VBA doesn't support using Dimension in its place.
Brad,

>>The second version of my code catered to the decimal point possibility.

I really ought to have known better than to think you hadn't already thought of it.

Sorry for not noticing that follow-up :)

Patrick
OK, I'm late to this particular party but anyway.........to accommdate the possibility of an "X" somewhere in the description before the 6X50 part I'd suggest these formulas in C3 and D3
=IF(RIGHT(TRIM(B3),2)="ML",TRIM(RIGHT(SUBSTITUTE(TRIM(B3),"X",REPT(" ",10)),10)),"")
and
=IF(C3="","",TRIM(RIGHT(SUBSTITUTE(SUBSTITUTE(TRIM(B3),"X"&C3,"")," ",REPT(" ",10)),10))+0)
see attached where I altered the text in B10 to demonstrate
regards, barry

25770462.xlsx
Avatar of karinos57

ASKER

Wow! i am overwhelmed with your answers now and i need some time to go through all these and see which one is the best solution for me.  Most of the time X is the delimiter but there are times where there is no X and in this case i don't know what can  be done to over come that problem but for now X is the magic work.  Let me look at all these solutions and i hope i will find the answer i need.  thank you all for your support.
>>there are times where there is no X and in this case i don't know what can  be done to over come that
>>problem but for now X is the magic work

Can you provide some examples with a delimiter besides X?
If you are using one of the RegExp suggestions, alternate delimiters can be handled by separating them with a pipe symbol in the Pattern property. For example:
RegEx.Pattern = "(\d+)(\s*)(x|y|z)(\s*)(\d*\.?\d*)(\s*)(ml|mg|g|dl)"          'Allows x, y or z to be the delimiter

If you need more than one character, that's OK. The units of measure in that statement are either mL, mg, g or dL as you can see from the very last parenthetical expression.

Brad
BTW, I did a little more digging, and I found some points where the RegExp could possibly be improved.  This file uses a different function from my article https://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html, RegExpFindSubmatch.


Using that to find the Package Size:

=RegExpFindSubmatch($B3,"(\d+)(X)(\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))( *)$",1,3,FALSE)

and the Quantity:

=RegExpFindSubmatch($B3,"(\d+)(X)(\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))( *)$",1,3,FALSE)

or to get a value, using Excel 2007 or later,

=IFERROR(VALUE(RegExpFindSubmatch($B3,"(\d+)(X)(\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))( *)$",1,1,FALSE)),"")


In the attached file, I altered the values in Rows 2 and 6 to highlight how this new approach can get different results from Brad's revised approach and my original approach.


If you are a newcomer to RegExp, they can be hard to wrap your mind around, so if you want to know how that works, just let us know :)


Patrick


Function RegExpFindSubmatch(LookIn As String, PatternStr As String, Optional MatchPos, _
    Optional SubmatchPos, Optional MatchCase As Boolean = True, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' For more info, please see:
    ' http://www.experts-exchange.com/articles/Programming/Languages/Visual_Basic/Using-Regular-Expressions-in-Visual-Basic-for-Applications-and-Visual-Basic-6.html
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return "submatches"
    ' from the various matches to a pattern (PatternStr).  In RegExp, submatches within a pattern
    ' are defined by grouping portions of the pattern within parentheses.
    
    ' Use MatchPos to indicate which match you want:
    ' MatchPos omitted               : function returns results for all matches
    ' MatchPos = 1                   : the first match
    ' MatchPos = 2                   : the second match
    ' MatchPos = <positive integer>  : the Nth match
    ' MatchPos = 0                   : the last match
    ' MatchPos = -1                  : the last match
    ' MatchPos = -2                  : the 2nd to last match
    ' MatchPos = <negative integer>  : the Nth to last match
    
    ' Use SubmatchPos to indicate which match you want:
    ' SubmatchPos omitted               : function returns results for all submatches
    ' SubmatchPos = 1                   : the first submatch
    ' SubmatchPos = 2                   : the second submatch
    ' SubmatchPos = <positive integer>  : the Nth submatch
    ' SubmatchPos = 0                   : the last submatch
    ' SubmatchPos = -1                  : the last submatch
    ' SubmatchPos = -2                  : the 2nd to last submatch
    ' SubmatchPos = <negative integer>  : the Nth to last submatch
    
    ' The return type for this function depends on whether your choice for MatchPos is looking for
    ' a single value or for potentially many.  All arrays returned by this function are zero-based.
    ' When the function returns a 2-D array, the first dimension is for the matches and the second
    ' dimension is for the submatches
    ' MatchPos omitted, SubmatchPos omitted: 2-D array of submatches for each match.  First dimension
    '                                        based on number of matches (0 to N-1), second dimension
    '                                        based on number of submatches (0 to N-1)
    ' MatchPos omitted, SubmatchPos used   : 2-D array (0 to N-1, 0 to 0) of the specified submatch
    '                                        from each match
    ' MatchPos used, SubmatchPos omitted   : 2-D array (0 to 0, 0 to N-1) of the submatches from the
    '                                        specified match
    ' MatchPos used, SubmatchPos used      : String with specified submatch from specified match
    
    ' For any submatch that is not found, the function treats the result as a zero-length string
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Mat As Object
    Dim Answer() As String
    Dim Counter As Long
    Dim SubCounter As Long
    
    ' Evaluate MatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(MatchPos) Then
        If Not IsNumeric(MatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            MatchPos = CLng(MatchPos)
        End If
    End If
    
    ' Evaluate SubmatchPos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(SubmatchPos) Then
        If Not IsNumeric(SubmatchPos) Then
            RegExpFindSubmatch = ""
            Exit Function
        Else
            SubmatchPos = CLng(SubmatchPos)
        End If
    End If
    
    ' Create instance of RegExp object
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.Test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' If MatchPos is missing, user either wants array of all the submatches for each match, or an
        ' array of all the specified submatches for each match.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(MatchPos) Then
            
            ' Return value is a 2-D array of all the submatches for each match
            
            If IsMissing(SubmatchPos) Then
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To Mat.submatches.Count - 1) As String
                    End If
                    
                    ' Loop through the submatches and populate the array.  If the Nth submatch is not
                    ' found, RegExp returns a zero-length string
                    
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(Counter, SubCounter) = Mat.submatches(SubCounter)
                    Next
                Next
            
            ' Return value is a 2-D array of the specified submatch for each match.
            
            Else
                For Counter = 0 To TheMatches.Count - 1
                    Set Mat = TheMatches(Counter)
                    
                    ' To determine how many submatches there are we need to first evaluate a match.  That
                    ' is why we redim the array inside the for/next loop.  If SubmatchPos = 0, then we want
                    ' the last submatch.  In that case reset SubmatchPos so it equals the submatch count.
                    ' Negative number indicates Nth to last; convert that to applicable "positive" position
                    
                    If Counter = 0 Then
                        ReDim Answer(0 To TheMatches.Count - 1, 0 To 0) As String
                        Select Case SubmatchPos
                            Case Is > 0: 'no adjustment needed
                            Case 0, -1: SubmatchPos = Mat.submatches.Count
                            Case Is < -Mat.submatches.Count: SubmatchPos = -SubmatchPos
                            Case Else: SubmatchPos = Mat.submatches.Count + SubmatchPos + 1
                        End Select
                    End If
                    
                    ' Populate array with the submatch value.  If the submatch value is not found, or if
                    ' SubmatchPos > the count of submatches, populate with a zero-length string
                    
                    If SubmatchPos <= Mat.submatches.Count Then
                        Answer(Counter, 0) = Mat.submatches(SubmatchPos - 1)
                    Else
                        Answer(Counter, 0) = ""
                    End If
                Next
            End If
            RegExpFindSubmatch = Answer
            
        ' User wanted the info associated with the Nth match (or last match, if MatchPos = 0)
        
        Else
            
            ' If MatchPos = 0 then make MatchPos equal the match count.  If negative (indicates Nth
            ' to last), convert to equivalent position.
            
            Select Case MatchPos
                Case Is > 0: 'no adjustment needed
                Case 0, -1: MatchPos = TheMatches.Count
                Case Is < -TheMatches.Count: MatchPos = -MatchPos
                Case Else: MatchPos = TheMatches.Count + MatchPos + 1
            End Select
            
            ' As long as MatchPos does not exceed the match count, process the Nth match.  If the
            ' match count is exceeded, return a zero-length string
            
            If MatchPos <= TheMatches.Count Then
                Set Mat = TheMatches(MatchPos - 1)
                
                ' User wants a 2-D array of all submatches for the specified match; populate array.  If
                ' a particular submatch is not found, RegExp treats it as a zero-length string
                
                If IsMissing(SubmatchPos) Then
                    ReDim Answer(0 To 0, 0 To Mat.submatches.Count - 1)
                    For SubCounter = 0 To UBound(Answer, 2)
                        Answer(0, SubCounter) = Mat.submatches(SubCounter)
                    Next
                    RegExpFindSubmatch = Answer
                
                ' User wants a single value
                
                Else
                    
                    ' If SubmatchPos = 0 then make it equal count of submatches.  If negative, this
                    ' indicates Nth to last; convert to equivalent positive position
                    
                    Select Case SubmatchPos
                        Case Is > 0: 'no adjustment needed
                        Case 0, -1: SubmatchPos = Mat.submatches.Count
                        Case Is < -Mat.submatches.Count: SubmatchPos = -SubmatchPos
                        Case Else: SubmatchPos = Mat.submatches.Count + SubmatchPos + 1
                    End Select
                    
                    ' If SubmatchPos <= count of submatches, then get that submatch for the specified
                    ' match.  If the submatch value is not found, or if SubmathPos exceeds count of
                    ' submatches, return a zero-length string.  In testing, it appeared necessary to
                    ' use CStr to coerce the return to be a zero-length string instead of zero
                    
                    If SubmatchPos <= Mat.submatches.Count Then
                        RegExpFindSubmatch = CStr(Mat.submatches(SubmatchPos - 1))
                    Else
                        RegExpFindSubmatch = ""
                    End If
                End If
            Else
                RegExpFindSubmatch = ""
            End If
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFindSubmatch = ""
    End If
    
Cleanup:
    ' Release object variables
    Set Mat = Nothing
    Set TheMatches = Nothing
    
End Function

Open in new window

Q-25770462-v2.xls
, if X is not a constant, perhaps some examples please? formulae can me altered to cater for more than one x if that's a risk.

Have to ask if the system that produces the original text strings cannot be asked to provide some consistency, as it looks as if it is reporting on separate fields in the first place? Guessing of course.

Anthony
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
And breaking down the pattern I landed on...

(\d+)(X)(\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))( *)$

(\d+) ---> one or more digits
(X) ---> the letter X
(\d+(\.\d+)? (ml|cl|dl|l|mg|cg|g|kg))  is a group containing...
     a number, possible including a decimal, but not necessarily
     a space
     an abbreviation for a unit of measure
( *) ---> zero or more spaces
$ ---> the end of the string

The parentheses are not always needed, but in this case they simplified the effort for both me and Brad, as the parentheses define groups that return "submatches" to the overall pattern, and in our most recent approaches we are both leveraging submatches.
has anyone tried writing something based on the colour characteristic? All his desired split characters are in RED... I don't think a formula can count characters based on colour.

Anthony
>All his desired split characters are in RED... I don't think a formula can count characters based on colour.

I think you'll find that red was only used to help us see what needed splitting.

Patrick
how is that done?  
Anthony,

With respect, this thread is getting long-winded enough as it is.  If karinos57 thinks that reading the font colors is a viable approach to solving the problem, so be it, but like patrickab I suspect the coloring was there solely to draw our attention to the portions of the strings we needed to extract, in which case basing the extraction on the font color is a (forgive the pun) red herring.

In any event, it is a messy operation, and one I would rather avoid, because triggering the calculation is problematic: updating the font color does not trigger a worksheet calculation, so it will be difficult getting the function to recalculate the values when we want it to.

:)

Patrick
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Patricks, thank you both for your feedback, I joined only very recently and it will help me understand, though thread hijack was not my intention.
As it happens I was wondering if whatever method was used, might offer a way forward as I understand there are only 56 colours in the colour page and these are accessible by VBA, but not formulae. However you appear to have ruled it out.
Anthony


Anthony,
I've appended to your comment some code that might be used to return coloured text. You can call it with worksheet formulas like:
=ColouredText(A1)          'Returns any coloured text that is present in cell A1
=ColouredText(A1,3)      'Returns red coloured text that is present in cell A1

As matthewspatrick previously noted, this function won't be triggered by changes to cell formatting. But if you are trying to interpret a static report produced by another organization (which appears to be the case in this thread), then that limitation isn't a problem. That said, I agree with the Patricks that the red color was probably added by the Asker.

 Brad

Function ColouredText(cel As Range, Optional iColourIndex As Integer) As String
'Returns the text that uses a coloured font in a specified cell
Dim i As Long, n As Long
Dim s As String
n = Len(cel.Text)
For i = 1 To n
    If iColourIndex <> 0 Then
        If cel.Characters(i, 1).Font.ColorIndex = iColourIndex Then s = s & cel.Characters(i, 1).Text
    Else
        If cel.Characters(i, 1).Font.ColorIndex <> xlAutomatic Then s = s & cel.Characters(i, 1).Text
    End If
Next
ColouredText = s
End Function

Open in new window

I decided that you might want to return results for the prefilled syringes and vials in the list. You might also want to avoid the use of an array formula (by calling the function twice--once for each value). The code in the snippet below gives both those options.

To provide this flexibility, I revised the function to step through a series of patterns. It will return a value for the first matching pattern.

With the changes, there are only three pharmaceuticals that fail to return results.

Brad
Function PharmaParser(s As String, Optional Position As Integer = 0)
Dim RegEx As Object, oMatches As Object
Dim v As Variant
Set RegEx = CreateObject("VBScript.RegExp")
v = Array("", "")
With RegEx
    .IgnoreCase = True

    .Pattern = "(\d+)(\s*)(x)(\s*)(\d+)(\s*)(ml|mg|g|dl|mcg|cc)"    'Look for 6X250 ML
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(4) & " " & oMatches(0).Submatches(6), Val(oMatches(0).Submatches(0)))
        GoTo Finish
    End If
    .Pattern = "(\d+)(\s*)(x)(\s*)(\d*\.?\d+)(\s*)(ml|mg|g|dl|mcg|cc)"  'Look for 6X2.5 ML
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(4) & " " & oMatches(0).Submatches(6), Val(oMatches(0).Submatches(0)))
        GoTo Finish
    End If
    .Pattern = "(\d+)(\s*)(ml|mg|g|dl|mcg|cc)(\s+)(PFS|VL)(\s*)(\d+)"  'Look for 100 MG PFS  10
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(0) & " " & oMatches(0).Submatches(2) & " " & oMatches(0).Submatches(4), Val(oMatches(0).Submatches(6)))
        GoTo Finish
    End If
    .Pattern = "(\d*\.?\d+)(\s*)(ml|mg|g|dl|mcg|cc)(\s+)(PFS|VL)(\s*)(\d+)"  'Look for 5.5 MG PFS  10
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(0) & " " & oMatches(0).Submatches(2) & " " & oMatches(0).Submatches(4), Val(oMatches(0).Submatches(6)))
        GoTo Finish
    End If
    .Pattern = "(\d+)(\s*)(ml|mg|g|dl|mcg|cc)(\s+)(PFS|VL)"  'Look for 100 MG VL
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(0) & " " & oMatches(0).Submatches(2) & " " & oMatches(0).Submatches(4), 1)
        GoTo Finish
    End If
    .Pattern = "(\d*\.?\d+)(\s*)(ml|mg|g|dl|mcg|cc)(\s+)(PFS|VL)"  'Look for 5.5 MG VL
    If .Test(s) Then
        Set oMatches = .Execute(s)
        v = Array(oMatches(0).Submatches(0) & " " & oMatches(0).Submatches(2) & " " & oMatches(0).Submatches(4), 1)
        GoTo Finish
    End If
End With

Finish:
    If Position = 0 Then
        PharmaParser = v
    Else
        PharmaParser = v(Position - 1)
    End If
    Set oMatches = Nothing
    Set RegEx = Nothing
End Function

Open in new window

PharmaParserQ25770462.xls
thank you all who provided the solution.  I wish i had more than 500 pts.
karinos57,

Thanks for the points.

Patrick