Link to home
Start Free TrialLog in
Avatar of John Carney
John CarneyFlag for United States of America

asked on

Adding exceptions to the pattern definitions in a RegEx code

I have this great code that you guys have been "helping" me put together (yes, I have actually made a few successful tweaks to it!), and now I need to add some exceptions to the pattern definitions.

Occasionally the targeted free text will have a date like "03-March-2011" or "3 July 2010" and it will produce "03-Ma" or "3 J" in the calling cell. I would like to add something that prevents that. I have a named range "months" which contains all the months abbreviated and spelled in full.

Can I add something to the pattern definition that says in effect: "If the text has a number which otherwise satsfies the pattern criteria BUT is followed by one of the strings in 'months', then don't include it."

Thanks,
John
Function GetSeats(InputCell As Range) As String  
    Dim RegEx, RegM, RegMC
    Dim MyDic
    Dim tmpStr As String
    Dim i As Long
 
    Set MyDic = CreateObject("scripting.dictionary")
    Set RegEx = CreateObject("vbscript.regexp")
    With RegEx
        '.Pattern = "ROW(S)? (\d+)-(\d+)"
        '.Pattern = "ROW(S)?\s*(\d+)\s*[,-/]\s*?(\d+)" 'Brad
        '.Pattern = "ROW(S)?\s*(\d+)\s*([,-/|]|to|thru|through)\s*(\d+)"  
        .Pattern = "(?:ROW|SEAT)(S)?\s*(\d+)\s*([,-/|]|to|thru|through)\s*(\d+)"  

        .Global = True
        .IgnoreCase = True
        If .test(InputCell.Value) Then
            Set RegMC = .Execute(InputCell)
            For Each RegM In RegMC
                If MyDic.exists(LCase$(RegM)) = False Then
                    For i = RegM.submatches(1) To RegM.submatches(3)
                        Select Case i
                        Case Is <= 4
                            tmpStr = tmpStr & i & "ACDF, "
                        Case Is < 7 > 4
                            tmpStr = tmpStr & i & "ABCDEF, "
                        Case Else
                            tmpStr = tmpStr & i & "ABCDEFHJK, "
                        End Select
                    Next
                    tmpStr = Left$(tmpStr, Len(tmpStr) - 2)
                    MyDic.Add LCase$(RegM), 1
                End If
            Next
        End If
        '.Pattern = "\d+[A-M]+"
        .Pattern = "\d+[ |/|-]?[A-M]+" 'Allow up to one space
        '.Pattern = "\d+ *[A-M]+" 'Allow any number of spaces 
        If .test(InputCell.Value) Then
            Set RegMC = .Execute(InputCell)
            For Each RegM In RegMC
                If tmpStr = vbNullString Then
                    tmpStr = RegM
                Else
                    tmpStr = tmpStr & ", " & RegM
                End If
            Next
        End If
        GetSeats = tmpStr
    End With
End Function

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Terry Woods
Terry Woods
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of John Carney

ASKER

Boy, you really know this stuff, thanks! Now Iam I right that you can't use a range reference to define your exclusions? The reason I ask is that the list might grow significantly over time and it would be nice to have something more compact if we end up with 40 or 50 exclusions. If a range refernce is possible please let me know.

Thanks Terry,

John

Terry, one more thing, if you're still there: How do I limit the number matches to 3 or fewer digits?
You can't just say (?!jan-dec), but there are ways to reduce the length of patterns, like this:

.Pattern = "\d+[ |/|-]?(?!jan|feb|apr|ma[ry]|ju[ln]|aug|sep|oct|nov|dec)[A-M]+"

What kind of other exclusions might you want? If you can provide examples I might be able to suggest a suitable pattern.

And thanks for the points, by the way!
at least 1 but no more than 3 digits:

.Pattern = "\d{1,3}[ |/|-]?(?!jan|feb|apr|ma[ry]|ju[ln]|aug|sep|oct|nov|dec)[A-M]+"
By the way, this:
[ |/|-]

Should probably be:
[ /-]
because you don't need | characters to indicate a logical OR within [] brackets. | is only used as a logical OR with round brackets () or without brackets at all eg
match this and (this|or this)
match this|or this
Note also that with pattern:
"\d{1,3}[ /-]?(?!jan|feb|apr|ma[ry]|ju[ln]|aug|sep|oct|nov|dec)[A-M]+"
the following case will match:
1234-J
because the pattern matches 234-J

If you want to avoid that, use something like:
"^\d{1,3}[ |/|-]?(?!jan|feb|apr|ma[ry]|ju[ln]|aug|sep|oct|nov|dec)[A-M]+" '^ matches the start of the string
or
"(?<=\D)\d{1,3}[ |/|-]?(?!jan|feb|apr|ma[ry]|ju[ln]|aug|sep|oct|nov|dec)[A-M]+" 'requires a non-digit before the digits.
One more note!:

There is an active "Regular Expressions" zone in EE in case you post any further questions relating to that topic.
Your last suggestion looked like it might do that, but it bugged out for me.