Link to home
Start Free TrialLog in
Avatar of Robert Berke
Robert BerkeFlag for United States of America

asked on

how to improve this vba regex pattern "(?:\D|\b)(\d{9})(?=\D|\b)"

This is nit picky type question, but I hope to learn something new about non-capturing patterns.

The following code extracts only 9 digit integers from a string.
An earlier version of the code had a simpler line 13:
     alternative:               msgbox onum

That version displayed e123456789 for the second number, because the non capturing portion of the pattern was still included as part of the match.    I wonder if there is some way to change the pattern to accomplish the same thing  so I could use the simpler alternative?  Something like a look behind?

Sub deletenow()
Dim regex, match, onum

    Set regex = CreateObject("vbscript.regexp")

    regex.Global = True
    '                                  / positive lookahead
    '                / non capturing  /
    regex.pattern = "(?:\D|\b)(\d{9})(?=\D|\b)"
    
    Set match = regex.Execute("1234 123456789 quote123456789 0123456789")
    For Each onum In match
        MsgBox onum.submatches(0)
    Next

End Sub

Open in new window

Avatar of HainKurt
HainKurt
Flag of Canada image

just use "(\d{9})" for 9 digit pattern
or if it should start with e + 9 digits, use "(e\d{9})"
Avatar of Robert Berke

ASKER

No that treats 123456789123456789  as  if there were two 9 digit numbers.  
My goal is to treat every non digit as if it were a blank, then extract integers that are exactly 9 digits.

    regex.pattern = "(\d{9})"
    Set match = regex.Execute("234567890 quote123456789 0123456789123456789")
    For Each onum In match
        MsgBox onum
    Next
Avatar of Rgonzo1971
Rgonzo1971

Hi,

VBScript regex does not Support Lookbehind afaik
pls try
Sub deletenow()
Dim regex, match, onum

    Set regex = CreateObject("vbscript.regexp")

    regex.Global = True
    '                                  / positive lookahead
    '                / non capturing  /
    regex.Pattern = "(?!\D)(\d{9})(?=\D)"
    
    Set match = regex.Execute("1234 123456789 quote123456789 0123456789")
    For Each onum In match
        MsgBox onum
    Next

End Sub

Open in new window


you could also use "(?=\d)(\d{9})(?=\D)" but on your example it would use more steps

Regards
what is expected result from

234567890 quote123456789 0123456789123456789

"(\d{9})" gives

234567890
123456789
012345678
912345678
As usual regular expressions will bust the chops for the simplest requests.  I want all 9 digit integers period and not larger numbers.
But, as usual, regular expressions bust the chops of everybody who thinks they understand them.

So far we are zero for three on our attempts at a solution.  The solution in my first post works, but even that took 3 tries, and I am looking for something simpler.

Here is my latest test code. Please run it to avoid posting non-working solutions.

Option Explicit

Sub deletenow()
Dim regex, match, onum, result

    Set regex = CreateObject("vbscript.regexp")
    regex.Global = True
    regex.Pattern = "(?=\d)(\d{9})(?=\D)"      ' and "(?!\D)(\d{9})(?=\D)" also fails
    result = ""
    Set match = regex.Execute("1234 123456789 01234567891234567890 quote123456789")
    For Each onum In match
        result = result & onum & ";"
    Next
    MsgBox result & "    I want result to be 123456789;123456789;"
End Sub
try this

(^(\d{9})\D*)|(\D(\d{9})\D*)

Open in new window

what is (?!  
O'Reilly's "Mastering Regular Expressions" says it is "fail now" but I don't understand  what that means.
hainkurt.  try it yourself, it doesn't work
SOLUTION
Avatar of Rgonzo1971
Rgonzo1971

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
what is ?!  

? : 0 or one
! : not

?! : not 0 or one, ie multiple only...

I guess...
Yes, I know my original code works, but I like learning things.  Perhaps you can give me an example where (?!  is useful
Dope slap to the forehead.  ?!  is just normal   negation.  I was looking at it wrong and thought it like (?:  which means noncapturuing.
I do not need any example as I understand normal negation pretty will
example
with text irongate ironman

the expression iron(?!man) only selects
irongate ironman
So, I am back to square one. Perhaps my first solution is the only solution, but I will leave the question open for a while and see if anybody else has a better idea.
conversely the expression iron(?=man) only selects
irongate ironman
?! is negative lookahead
?=  is positive lookahead
ok, I guess your issue is how to get the value not the regex...

check this code

    regex.Pattern = "(^(\d{9})\D*)|(\D(\d{9})\D*)"
    Set matches = regex.Execute("1234 123456789 quote123456789 0123456789")
    For Each match In matches
      last = match.SubMatches.count
      MsgBox match.SubMatches(last-1)
    Next

Open in new window

ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
rgonzo1971 is the only expert who actually said that my original post is the best solution.  

All the other solutions posed were inadequate, so I am giving rgonzo1971 the points, and awarding myself the best answer.