Link to home
Start Free TrialLog in
Avatar of dsimcox
dsimcox

asked on

Regular Expression to extract decimal number from text string

I'm having trouble getting the right regular expression to use in an .Execute or .Replace RegExp Method to extract a decimal number.

Here's what I've come up with that does not work:
Public Function MyNumber(txt)
Dim RegEx As Object

Set RegEx = CreateObject("vbscript.regexp")
   
With RegEx
    .Global = True
    .Pattern = "[^\d]+[^\.]?[^\d]+"
End With

MyNumber = RegEx.Replace(txt, "")

Set RegEx = Nothing

End Function

Open in new window


I want to extract the decimal number from a string like this:  "$13.4M"  or "74.5 FTE"

My pattern is: "[^\d]+[^\.]?[^\d]+"

[^\d]+   -  Not a digit (multiple instances)
[^\.]?    -  Not a decimal (only one instance)
[^\d]+   -  Not a digit (multiple instances)

This function is returning the entire string when I test it. What am I doing wrong?


ASKER CERTIFIED SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Or a little differently..."\d(\.\d+)?"
Forgot to mention, the patterns above are meant to use with RegExp.Execute. If you simply want to extract a match, Execute is usually better than Replace.
For your pattern to match, you are currently requiring:
[^\d]+    One or more non-digit characters
[^\.]?     (an optional non-decimal-point character)
[^\d]+    And another one (or more) non-digit characters

These must be all next to each other for the pattern to match, so it would only work where you have at least 2 non-digit characters in a row. To keep using RegEx.Replace you'll need to use something like:

.Pattern = "^\D*(\d+(?:\.\d+)?)\D*$"
MyNumber = RegEx.Replace(txt, "$1")

Avatar of dsimcox
dsimcox

ASKER

matthewspatric, thanks for your suggestions.
I used this code successfully:

=================================
Public Function MyNumber(txt)
Dim RegEx As Object, myMatch As Variant

Set RegEx = CreateObject("vbscript.regexp")

If RegEx.Test(txt) = True Then
    With RegEx
        .IgnoreCase = True
        .MultiLine = False
        .Global = True
        .Pattern = "\d+\.?\d*"
    End With
   
Set myMatch = RegEx.Execute(txt)
    MyNumber = myMatch(0)
End If
==================================


terryatopus, I was unable to get your suggestion to work.


.Pattern = "^\D*(\d+(?:\.\d+)?)\D*$"
MyNumber = RegEx.Replace(txt, "$1")

Using the Replace method, it seems like this pattern would exclude the numeric characters and decimal:  (\d+(?:\.\d+). I must be missing something here, because I don't doubt your logic is right-on!