Solved

Using regular expressions, I am getting a successful match but no matching string.

Posted on 2014-10-28
Last Modified: 2014-11-03
I am using the below code to loop through an expression as long as two terms are being multiplied together or divided.
Dim rgx As Regex = Nothing
Dim m As Match
Dim Expression as String="(a)(b)(a)"

Do
     'Code to manipulate Expression.
     'Expression now equals "a<sup>2</sup>b"
     rgx = New Regex("([\-]?[0-9]+(?:\.[0<wbr ></wbr>-9]*)*)?(<<wbr ></wbr>sup>([\-]?<wbr ></wbr>[0-9]*)</s<wbr ></wbr>up>)?(<sup<wbr ></wbr>>E[-|+][0-<wbr ></wbr>9]*</sup>)<wbr ></wbr>?(([\-]?[a<wbr ></wbr>-z](<sup>[<wbr ></wbr>\-]?[0-9]*<wbr ></wbr></sup>)?)*<wbr ></wbr>)(?=[*|/])<wbr ></wbr>")
     m = rgx.Match(Expression)
Loop While m.Success


The resulting expression (a<sup>2</sup>b) is correct and the pattern should no longer match anything in the string.  When I look at the value of m it equals {} but for some reason m.Success=true.

This causes the Do loop to continue, but the code no longer changes the string, resulting in an infinite loop.

I have tested the pattern and expression string using www.regexr.com/v1 and I do not get a match, which is what I expect. I am not new to regular expressions, but I have never seen this before.

Can someone please explain this and give me some suggestions on how I can resolve this issue?
Question by:NevSoFly

Expert Comment

by:louisfr
ID: 40410372
There is a problem with [0<wbr ></wbr>-9]
What do you expect it to match? As it is, that's not a valid character class, since '>' is greater than '9'.
 

Author Comment

by:NevSoFly
ID: 40410529
I'm sorry, I did a cut and paste. The <wbr></wbr> tags are not in my original code and I don't know where they came from. The code should have been as follows:
Dim rgx As Regex = Nothing
Dim m As Match
Dim Expression as String="(a)(b)(a)"

Do
     'Code to manipulate Expression.
     'Expression now equals "a<sup>2</sup>b"
     rgx = New Regex("([\-]?[0-9]+(?:\.[0-9]*)*)?(<sup>([\-]?[0-9]*)</sup>)?(<sup>E[-|+][0-9]*</sup>)?(([\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?)*)(?=[*|/])")
     m = rgx.Match(Expression)
Loop While m.Success


Expert Comment

by:louisfr
ID: 40411146
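Except for (?=[*|/]), every part of your regex is optional, so the pattern can match the empty string wherever the next character is *, | or /.
Here the regex succeeds with the empty string located between < and / in </sup>.
Check m.Index and m.Length. They should be 8 and 0, respectively.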
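
As a rough illustration of the point above, the sketch below runs the pattern from the corrected post against the finished expression and prints Success, Index and Length. The module name and the FirstNonEmptyMatch helper are made up for this example and are not part of the original code; the helper is only one possible way to ignore empty matches, assuming an empty match should count as "nothing left to do".

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ZeroLengthMatchDemo
    ' Pattern copied from the corrected post in this thread.
    Public Const Pattern As String = "([\-]?[0-9]+(?:\.[0-9]*)*)?(<sup>([\-]?[0-9]*)</sup>)?(<sup>E[-|+][0-9]*</sup>)?(([\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?)*)(?=[*|/])"

    ' Hypothetical helper: returns the first match that consumed at least
    ' one character, or Nothing when only empty matches (or none) exist.
    Public Function FirstNonEmptyMatch(input As String) As Match
        For Each candidate As Match In Regex.Matches(input, Pattern)
            If candidate.Length > 0 Then Return candidate
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim m As Match = Regex.Match("a<sup>2</sup>b", Pattern)
        ' Success is True even though the matched text is empty.
        Console.WriteLine("Success={0} Index={1} Length={2}", m.Success, m.Index, m.Length)

        ' The finished expression has no match that consumes characters...
        Console.WriteLine(FirstNonEmptyMatch("a<sup>2</sup>b") Is Nothing)
        ' ...while an expression that still contains a product does.
        Console.WriteLine(FirstNonEmptyMatch("2a*3b").Value)
    End Sub
End Module
```

With a helper like this, the loop condition could be written as Loop While FirstNonEmptyMatch(Expression) IsNot Nothing, although changing the pattern so that it cannot match an empty string is the cleaner fix.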
 

Author Comment

by:NevSoFly
ID: 40411375
I checked and you're right: m.Index=8 and m.Length=0. Every part of my regex is optional because any given part might or might not be present, but at least one of them has to be there.

But why is it when I test it in www.regexr.com/v1 I do not get a match at all?

Do you have any suggestions on how I can change the pattern to fit my needs?

 

Author Comment

by:NevSoFly
ID: 40412289
I made some slight alterations to overcome some shortcomings that I noticed in my pattern. Here is my breakdown:

([\-]?[0-9]+(?:\.[0-9]*)*)?                                           Coefficient
(<sup>([\-]?[0-9]*)</sup>)?                                           Exponent
(<sup>E[-|+][0-9]*</sup>)?                                            Sci-Notation
(([\-]?[a-z]((<sup>[\-]?[0-9]*</sup>)|(<sup>E[-|+][0-9]*</sup>))?)*)  Variable w/ Exponent or Sci-Notation

What I need is a pattern that matches a string (term in this case) that must have either a coefficient or a variable.  

If it has a coefficient then the coefficient may have either an exponent, scientific notation or neither.  

If it has a variable then the variable may have either an exponent, scientific notation or neither.

Accepted Solution

by:
louisfr earned 500 total points
ID: 40412690
The regexr site forbids regexes which can match 0 characters. The problem is explicitly indicated if you enter the expression on the home page http://www.regexr.com/ and hover over the "Infinite" red button : "The expression can match 0 characters, and therefore matches infinitely".

Exponent OR scientific notation, but not both? You're allowing an exponent followed by scientific notation on the coefficient, but only one of them on the variable.

The coefficient is an optional minus sign, then series of digits and dots? This is allowed: -1...5.23..4

The variable part can be this: a<sup></sup>
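
The two observations above can be checked with a small sketch. The loose fragments are copied from the breakdown in the previous post (the variable fragment is reduced to its plain-exponent form); the "strict" fragments, the module name and the MatchesWhole helper are assumptions introduced here purely for comparison.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LoosePatternDemo
    ' Fragments taken from the breakdown earlier in the thread.
    Public Const LooseCoefficient As String = "[\-]?[0-9]+(?:\.[0-9]*)*"
    Public Const LooseVariable As String = "[\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?"
    ' Illustrative tightened forms: at most one decimal part,
    ' and at least one digit inside the exponent.
    Public Const StrictCoefficient As String = "-?\d+(?:\.\d*)?"
    Public Const StrictVariable As String = "-?[a-z](<sup>-?\d+</sup>)?"

    ' Hypothetical helper: True when the fragment matches the whole input.
    Public Function MatchesWhole(fragment As String, input As String) As Boolean
        Return Regex.IsMatch(input, "^(?:" & fragment & ")$")
    End Function

    Sub Main()
        Console.WriteLine(MatchesWhole(LooseCoefficient, "-1...5.23..4"))   ' True
        Console.WriteLine(MatchesWhole(StrictCoefficient, "-1...5.23..4"))  ' False
        Console.WriteLine(MatchesWhole(LooseVariable, "a<sup></sup>"))      ' True
        Console.WriteLine(MatchesWhole(StrictVariable, "a<sup></sup>"))     ' False
        Console.WriteLine(MatchesWhole(StrictCoefficient, "-5.23"))         ' True
    End Sub
End Module
```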

Here is a modified version of your regex. I changed the coefficient to be a number with optional decimal part. You can change it back if you want. I match either a mandatory coefficient followed by an optional variable OR a mandatory variable, ensuring that at least one of them matches:
(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?
|
(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))
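
A minimal sketch of how this pattern might be used from VB.NET, assuming the three lines above are joined into a single pattern string with | between the two alternatives. The module name and the FindTerms helper are hypothetical and only serve to list what the pattern matches.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TermPatternDemo
    ' The two alternatives from the answer, joined into one pattern string.
    Public Const TermPattern As String =
        "(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?" &
        "|" &
        "(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))"

    ' Hypothetical helper: collects the text of every match, in order.
    Public Function FindTerms(input As String) As List(Of String)
        Dim terms As New List(Of String)
        For Each m As Match In Regex.Matches(input, TermPattern)
            terms.Add(m.Value)
        Next
        Return terms
    End Function

    Sub Main()
        ' Prints: 2a<sup>2</sup> , 3b
        Console.WriteLine(String.Join(" , ", FindTerms("2a<sup>2</sup>*3b")))
        ' Prints: a<sup>2</sup> , b
        Console.WriteLine(String.Join(" , ", FindTerms("a<sup>2</sup>b")))
    End Sub
End Module
```

Because each alternative requires either a digit or a letter, every match consumes at least one character, so an empty match can no longer keep the loop running.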
 

Author Comment

by:NevSoFly
ID: 40412963
First, thank you very much for your time.   I tested the regex and everything seems to work.  

I do have one problem. In some instances the expression will have an exponent or scientific notation outside a set of parentheses, e.g. (2a<sup>2</sup>*3b)<sup>4</sup>. In this case the regex you supplied matches each alphanumeric character in <sup>4</sup> separately. I do not want any part of <sup>4</sup> to match at all.

Expert Comment

by:louisfr
ID: 40414117
You can test if the match is being made against a tag, or against something inside a tag with

a look-ahead expression
(add here the matching pattern)(?![^<]*</|[a-z]*>)



or a look-behind expression
(?<!</?[a-z]*|<[a-z]+>[^<]*)(add here the matching pattern)
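
The look-ahead variant can be sketched as follows, assuming the term pattern from the accepted answer is wrapped in a non-capturing group before the look-ahead is appended (otherwise the guard would only apply to the second alternative). The module name and the FindTerms helper are hypothetical; only the look-ahead variant is shown here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagGuardDemo
    ' Term pattern from the accepted answer, alternatives joined with |.
    Public Const TermPattern As String =
        "(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?" &
        "|" &
        "(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))"

    ' Same pattern, grouped, followed by the look-ahead from the comment above.
    Public Const GuardedPattern As String = "(?:" & TermPattern & ")(?![^<]*</|[a-z]*>)"

    ' Hypothetical helper: collects the text of every match, in order.
    Public Function FindTerms(input As String, pattern As String) As List(Of String)
        Dim terms As New List(Of String)
        For Each m As Match In Regex.Matches(input, pattern)
            terms.Add(m.Value)
        Next
        Return terms
    End Function

    Sub Main()
        Dim expr As String = "(2a<sup>2</sup>*3b)<sup>4</sup>"
        ' Unguarded: also picks up s, u, p and 4 from the trailing <sup>4</sup>.
        Console.WriteLine(String.Join(" , ", FindTerms(expr, TermPattern)))
        ' Guarded: prints 2a<sup>2</sup> , 3b
        Console.WriteLine(String.Join(" , ", FindTerms(expr, GuardedPattern)))
    End Sub
End Module
```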

 

Author Comment

by:NevSoFly
ID: 40419783
thx once more.