Solved

Using regular expressions, I am getting a successful match but no matching string.

Posted on 2014-10-28
Last Modified: 2014-11-03
I am using the below code to loop through an expression as long as two terms are being multiplied together or divided.
Dim rgx As Regex = Nothing
Dim m As Match
Dim Expression as String="(a)(b)(a)"

Do
     'Code to manipulate Expression.
     'Expression now equals "a<sup>2</sup>b"
     rgx = New Regex("([\-]?[0-9]+(?:\.[0<wbr ></wbr>-9]*)*)?(<<wbr ></wbr>sup>([\-]?<wbr ></wbr>[0-9]*)</s<wbr ></wbr>up>)?(<sup<wbr ></wbr>>E[-|+][0-<wbr ></wbr>9]*</sup>)<wbr ></wbr>?(([\-]?[a<wbr ></wbr>-z](<sup>[<wbr ></wbr>\-]?[0-9]*<wbr ></wbr></sup>)?)*<wbr ></wbr>)(?=[*|/])<wbr ></wbr>")
     m = rgx.Match(Expression)
Loop While m.Success

The resulting expression (a<sup>2</sup>b) is correct and the pattern should no longer match anything in the string.  When I look at the value of m it equals {} but for some reason m.Success=true.

This causes the Do loop to continue, but the string is no longer changed by the code, so the loop never ends.

I have tested the pattern and expression string using www.regexr.com/v1 and I do not get a match (as it should be).    I am not new to regular expressions but have never seen this before.

Can someone please explain this and give me some suggestions on how I resolve this issue?
Question by:NevSoFly
LVL 11

Expert Comment

by:louisfr
ID: 40410372
There is a problem with [0<wbr ></wbr>-9]
What do you expect it to match? As it is, that's not a valid character class, since '>' is greater than '9'.

Author Comment

by:NevSoFly
ID: 40410529
I'm sorry I did a cut and paste.  <wbr></wbr> is not in my original code and I don't know where it came from.  The code should have been as follows.
Dim rgx As Regex = Nothing
Dim m As Match
Dim Expression as String="(a)(b)(a)"

Do
     'Code to manipulate Expression.
     'Expression now equals "a<sup>2</sup>b"
     rgx = New Regex("([\-]?[0-9]+(?:\.[0-9]*)*)?(<sup>([\-]?[0-9]*)</sup>)?(<sup>E[-|+][0-9]*</sup>)?(([\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?)*)(?=[*|/])")
     m = rgx.Match(Expression)
Loop While m.Success

LVL 11

Expert Comment

by:louisfr
ID: 40411146
Except for (?=[*|/]) each part of your regex is optional.
The regex succeeds with the empty string located between < and /
Check m.Index and m.Length. They should be respectively 8 and 0.
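
The check described above can be sketched in VB.NET as follows. This is a minimal console example, not the asker's actual program: the module name and the helper `IsNonEmptyMatch` are hypothetical, and only `Regex.Match`, `Match.Success`, `Match.Index` and `Match.Length` are taken from .NET itself. It runs the corrected pattern against the final expression and then shows one possible guard that also requires a non-zero length.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ZeroLengthMatchDemo
    ' Pattern from the asker's corrected post. Every group is optional,
    ' so only the look-ahead (?=[*|/]) has to succeed.
    Public Const TermPattern As String = "([\-]?[0-9]+(?:\.[0-9]*)*)?(<sup>([\-]?[0-9]*)</sup>)?(<sup>E[-|+][0-9]*</sup>)?(([\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?)*)(?=[*|/])"

    ' Hypothetical helper: True only when the match consumed at least one character.
    Public Function IsNonEmptyMatch(m As Match) As Boolean
        Return m.Success AndAlso m.Length > 0
    End Function

    Sub Main()
        Dim m As Match = Regex.Match("a<sup>2</sup>b", TermPattern)

        ' Prints: Success=True Index=8 Length=0
        ' The empty match sits just before the "/" of "</sup>".
        Console.WriteLine("Success={0} Index={1} Length={2}", m.Success, m.Index, m.Length)

        ' Prints: NonEmpty=False
        Console.WriteLine("NonEmpty={0}", IsNonEmptyMatch(m))
    End Sub
End Module
```

In the original loop the guard would read `Loop While m.Success AndAlso m.Length > 0`. That only stops the infinite loop; changing the pattern so it cannot match the empty string is probably the more robust fix.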

Author Comment

by:NevSoFly
ID: 40411375
I checked and you're right: m.Index=8 and m.Length=0.  The reason every part of my regex is optional is that each part might or might not be there, but at least one of them has to be there.

But why is it when I test it in www.regexr.com/v1 I do not get a match at all?

Do you have any suggestions on how I can change the pattern to fit my needs?

Author Comment

by:NevSoFly
ID: 40412289
I made some slight alterations to overcome some shortcomings that I noticed in my pattern.  Here is my breakdown:

([\-]?[0-9]+(?:\.[0-9]*)*)    Coefficient
?(<sup>([\-]?[0-9]*)</sup>)    Exponent
?(<sup>E[-|+][0-9]*</sup>)    Sci-Notation
?(([\-]?[a-z]((<sup>[\-]?[0-9]*</sup>)|(<sup>E[-|+][0-9]*</sup>))?)*)    Variable w/ Exponent or Sci-Notation.

What I need is a pattern that matches a string (term in this case) that must have either a coefficient or a variable.  

If it has a coefficient then the coefficient may have either an exponent, scientific notation or neither.  

If it has a variable then the variable may have either an exponent, scientific notation or neither.
LVL 11

Accepted Solution

by:
louisfr earned 500 total points
ID: 40412690
The regexr site forbids regexes which can match 0 characters. The problem is explicitly indicated if you enter the expression on the home page http://www.regexr.com/ and hover over the "Infinite" red button : "The expression can match 0 characters, and therefore matches infinitely".

Exponent OR scientific notation but not both? You're allowing an exponent followed by scientific notation on the coefficient, but only one of them on the variable.

The coefficient is an optional minus sign, then series of digits and dots? This is allowed: -1...5.23..4

The variable part can be this: a<sup></sup>
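
The two observations above can be checked with a short VB.NET sketch. The module and constant names are hypothetical. The sub-patterns are copied from the asker's breakdown (for the variable, only the exponent alternative) and wrapped in `^` and `$` so that the whole input has to match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LoosePatternDemo
    ' Coefficient sub-pattern from the asker's breakdown, anchored at both ends.
    Public Const LooseCoefficient As String = "^[\-]?[0-9]+(?:\.[0-9]*)*$"

    ' Variable sub-pattern with its exponent alternative, anchored at both ends.
    Public Const LooseVariable As String = "^[\-]?[a-z](<sup>[\-]?[0-9]*</sup>)?$"

    Sub Main()
        ' Prints True: the group (?:\.[0-9]*) may repeat, so runs of dots are accepted.
        Console.WriteLine(Regex.IsMatch("-1...5.23..4", LooseCoefficient))

        ' Prints True: [0-9]* allows an exponent tag with no digits in it.
        Console.WriteLine(Regex.IsMatch("a<sup></sup>", LooseVariable))
    End Sub
End Module
```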

Here is a modified version of your regex. I changed the coefficient to be a number with optional decimal part. You can change it back if you want. I match either a mandatory coefficient followed by an optional variable OR a mandatory variable, ensuring that at least one of them matches:
(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?
|
(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))
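
As a rough illustration of how the modified pattern behaves in the asker's loop, here is a VB.NET sketch. Two things are assumptions rather than part of the answer: the three lines are joined into one pattern, and the asker's trailing look-ahead is kept, written as `(?=[*/])` because a `|` inside a character class is a literal pipe rather than an alternation. The non-capturing group makes the look-ahead apply to both alternatives. The module, constant and function names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AcceptedPatternDemo
    ' The accepted pattern, joined onto one line.
    Public Const AcceptedPattern As String = "(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?|(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))"

    ' Assumption: the term must still be followed by * or /, as in the original question.
    Public Const TermBeforeOperator As String = "(?:" & AcceptedPattern & ")(?=[*/])"

    ' Returns the first term that is followed by an operator, if any.
    Public Function FirstTerm(expression As String) As Match
        Return Regex.Match(expression, TermBeforeOperator)
    End Function

    Sub Main()
        ' No operator is left in the expression, so this prints False
        ' and the Do loop would end.
        Console.WriteLine(FirstTerm("a<sup>2</sup>b").Success)

        ' Prints: 2a<sup>2</sup>
        Console.WriteLine(FirstTerm("2a<sup>2</sup>*3b").Value)
    End Sub
End Module
```

Because both alternatives now require at least a digit or a letter, a successful match can no longer be empty.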

Author Comment

by:NevSoFly
ID: 40412963
First, thank you very much for your time.   I tested the regex and everything seems to work.  

I do have one problem.  There are some instances where the expression that I am using will have an exponent or scientific notation outside a set of parentheses, e.g. (2a<sup>2</sup>*3b)<sup>4</sup>.  In this case the regex you supplied matches each alphanumeric character in <sup>4</sup> separately.  I would not want any part of <sup>4</sup> to match at all.
LVL 11

Expert Comment

by:louisfr
ID: 40414117
You can test if the match is being made against a tag, or against something inside a tag with

a look-ahead expression
(add here the matching pattern)(?![^<]*</|[a-z]*>)

or a look-behind expression
(?<!</?[a-z]*|<[a-z]+>[^<]*)(add here the matching pattern)
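
Both filters can be tried against the asker's example with the VB.NET sketch below. It assumes the accepted pattern is used as the matching pattern, wrapped in a non-capturing group so that the filter applies to both of its alternatives. The module, constant and function names are hypothetical. The look-behind variant relies on .NET accepting variable-length look-behind, which many other regex engines do not.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TagFilterDemo
    ' Accepted pattern, grouped so a prefix or suffix applies to the whole of it.
    Public Const Term As String = "(?:(-?\d+(?:\.\d*)?)(<sup>(-?\d+)</sup>)?(<sup>E[-+]\d+</sup>)?(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?))?|(-?[a-z]((<sup>-?\d+</sup>)|(<sup>E[-+]\d+</sup>)?)))"

    ' The two filters from the comment above.
    Public Const NotInTagAhead As String = "(?![^<]*</|[a-z]*>)"
    Public Const NotInTagBehind As String = "(?<!</?[a-z]*|<[a-z]+>[^<]*)"

    ' Collects the text of every match of pattern in expression.
    Public Function FindTerms(pattern As String, expression As String) As List(Of String)
        Dim found As New List(Of String)
        For Each m As Match In Regex.Matches(expression, pattern)
            found.Add(m.Value)
        Next
        Return found
    End Function

    Sub Main()
        Dim expression As String = "(2a<sup>2</sup>*3b)<sup>4</sup>"

        ' Both lines print: 2a<sup>2</sup> | 3b
        ' Nothing inside the trailing <sup>4</sup> is reported.
        Console.WriteLine(String.Join(" | ", FindTerms(Term & NotInTagAhead, expression)))
        Console.WriteLine(String.Join(" | ", FindTerms(NotInTagBehind & Term, expression)))
    End Sub
End Module
```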


Author Comment

by:NevSoFly
ID: 40419783
thx once more.
Question has a verified solution.
