Solved

Need help with regular expression to match and capture single term expressions inside ()s.

Posted on 2014-01-16
Last Modified: 2014-01-19
I am using the pattern:
\((([-]?[0-9]*\.?[0-9]*)?([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*)*\)(?!(<sup>[-]?[0-9]*\.?[0-9]*))
on the text below to match only single terms inside a pair of ()s and capture the term in group 1, unless a superscript number exists outside the closing ) or multiple ()s sit next to each other (e.g. (s)(2x)).
(3d)/5
(3d)*d
a+(3d)+2
(3d)-o
(3.5d)
(4.4)
7x-(3d)/5
(3d)*d
(12)
(a<sup>2</sup>b<sup>-2</sup>)
(xy)
(-33)
9+2
(3w<sup>-55</sup>)
6*(3555d)+2
a/(13d)-o
(s)<sup>3</sup>
7x-(3d)
(3d)*d
(2ab+3an)
6*(3d)+2
a/(3d)-o
(2w)=
=(2w)
(s)
(s)(2x)

Everything seems to work the way I want except two items:
 The terms that are numbers only (4.4, 12 & -33) match, but groups 1-3 are empty and groups 4 & 5 don't match.
 The last expression, (s)(2x), matches both terms inside the ()s, and I don't want a match when multiple ()s are next to each other.

What am I missing?  Is there a way to make this less complicated?
Question by:NevSoFly
13 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39787793
Are you saying that in the case of
(4.4)
(12)
(-33)
you want something to exist in  group 1-3?
do you want group 4 & 5 to match?
I don't get anything in 1-3 for any of them, and $4 matches only <sup>-2</sup> and <sup>-55</sup>
 
LVL 8

Expert Comment

by:Surrano
ID: 39787813
It would help a lot if you told us which language's regex flavor we are talking about. Ordinary egrep doesn't support "?!".
 
LVL 84

Expert Comment

by:ozo
ID: 39787824
The * at the end of group 2 means match 0 or more times, as many times as possible.
The ? at the end of group 3 means match optionally.
The * at the end of group 4 means match 0 or more times.
This means that the empty string matches groups 3 and 4.
So when group 2 matches as many times as possible, the last possible iteration will match the empty string.
Only this last match is stored.

Did you mean to use ? instead of * for group 2?
Or do you need the ? on group 3, given the * on group 2?
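
The effect described above can be reproduced with a much smaller pattern. The sketch below is an illustration only: it uses Python's `re` module rather than Perl or VB.NET (engines can differ in how they treat empty iterations of a repeated group), and the two patterns are simplified stand-ins, not the original pattern.

```python
import re

# Simplified stand-in: a group that can match the empty string, repeated with *.
repeated = re.compile(r'\(([0-9.]*)*\)')
# The same group without the outer *, so it is captured exactly once.
single = re.compile(r'\(([0-9.]*)\)')

text = '(4.4)'

# With the outer *, the group first matches "4.4" and then matches once more
# as the empty string; only that last iteration is kept in group 1.
empty_capture = repeated.match(text).group(1)

# Without the outer *, group 1 holds the number.
full_capture = single.match(text).group(1)

print(repr(empty_capture), repr(full_capture))
```

Both patterns match the whole of `(4.4)`; they differ only in what ends up in group 1.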
 

Author Comment

by:NevSoFly
ID: 39788767
Thanks for the responses.

@Surrano:  I am using VB.net (VS2012).

@ozo:
I am saying that in the cases of 4.4, 12, & -33 I want group 1 to match 4.4, 12, & -33.  I really don't care what any other group matches.  The reason I mentioned the other groups was that I was only trying to provide all the info that I had on my situation.  I'm sorry for the confusion.  

As for the breakdown of the pattern: I am attempting to break down the parts of a term.

([-]?[0-9]*\.?[0-9]*)?     is for coefficients/constants that may be +/-, have decimal points or not be present at all.

([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*     is for variables that may or may not have exponents, or may not be present at all.  I believe that I need the ? on group 3 because an exponent can exist at most once, and only if a variable exists.

I am most likely overcomplicating this.

The only reason I added the code to match the constants/coefficients, variables and exponents was that I was trying to differentiate between single and multiple term expressions within the ()s.

I know that the only operation inside the ()s will be addition, so for group 1 couldn't I just grab everything inside the ()s as long as a + isn't present?  Then I would only need to ensure that an exponent isn't outside the closing ).  I was thinking of something like \(([^+]+?)\). It seems to work by itself for identifying ()s with only single terms, but I can't get it to work with the negative look-ahead for exponents.
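
One possible reason the look-ahead does not combine well with \(([^+]+?)\) (an assumption, since the failing pattern is not shown) is that [^+] also matches ), so when the look-ahead rejects the first closing parenthesis the lazy quantifier simply extends to the next one. A minimal sketch in Python's `re`, which uses the same syntax as .NET for these constructs; the look-ahead `(?![<(])` and all variable names are illustrative choices:

```python
import re

# The idea above plus a negative look-ahead. [^+] also matches ')', so the
# lazy quantifier can run past the first closing parenthesis.
loose = re.compile(r'\(([^+]+?)\)(?![<(])')
# Excluding ')' as well keeps the capture inside one pair of parentheses.
strict = re.compile(r'\(([^)+]+)\)(?![<(])')

adjacent = '(s)(2x)'
single_term = '(3d)'
two_terms = '(2ab+3an)'

# re.match only tries to match at the start of the string.
loose_adjacent = loose.match(adjacent)      # matches, capturing across ')('
strict_adjacent = strict.match(adjacent)    # no match
strict_single = strict.match(single_term)   # matches a single term
strict_two_terms = strict.match(two_terms)  # no match because of the '+'
```

Note that `re.match` anchors at the start of the string; an unanchored search with the stricter pattern could still find the second group in `(s)(2x)`.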
 
LVL 84

Expert Comment

by:ozo
ID: 39788800
Please give some examples telling whether or not you want to match, and if it matches, what you would want to capture.
 

Author Comment

by:NevSoFly
ID: 39789108
I hope this helps.

string                                      capture
(2)                                         2
(2.555)                                     2.555
(2a)                                        2a
(2.555a)                                    2.555a
(2.555a<sup>2</sup>)                        2a<sup>2</sup>
(2.555a<sup>2.555</sup>)                    2a<sup>2.555</sup>
(a)                                         a
(a<sup>2</sup>)                             a<sup>2</sup>
(ab)                                        ab
(a<sup>2</sup>b)                            a<sup>2</sup>b
(a<sup>2</sup>b<sup>2</sup>)                a<sup>2</sup>b<sup>2</sup>
(2)<sup>2</sup>                             nothing
(2.555)<sup>2</sup>                         nothing
(2a)<sup>2</sup>                            nothing
(2.555a)<sup>2</sup>                        nothing
(2.555a<sup>2</sup>)<sup>2</sup>            nothing
(2.555a<sup>2.555</sup>)<sup>2</sup>        nothing
(a)<sup>2</sup>                             nothing
(a<sup>2</sup>)<sup>2</sup>                 nothing
(ab)<sup>2</sup>                            nothing
(a<sup>2</sup>b)<sup>2</sup>                nothing
(a<sup>2</sup>b<sup>2</sup>)<sup>2</sup>    nothing
(any expression)(any expression)            nothing
LVL 9

Expert Comment

by:Derek Jensen
ID: 39789974
I am having no luck getting every single row to match using your test data with only one regex; I think you'll have to simply use multiple regexes, and either check each one on each line in a loop, or use an array of regexes if you're in PHP.
 
LVL 84

Expert Comment

by:ozo
ID: 39790357
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
How does 2a<sup>2</sup> come from (2.555a<sup>2</sup>)   ?
Are we to ignore \.\d+ in the case when it is followed by a<sup>?
What if it is followed by <sup> with no a?
Do we only take the first and last character of whatever precedes <sup>?
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39790362
Assuming  2a<sup> was supposed to be 2.555a<sup>, this works:
  print $1 if /^\(([^)]+)\)(?![<(])/;
Otherwise, I'll need more examples to determine exactly what is to be captured.
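
For readers not using Perl, here is a hedged sketch of the same pattern in Python's `re` module, run against a few rows from the asker's table (with the 2.555a correction assumed). The helper name `capture` and the selection of rows are illustrative, not part of the original answer.

```python
import re

# Python spelling of the pattern /^\(([^)]+)\)(?![<(])/
pattern = re.compile(r'^\(([^)]+)\)(?![<(])')

def capture(text):
    """Return group 1 if the pattern matches, otherwise None."""
    m = pattern.search(text)
    return m.group(1) if m else None

# A few rows from the asker's table: input -> expected capture (None = nothing)
examples = {
    '(2)': '2',
    '(2.555a)': '2.555a',
    '(2.555a<sup>2</sup>)': '2.555a<sup>2</sup>',
    '(a<sup>2</sup>b<sup>2</sup>)': 'a<sup>2</sup>b<sup>2</sup>',
    '(2)<sup>2</sup>': None,
    '(a<sup>2</sup>b)<sup>2</sup>': None,
    '(s)(2x)': None,
}

results = {text: capture(text) for text in examples}
for text, got in results.items():
    print(f'{text!r:45} -> {got!r}')
```

Because of the leading ^, this only considers a parenthesized group at the very start of the string, which is how the rows in the table are laid out.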
 

Author Comment

by:NevSoFly
ID: 39793126
sorry,
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
should have been
(2.555a<sup>2</sup>)                                                            2.555a<sup>2</sup>

Are we to ignore \.\d+ in the case when it is followed by a<sup>?

I'm guessing \.\d is from your code, so if you're asking whether to ignore a decimal point and the digits that follow it when an exponent comes next (e.g. (2.555a<sup>2</sup>)), the answer is no.

What if it is followed by <sup> with no a? no.

Do we only take the first and last character of whatever precedes <sup>?  no, if <sup> is within the ()s take everything.  If <sup> is outside the ()s take nothing.
 
LVL 84

Expert Comment

by:ozo
ID: 39793134
If the capture for (2.555a<sup>2</sup>) should have been 2.555a<sup>2</sup>,
then /^\(([^)]+)\)(?![<(])/ seems to do everything you want on the examples in comment ID 39789108.
 

Author Closing Comment

by:NevSoFly
ID: 39793138
It answers all the examples that I gave, but could you please break it down and explain it to me? All I understand is the negative look-ahead part.
 
LVL 84

Expert Comment

by:ozo
ID: 39793146
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\(([^)]+)\)(?![<(])/)->explain'
The regular expression:

(?-imsx:^\(([^)]+)\)(?![<(]))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^)]+                    any character except: ')' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    [<(]                     any character of: '<', '('
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

So, everything in a set of parentheses at the start of the string, unless that set of parentheses is followed by < or (
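
The same breakdown can be kept next to the pattern in code. The sketch below assumes Python's `re` module and its verbose mode (.NET offers a comparable option, RegexOptions.IgnorePatternWhitespace); the sample strings come from the question and the variable names are illustrative.

```python
import re

# The same pattern in verbose mode, where whitespace and # comments inside
# the pattern are ignored.
annotated = re.compile(r"""
    ^           # start of the string
    \(          # a literal opening parenthesis
    (           # start of capture group 1
      [^)]+     #   one or more characters that are not a closing parenthesis
    )           # end of capture group 1
    \)          # a literal closing parenthesis
    (?![<(])    # not followed by < (an exponent) or an opening parenthesis
""", re.VERBOSE)

# Compact form, for comparison.
compact = re.compile(r'^\(([^)]+)\)(?![<(])')

samples = ['(3.5d)', '(s)<sup>3</sup>', '(s)(2x)']
annotated_results = [bool(annotated.search(s)) for s in samples]
compact_results = [bool(compact.search(s)) for s in samples]
print(annotated_results, compact_results)
```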