Need help with regular expression to match and capture single term expressions inside ()s.

Posted on 2014-01-16
Medium Priority
Last Modified: 2014-01-19
I am using the pattern:
on the text below to match only single terms inside a pair of ()s and capture them in group 1, unless a superscript number exists outside the closing ) or multiple ()s sit next to each other (e.g. (s)(2x)).

Everything seems to work the way I want except two items:
 The terms that are numbers only (4.4, 12 & -33) match, but nothing is captured in groups 1-3, and groups 4 & 5 don't match.
 The last expression, (s)(2x), matches both terms inside the ()s; I don't want any match when multiple ()s are next to each other.

What am I missing?  Is there a way to make this less complicated?
Question by:NevSoFly

Expert Comment

ID: 39787793
Are you saying that in the case of
you want something to exist in groups 1-3?
Do you want groups 4 & 5 to match?
I don't get anything in 1-3 for any of them, and $4 matches only <sup>-2</sup> and <sup>-55</sup>

Expert Comment

ID: 39787813
It would help a lot if you told us which language's regex we are talking about. Ordinary egrep doesn't support "?!"

Expert Comment

ID: 39787824
The * at the end of group 2 means match 0 or more times, as many times as possible.
The ? at the end of group 3 means match optionally.
The * at the end of group 4 means match 0 or more times.
This means that the empty string matches groups 3 and 4.
So when group 2 matches as many times as possible, the last possible match will be the empty string.
Only this last match will be stored.

Did you mean to say ? instead of * for group 2?
Or did you need the ? on group 3 given the * on group 2?


Author Comment

ID: 39788767
Thanks for the responses.

@Surrano:  I am using VB.net (VS2012).

I am saying that in the cases of 4.4, 12, & -33 I want group 1 to match 4.4, 12, & -33.  I really don't care what any other group matches.  The reason I mentioned the other groups was that I was only trying to provide all the info that I had on my situation.  I'm sorry for the confusion.  

As for the breakdown of the pattern: I am attempting to break down the parts of a term.

([-]?[0-9]*\.?[0-9]*)?     is for coefficients/constants that may be +/-, may have decimal points, or may not be present at all.

([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*     is for variables that may or may not have exponents, or may not be present at all.  I believe that I need the ? on group 3 because an exponent can exist at most once, and only if a variable exists at all.
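A minimal sketch of why the numeric terms can match while the later groups come back empty. It uses Python's `re` module as a stand-in for the .NET engine (an assumption; the two engines differ in details), and it tests the two fragments quoted above in isolation because the full pattern is not shown in the thread. The names `COEFF` and `VAR_BODY` are made up for illustration.

```python
import re

# The two fragments exactly as quoted in the post.
COEFF = r"([-]?[0-9]*\.?[0-9]*)?"
# Body of group 2, without its outer (...)* wrapper.
VAR_BODY = r"[a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?"

# Every piece is optional, so both fragments accept the empty string.
coeff_matches_empty = re.fullmatch(COEFF, "") is not None
var_matches_empty = re.fullmatch(VAR_BODY, "") is not None

# On its own, the coefficient fragment does capture the numeric terms.
numeric_captures = [re.fullmatch(COEFF, s).group(1) for s in ("4.4", "12", "-33")]

# A repeated capturing group reports only its final iteration.
last_iteration = re.fullmatch(r"([a-z])+", "ab").group(1)

print(coeff_matches_empty, var_matches_empty, numeric_captures, last_iteration)
```

Since the body of the repeated group can succeed on nothing, its final iteration may well be an empty one, which would be consistent with the empty captures described earlier in the thread.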

I am most likely overcomplicating this.

The only reason I added the code to match the constants/coefficients, variables and exponents was that I was trying to differentiate between single and multiple term expressions within the ()s.

I know that the operations inside the ()s will only be addition, so for group 1 couldn't I just grab everything inside the ()s as long as a + isn't present?  Then I would only need to ensure that an exponent isn't outside the closing ).  I was thinking of something like \(([^+]+?)\). It seems to work by itself for identifying ()s with only single terms, but I can't get it to work with the negative lookahead for exponents.
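One plausible reason the lookahead misbehaves, sketched in Python's `re` as a stand-in for .NET: `[^+]` also matches `)`, so when the lookahead rejects the first closing parenthesis the engine backtracks and lets the capture run on to a later one. The thread does not show the combined pattern that was tried, so the `(?!<sup>)` suffix and the sample text here are assumptions.

```python
import re

# Hypothetical sample: a parenthesised term with an outside exponent,
# followed later by a plain single term.
text = "(2)<sup>2</sup> (a)"

# The lazy pattern from the post plus an assumed negative lookahead.
leaky = re.compile(r"\(([^+]+?)\)(?!<sup>)")

# Same idea, but ')' is excluded from the captured run as well.
tight = re.compile(r"\(([^+)]+)\)(?!<sup>)")

leaky_capture = leaky.search(text).group(1)
tight_capture = tight.search(text).group(1)

print(repr(leaky_capture))
print(repr(tight_capture))
```

With `)` excluded from the character class, a rejected lookahead leaves the engine nothing to backtrack into, so the first parenthesised term is skipped rather than stretched.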

Expert Comment

ID: 39788800
Please give some examples telling whether or not you want to match, and if it matches, what you would want to capture.

Author Comment

ID: 39789108
I hope this helps.

string                                      capture
(2)                                         2
(2.555)                                     2.555
(2a)                                        2a
(2.555a)                                    2.555a
(2.555a<sup>2</sup>)                        2a<sup>2</sup>
(2.555a<sup>2.555</sup>)                    2a<sup>2.555</sup>
(a)                                         a
(a<sup>2</sup>)                             a<sup>2</sup>
(ab)                                        ab
(a<sup>2</sup>b)                            a<sup>2</sup>b
(a<sup>2</sup>b<sup>2</sup>)                a<sup>2</sup>b<sup>2</sup>
(2)<sup>2</sup>                             nothing
(2.555)<sup>2</sup>                         nothing
(2a)<sup>2</sup>                            nothing
(2.555a)<sup>2</sup>                        nothing
(2.555a<sup>2</sup>)<sup>2</sup>            nothing
(2.555a<sup>2.555</sup>)<sup>2</sup>        nothing
(a)<sup>2</sup>                             nothing
(a<sup>2</sup>)<sup>2</sup>                 nothing
(ab)<sup>2</sup>                            nothing
(a<sup>2</sup>b)<sup>2</sup>                nothing
(a<sup>2</sup>b<sup>2</sup>)<sup>2</sup>    nothing
(any expression)(any expression)            nothing

Expert Comment

by:Derek Jensen
ID: 39789974
I am having no luck getting every single row to match using your test data with only one regex; I think you'll have to simply use multiple regexes, and either check each one on each line in a loop, or use an array of regexes if you're in PHP.

Expert Comment

ID: 39790357
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
How does 2a<sup>2</sup> come from (2.555a<sup>2</sup>)   ?
Are we to ignore \.\d+ in the case when it is followed by a<sup>?
What if it is followed by <sup> with no a?
Do we only take the first and last character of whatever precedes <sup>?

Accepted Solution

ozo earned 2000 total points
ID: 39790362
Assuming  2a<sup> was supposed to be 2.555a<sup>, this works:
  print $1 if /^\(([^)]+)\)(?![<(])/;
Otherwise, I'll need more examples to determine exactly what is to be captured.
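For readers not using Perl, here is a sketch of the same pattern in Python's `re` module, run against a few rows from the examples above. The pattern text is unchanged; the helper name `capture_single_term` and the choice of sample rows are illustrative assumptions, and `(s)(2x)` stands in for the "(any expression)(any expression)" row.

```python
import re

# Same pattern as the Perl one-liner above.
SINGLE_TERM = re.compile(r"^\(([^)]+)\)(?![<(])")

def capture_single_term(text):
    """Return the contents of the leading (...) or None when nothing should match."""
    m = SINGLE_TERM.match(text)
    return m.group(1) if m else None

samples = [
    "(2)",
    "(-33)",
    "(2.555a<sup>2</sup>)",
    "(a<sup>2</sup>b<sup>2</sup>)",
    "(2)<sup>2</sup>",
    "(a<sup>2</sup>b)<sup>2</sup>",
    "(s)(2x)",
]

for s in samples:
    print(f"{s!r:40} -> {capture_single_term(s)!r}")
```

The same pattern string should also be usable with .NET's `Regex` class, since it relies only on constructs (anchors, negated character classes, negative lookahead) that both engines support.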

Author Comment

ID: 39793126
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
should have been
(2.555a<sup>2</sup>)                                                            2.555a<sup>2</sup>

Are we to ignore \.\d+ in the case when it is followed by a<sup>?

I'm guessing \.\d is from your code, so if you're asking whether to ignore a decimal point and the digits after it when an exponent follows (e.g. (2.555a<sup>2</sup>)), the answer is no.

What if it is followed by <sup> with no a?  No.

Do we only take the first and last character of whatever precedes <sup>?  No. If <sup> is within the ()s, take everything.  If <sup> is outside the ()s, take nothing.

Expert Comment

ID: 39793134
If (2.555a<sup>2</sup>) should have been 2.555a<sup>2</sup>
then /^\(([^)]+)\)(?![<(])/ seems to do everything you want on the examples in http:#a39789108

Author Closing Comment

ID: 39793138
It answers all the examples that I gave, but could you please break it down and explain it to me? All I understand is the negative lookahead part.

Expert Comment

ID: 39793146
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\(([^)]+)\)(?![<(])/)->explain'
The regular expression:

(?-imsx:^\(([^)]+)\)(?![<(]))

matches as follows:
NODE                     EXPLANATION
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
  ^                        the beginning of the string
  \(                       '('
  (                        group and capture to \1:
    [^)]+                    any character except: ')' (1 or more
                             times (matching the most amount
                             possible))
  )                        end of \1
  \)                       ')'
  (?!                      look ahead to see if there is not:
    [<(]                     any character of: '<', '('
  )                        end of look-ahead
)                        end of grouping

So, everything in a set of parentheses at the start of the string, unless that set of parentheses is followed by < or (
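The same breakdown can be kept next to the pattern itself by writing it in free-spacing mode. This is a sketch using Python's `re.VERBOSE` flag (.NET offers a comparable `RegexOptions.IgnorePatternWhitespace` option); the names `COMPACT` and `ANNOTATED` are illustrative.

```python
import re

COMPACT = re.compile(r"^\(([^)]+)\)(?![<(])")

# Whitespace outside character classes and '#' comments are ignored in VERBOSE mode.
ANNOTATED = re.compile(r"""
    ^            # start of the string
    \(           # a literal opening parenthesis
    ( [^)]+ )    # group 1: one or more characters other than a closing parenthesis
    \)           # a literal closing parenthesis
    (?! [<(] )   # fail if the next character is an angle bracket or another parenthesis
""", re.VERBOSE)

def first_group(pattern, text):
    """Return group 1 of a match at the start of text, or None."""
    m = pattern.match(text)
    return m.group(1) if m else None

checks = ["(2a)", "(a<sup>2</sup>)", "(2a)<sup>2</sup>", "(s)(2x)"]
same_results = all(first_group(COMPACT, s) == first_group(ANNOTATED, s) for s in checks)
print(same_results)
```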

Question has a verified solution.