Solved

Need help with a regular expression to match and capture single-term expressions inside ()s.

Posted on 2014-01-16
Last Modified: 2014-01-19
I am using the pattern:
\((([-]?[0-9]*\.?[0-9]*)?([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*)*\)(?!(<sup>[-]?[0-9]*\.?[0-9]*))
on the text below to match only single terms inside a pair of () and capture them in group 1, unless a superscript number exists outside the closing ) or there are multiple ()s next to each other (e.g. (s)(2x)).
(3d)/5
(3d)*d
a+(3d)+2
(3d)-o
(3.5d)
(4.4)
7x-(3d)/5
(3d)*d
(12)
(a<sup>2</sup>b<sup>-2</sup>)
(xy)
(-33)
9+2
(3w<sup>-55</sup>)
6*(3555d)+2
a/(13d)-o
(s)<sup>3</sup>
7x-(3d)
(3d)*d
(2ab+3an)
6*(3d)+2
a/(3d)-o
(2w)=
=(2w)
(s)
(s)(2x)

Everything seems to work the way I want except for two things:
  • The number-only terms (4.4, 12 and -33) match, but groups 1-3 are empty and groups 4 and 5 don't match.
  • The last expression, (s)(2x), matches the terms inside both pairs of ()s, and I don't want it to match when multiple ()s are next to each other.

What am I missing? Is there a way to make this less complicated?
Question by:NevSoFly
13 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 39787793
Are you saying that in the case of
(4.4)
(12)
(-33)
you want something to exist in groups 1-3?
Do you want groups 4 and 5 to match?
I don't get anything in groups 1-3 for any of them, and $4 matches only <sup>-2</sup> and <sup>-55</sup>.
 
LVL 8

Expert Comment

by:Surrano
ID: 39787813
It would help a lot if you told us which language's regex flavor we are talking about. Ordinary egrep doesn't support "?!" (negative lookahead).
 
LVL 84

Expert Comment

by:ozo
ID: 39787824
The * at the end of group 2 means match 0 or more times, as many times as possible.
The ? at the end of group 3 means match optionally.
The * at the end of group 4 means match 0 or more times.
This means that the empty string matches groups 3 and 4,
so when group 2 matches as many times as possible, the last possible iteration will match the empty string.
Only this last match is stored.

Did you mean to use ? instead of * for group 2?
Or do you need the ? on group 3, given the * on group 2?
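A minimal Python sketch of the point above, using Python's `re` only because it is easy to run (the asker is on VB.NET, and engines can differ in edge cases). `ORIGINAL` is the question's pattern; `OPTIONAL_OUTER` is the same pattern with the `*` after the outermost group changed to `?`, following the first suggestion. An optional group runs at most once, so when it matches it spans the whole term. The helper `outer_group` is hypothetical, and the script prints both results side by side rather than asserting what the repeated version captures.

```python
import re

# The pattern from the question, unchanged (raw strings for Python).
ORIGINAL = (r"\((([-]?[0-9]*\.?[0-9]*)?([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*)*\)"
            r"(?!(<sup>[-]?[0-9]*\.?[0-9]*))")

# Same pattern, but the outermost group is optional ("?") instead of
# repeated ("*"), so it runs at most once and spans the whole term.
OPTIONAL_OUTER = (r"\((([-]?[0-9]*\.?[0-9]*)?([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*)?\)"
                  r"(?!(<sup>[-]?[0-9]*\.?[0-9]*))")


def outer_group(pattern, text):
    """Hypothetical helper: group 1 of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


for sample in ["(4.4)", "(12)", "(-33)", "(3.5d)", "(a<sup>2</sup>b<sup>-2</sup>)"]:
    # Compare what group 1 holds under each pattern.
    print(sample, repr(outer_group(ORIGINAL, sample)), repr(outer_group(OPTIONAL_OUTER, sample)))
```

With the outer group no longer repeated, there is no trailing empty iteration to overwrite the capture.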

Author Comment

by:NevSoFly
ID: 39788767
Thanks for the responses.

@Surrano: I am using VB.NET (VS2012).

@ozo:
I am saying that in the cases of 4.4, 12 and -33, I want group 1 to capture 4.4, 12 and -33. I don't really care what any other group matches. I only mentioned the other groups to provide all the information I had on my situation. Sorry for the confusion.

As for the breakdown of the pattern, I am attempting to break a term down into its parts.

([-]?[0-9]*\.?[0-9]*)?     is for coefficients/constants that may be positive or negative, may have a decimal point, or may not be present at all.

([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*     is for variables that may or may not have exponents, or may not be present at all. I believe I need the ? on group 3 because an exponent can exist at most once, and only if a variable exists.

I am most likely overcomplicating this.

The only reason I added the parts that match the constants/coefficients, variables and exponents was to differentiate between single-term and multiple-term expressions within the ()s.

I know that the only operation inside the ()s will be addition, so for group 1, couldn't I just grab everything inside the ()s as long as no + is present? Then I would only need to ensure that no exponent follows the closing ). I was thinking of something like \(([^+]+?)\). It seems to work on its own for identifying ()s that contain only single terms, but I can't get it to work with the negative lookahead for exponents.
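A small Python sketch of one way the idea above can misbehave once a lookahead is attached. The exact lookahead the asker tried isn't shown, so `(?!<sup>)` is an assumption, as is the made-up input. Because `[^+]` also matches `)`, when the lookahead fails the lazy group simply keeps growing past the closing parenthesis until it finds another `)`. Excluding `)` from the class (a variant, not from the thread) keeps each capture inside one pair of parentheses.

```python
import re

# Assumed combination of the asker's idea with a lookahead for exponents.
LAZY_NO_PLUS = r"\(([^+]+?)\)(?!<sup>)"

# Variant (assumption, not from the thread): ")" is also excluded from the
# class, so the capture can never run past a closing parenthesis.
NO_PLUS_NO_PAREN = r"\(([^+)]+)\)(?!<sup>)"

text = "(a)<sup>2</sup>(b)"  # made-up input for illustration

lazy = re.search(LAZY_NO_PLUS, text)
print(lazy.group(1))  # prints: a)<sup>2</sup>(b  (stretched across ")<sup>2</sup>(")

print(re.findall(NO_PLUS_NO_PAREN, text))  # prints: ['b']
```

Neither pattern rejects adjacent groups such as (s)(2x); that still needs something extra after the closing ).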
 
LVL 84

Expert Comment

by:ozo
ID: 39788800
Please give some examples showing whether or not you want each one to match and, if it matches, what you want to capture.
 

Author Comment

by:NevSoFly
ID: 39789108
I hope this helps.

string                                      capture
(2)                                         2
(2.555)                                     2.555
(2a)                                        2a
(2.555a)                                    2.555a
(2.555a<sup>2</sup>)                        2a<sup>2</sup>
(2.555a<sup>2.555</sup>)                    2a<sup>2.555</sup>
(a)                                         a
(a<sup>2</sup>)                             a<sup>2</sup>
(ab)                                        ab
(a<sup>2</sup>b)                            a<sup>2</sup>b
(a<sup>2</sup>b<sup>2</sup>)                a<sup>2</sup>b<sup>2</sup>
(2)<sup>2</sup>                             nothing
(2.555)<sup>2</sup>                         nothing
(2a)<sup>2</sup>                            nothing
(2.555a)<sup>2</sup>                        nothing
(2.555a<sup>2</sup>)<sup>2</sup>            nothing
(2.555a<sup>2.555</sup>)<sup>2</sup>        nothing
(a)<sup>2</sup>                             nothing
(a<sup>2</sup>)<sup>2</sup>                 nothing
(ab)<sup>2</sup>                            nothing
(a<sup>2</sup>b)<sup>2</sup>                nothing
(a<sup>2</sup>b<sup>2</sup>)<sup>2</sup>    nothing
(any expression)(any expression)            nothing
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39789974
I am having no luck getting every single row of your test data to match with a single regex. I think you'll simply have to use multiple regexes and either check each one against each line in a loop or, if you're in PHP, use an array of regexes.
 
LVL 84

Expert Comment

by:ozo
ID: 39790357
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
How does 2a<sup>2</sup> come from (2.555a<sup>2</sup>)?
Are we to ignore \.\d+ in the case when it is followed by a<sup>?
What if it is followed by <sup> with no a?
Do we only take the first and last character of whatever precedes <sup>?
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 39790362
Assuming 2a<sup> was supposed to be 2.555a<sup>, this works:
  while (<>) { print "$1\n" if /^\(([^)]+)\)(?![<(])/ }
Otherwise, I'll need more examples to determine exactly what should be captured.
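For readers not using Perl, here is a Python sketch of the same regex checked against rows from the asker's table, assuming (as above) that the capture for (2.555a<sup>2</sup>) should be 2.555a<sup>2</sup>. `re.match` only tries position 0, which plays the role of `^`. The helper name `capture` and the choice of rows are illustrative; in VB.NET the equivalent would go through `Regex.Match` and `Groups(1)`.

```python
import re

# ozo's pattern; re.match anchors at the start of the string like "^".
PATTERN = re.compile(r"\(([^)]+)\)(?![<(])")

# (input, expected capture) rows from the asker's table; None = no match wanted.
CASES = [
    ("(2)", "2"),
    ("(2.555)", "2.555"),
    ("(2.555a<sup>2</sup>)", "2.555a<sup>2</sup>"),
    ("(a<sup>2</sup>b<sup>2</sup>)", "a<sup>2</sup>b<sup>2</sup>"),
    ("(2)<sup>2</sup>", None),
    ("(a<sup>2</sup>)<sup>2</sup>", None),
    ("(s)(2x)", None),
]


def capture(line):
    """Hypothetical helper: group 1 if the line starts with a wanted term, else None."""
    m = PATTERN.match(line)
    return m.group(1) if m else None


for line, expected in CASES:
    got = capture(line)
    print(f"{line:40} {got!r:30} {'ok' if got == expected else 'MISMATCH'}")
```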
 

Author Comment

by:NevSoFly
ID: 39793126
Sorry,
(2.555a<sup>2</sup>)                                                            2a<sup>2</sup>
should have been
(2.555a<sup>2</sup>)                                                            2.555a<sup>2</sup>

Are we to ignore \.\d+ in the case when it is followed by a<sup>?

I'm guessing \.\d+ comes from your code, so if you're asking whether you should ignore a decimal point and the digits after it when an exponent follows (e.g. (2.555a<sup>2</sup>)), the answer is no.

What if it is followed by <sup> with no a? No.

Do we only take the first and last character of whatever precedes <sup>? No. If <sup> is within the ()s, take everything. If <sup> is outside the ()s, take nothing.
 
LVL 84

Expert Comment

by:ozo
ID: 39793134
If the capture for (2.555a<sup>2</sup>) should have been 2.555a<sup>2</sup>,
then /^\(([^)]+)\)(?![<(])/ seems to do everything you want on the examples in http:#a39789108
 

Author Closing Comment

by:NevSoFly
ID: 39793138
It handles all the examples I gave, but could you please break it down and explain it? All I understand is the negative lookahead part.
 
LVL 84

Expert Comment

by:ozo
ID: 39793146
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\(([^)]+)\)(?![<(])/)->explain'
The regular expression:

(?-imsx:^\(([^)]+)\)(?![<(]))

matches as follows:
 
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^)]+                    any character except: ')' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
----------------------------------------------------------------------
    [<(]                     any character of: '<', '('
----------------------------------------------------------------------
  )                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

So it captures everything inside a set of parentheses at the start of the string, unless that set of parentheses is followed by < or (.
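The ^ anchor means the accepted regex only looks at parentheses at the very start of a line, which fits the asker's second list of examples. Several strings in the original question have the term mid-line (7x-(3d)/5, a+(3d)+2) or contain a + (2ab+3an). A hypothetical unanchored variant, not from the thread, is sketched below in Python: it drops ^, uses a lookbehind so that a "(" directly after ")" is rejected, and excludes + and parentheses from the captured term. .NET's regex engine also supports lookbehind, but this has only been reasoned through against the sample strings shown.

```python
import re

# Hypothetical variant, not from the thread:
#   (?<!\))   the "(" must not directly follow a ")", so (s)(2x) is rejected
#   [^()+]+   one or more characters that are not parentheses or "+",
#             i.e. a single term with no nested parentheses
#   (?![<(])  ozo's lookahead: no superscript or second group after ")"
TERM = re.compile(r"(?<!\))\(([^()+]+)\)(?![<(])")

LINES = ["7x-(3d)/5", "a+(3d)+2", "=(2w)", "(2ab+3an)", "(s)<sup>3</sup>", "(s)(2x)"]

for line in LINES:
    m = TERM.search(line)
    print(line, "->", m.group(1) if m else None)
```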