Need help with a regular expression to match and capture single-term expressions inside ()s.

I am using the pattern:
\((([-]?[0-9]*\.?[0-9]*)?([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*)*\)(?!(<sup>[-]?[0-9]*\.?[0-9]*))
on the text below to match only single terms inside a pair of () and capture them in group 1, unless a superscript number exists outside the closing ) or multiple ()s sit next to each other (e.g. (s)(2x)).
(3d)/5
(3d)*d
a+(3d)+2
(3d)-o
(3.5d)
(4.4)
7x-(3d)/5
(3d)*d
(12)
(a<sup>2</sup>b<sup>-2</sup>)
(xy)
(-33)
9+2
(3w<sup>-55</sup>)
6*(3555d)+2
a/(13d)-o
(s)<sup>3</sup>
7x-(3d)
(3d)*d
(2ab+3an)
6*(3d)+2
a/(3d)-o
(2w)=
=(2w)
(s)
(s)(2x)

Everything seems to work the way I want except two items:
The terms that are numbers only (4.4, 12 & -33) match, but groups 1-3 are empty and groups 4 & 5 don't match.
The last expression, (s)(2x), matches both terms inside the ()s, and I don't want a match when multiple ()s are next to each other.

What am I missing?  Is there a way to make this less complicated?

Commented:
Assuming  2a<sup> was supposed to be 2.555a<sup>, this works:
print $1 if /^\(([^)]+)\)(?![<(])/;
Otherwise, I'll need more examples to determine exactly what is to be captured.

Commented:
Are you saying that in the case of
(4.4)
(12)
(-33)
you want something to exist in groups 1-3?
Do you want groups 4 & 5 to match?
I don't get anything in 1-3 for any of them, and $4 matches only <sup>-2</sup> and <sup>-55</sup>.

System EngineerCommented:
It would help a lot if you told us which language's regex flavor we are talking about. Ordinary egrep doesn't support "?!".

Commented:
The * at the end of group 2 means match 0 or more times, as many times as possible.
The ? at the end of group 3 means match optionally.
The * at the end of group 4 means match 0 or more times.
This means that the empty string matches groups 3 and 4.
So when group 2 matches as many times as possible, the last possible repetition will match the empty string.
Only this last repetition is stored in the group.

Did you mean to say ? instead of * for group 2?
Or did you need the ? on group 3 given the * on group 2?
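
To illustrate the point about only the last repetition being stored, here is a minimal sketch. It uses Python rather than VB.NET, which is an assumption on my part, and the patterns are simplified stand-ins rather than the pattern from the question. It shows that a quantifier placed after a capturing group keeps just the final repetition, while a quantifier placed inside the group keeps the whole run.

```python
import re

# Quantifier OUTSIDE the group: the group is re-captured on every
# repetition, so only the last repetition is left in group 1.
repeated = re.fullmatch(r"(\d)*", "123")
last_iteration = repeated.group(1)

# Quantifier INSIDE the group: the group matches once and spans the run.
whole = re.fullmatch(r"(\d*)", "123")
whole_run = whole.group(1)

print(last_iteration)  # 3
print(whole_run)       # 123
```

If the same thing is happening in the original pattern, wrapping the repeated part in a non-capturing group inside one outer capturing group would be one way to keep the full text.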

Author Commented:
Thanks for the responses.

@Surrano:  I am using VB.net (VS2012).

@ozo:
I am saying that in the cases of 4.4, 12, & -33 I want group 1 to match 4.4, 12, & -33.  I really don't care what any other group matches.  The reason I mentioned the other groups was that I was only trying to provide all the info that I had on my situation.  I'm sorry for the confusion.

As for the breakdown of the pattern: I am attempting to break a term down into its parts.

([-]?[0-9]*\.?[0-9]*)?     is for coefficients/constants that may be +/-, may have decimal points, or may not be present at all.

([a-z]?(<sup>[-]?[0-9]*\.?[0-9]+</sup>)?)*     is for variables that may or may not have exponents, or may not be present at all.  I believe that I need the ? on group 3 because an exponent can exist at most once, and only if a variable exists at all.

I am most likely overcomplicating this.

The only reason I added the code to match the constants/coefficients, variables and exponents was that I was trying to differentiate between single and multiple term expressions within the ()s.

I know that the operations inside the ()s will only be addition, so for group 1 couldn't I just grab everything inside the ()s as long as a + isn't present?  Then I would only need to ensure that an exponent isn't outside the closing ).  I was thinking of something like \(([^+]+?)\). It seems to work by itself for identifying ()s with only single terms, but I can't get it to work with the negative lookahead for exponents.
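
One possible reason the "no +" idea misbehaves with a lookahead is that [^+] also matches ), so backtracking can run past the first closing parenthesis. The sketch below is in Python rather than VB.NET, which is an assumption. The lookahead (?![<(]) is borrowed from the pattern suggested earlier in this thread, and the names LOOSE, TIGHT, and capture are hypothetical helpers for illustration only.

```python
import re

# The "anything but +" idea with a lookahead added. Because [^+] also
# accepts ')', a failed lookahead lets the lazy group grow past it.
LOOSE = re.compile(r"\(([^+]+?)\)(?![<(])")

# Hypothetical tightening: exclude ')' as well, and anchor at the start
# of the string (assumes one expression per string).
TIGHT = re.compile(r"^\(([^+)]+)\)(?![<(])")

def capture(pattern, text):
    """Return group 1 of the first match, or None when nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(capture(LOOSE, "(s)(2x)"))    # s)(2x
print(capture(TIGHT, "(s)(2x)"))    # None
print(capture(TIGHT, "(3d)"))       # 3d
print(capture(TIGHT, "(2ab+3an)"))  # None
```

The anchored version only looks at parentheses at the very start of the string, so strings such as a+(3d)+2 would need a different approach.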

Commented:
Please give some examples telling whether or not you want to match, and if it matches, what you would want to capture.

Author Commented:
I hope this helps.

string                                    capture
(2)                                       2
(2.555)                                   2.555
(2a)                                      2a
(2.555a)                                  2.555a
(2.555a<sup>2</sup>)                      2a<sup>2</sup>
(2.555a<sup>2.555</sup>)                  2a<sup>2.555</sup>
(a)                                       a
(a<sup>2</sup>)                           a<sup>2</sup>
(ab)                                      ab
(a<sup>2</sup>b)                          a<sup>2</sup>b
(a<sup>2</sup>b<sup>2</sup>)              a<sup>2</sup>b<sup>2</sup>
(2)<sup>2</sup>                           nothing
(2.555)<sup>2</sup>                       nothing
(2a)<sup>2</sup>                          nothing
(2.555a)<sup>2</sup>                      nothing
(2.555a<sup>2</sup>)<sup>2</sup>          nothing
(2.555a<sup>2.555</sup>)<sup>2</sup>      nothing
(a)<sup>2</sup>                           nothing
(a<sup>2</sup>)<sup>2</sup>               nothing
(ab)<sup>2</sup>                          nothing
(a<sup>2</sup>b)<sup>2</sup>              nothing
(a<sup>2</sup>b<sup>2</sup>)<sup>2</sup>  nothing
(any expression)(any expression)          nothing

Commented:
I am having no luck getting every single row to match using your test data with only one regex; I think you'll have to simply use multiple regexes, and either check each one on each line in a loop, or use an array of regexes if you're in PHP.

Commented:
(2.555a<sup>2</sup>)                      2a<sup>2</sup>
How does 2a<sup>2</sup> come from (2.555a<sup>2</sup>)   ?
Are we to ignore \.\d+ in the case when it is followed by a<sup>?
What if it is followed by <sup> with no a?
Do we only take the first and last character of whatever precedes <sup>?

Author Commented:
Sorry,
(2.555a<sup>2</sup>)                      2a<sup>2</sup>
should have been
(2.555a<sup>2</sup>)                      2.555a<sup>2</sup>

Are we to ignore \.\d+ in the case when it is followed by a<sup>?

I'm guessing \.\d is from your code, so if you're asking whether to ignore a decimal point and the digits after it when an exponent follows (e.g. (2.555a<sup>2</sup>)), the answer is no.

What if it is followed by <sup> with no a? no.

Do we only take the first and last character of whatever precedes <sup>?  no, if <sup> is within the ()s take everything.  If <sup> is outside the ()s take nothing.

Commented:
If (2.555a<sup>2</sup>) should have been 2.555a<sup>2</sup>
then /^\(([^)]+)\)(?![<(])/ seems to do everything you want on the examples in http:#a39789108

Author Commented:
It answers all the examples that I gave, but could you please break it down and explain it to me? All I understand is the negative lookahead part.

Commented:
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\(([^)]+)\)(?![<(])/)->explain'
The regular expression:

(?-imsx:^\(([^)]+)\)(?![<(]))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
^                        the beginning of the string
----------------------------------------------------------------------
\(                       '('
----------------------------------------------------------------------
(                        group and capture to \1:
----------------------------------------------------------------------
[^)]+                    any character except: ')' (1 or more
                         times (matching the most amount
                         possible))
----------------------------------------------------------------------
)                        end of \1
----------------------------------------------------------------------
\)                       ')'
----------------------------------------------------------------------
(?!                      look ahead to see if there is not:
----------------------------------------------------------------------
[<(]                     any character of: '<', '('
----------------------------------------------------------------------
)                        end of look-ahead
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

So, it captures everything in a set of parentheses at the start of the string, unless that set of parentheses is followed by < or (.
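
As a quick check of that summary, here is a minimal sketch that runs the same pattern over a few of the example strings from this thread. It is written in Python rather than VB.NET, which is an assumption, and single_term is a hypothetical helper name used only for illustration.

```python
import re

# Same pattern as in the explanation above.
PATTERN = re.compile(r"^\(([^)]+)\)(?![<(])")

def single_term(text):
    """Return the captured contents of the leading ()s, or None."""
    m = PATTERN.match(text)
    return m.group(1) if m else None

examples = [
    "(2.555)",
    "(2.555a<sup>2</sup>)",
    "(a<sup>2</sup>b<sup>2</sup>)",
    "(2a)<sup>2</sup>",
    "(a<sup>2</sup>)<sup>2</sup>",
    "(s)(2x)",
]

for text in examples:
    # Strings followed by <sup> or another ( print None.
    print(text, "->", single_term(text))
```

Note that this pattern does not itself reject a + inside the parentheses, so (2ab+3an) would still be captured as 2ab+3an. Adding + to the character class, as in [^+)], would be one way to exclude multi-term expressions.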
Question has a verified solution.
