Recognizing nested regex match groups using PHP/Perl

Using PHP and/or Perl (although I would prefer a PHP-only solution) I would like to recognize that a given Perl-formatted regex pattern has nested capture groups within it.

Given this pattern:
/t(e(st)in)g/
I would like a function to evaluate as true since it has nested capture groups.

However this pattern:
/t(?:e(st)in)g/
Would evaluate as false since it does not have nested capture groups (the ?: denotes a non-capturing group).

Note that other constructs within ( ), such as flags (?i-m:), lookaheads, lookbehinds, etc., may exist, but they should not count as capture groups for the purposes of this function; it should only evaluate true for nested capture groups.

Consider that named capture groups in the PHP format of (?P<name>) may exist and would be considered capture groups, evaluating true if nested within another capture group.
ddrudik asked:

ahoffmann commented:
If /t(e(st)in)g/ matches with the nested group, the result has more than one captured element (back references $1 and $2); if only one of the two groups matches, the result has exactly one captured element (back reference $1).

/t(?:e(st)in)g/ either matches or not. As you told the regex not to remember back references, you cannot distinguish in the result whether there was a nested or a simple match.

Conclusion: if you need to know if it is a nested match, you have to use the first pattern with the overhead of building back references and then selecting the right one you need.
As a workaround you may check with the first pattern for nested matches and then use the second pattern to get the match you want.

Is this what you're asking for?
ahoffmann commented:
Oops, my last description is not correct.

/t(?:e(st)in)g/ only matches if both patterns match, hence it returns the inner pattern as a back reference if there is a complete match.

So, thinking about the difference again, I don't see the purpose of your question, as both regexes match the same thing (except for the difference in back references).
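
For reference, here is a minimal PHP sketch of what each of the two patterns from the question captures when it is actually matched. The subject string 'testing' and the variable names are assumptions chosen for illustration; the thread does not supply a subject.

```php
<?php
// Match the same subject with both patterns from the question and
// compare what ends up in the match arrays.
$subject = 'testing'; // assumed example subject

preg_match('/t(e(st)in)g/', $subject, $nested);
preg_match('/t(?:e(st)in)g/', $subject, $flat);

print_r($nested); // [0] => testing, [1] => estin, [2] => st
print_r($flat);   // [0] => testing, [1] => st
```

Both patterns match the same text; only the number of back references differs, which says nothing about how the groups are nested in the pattern source.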
ddrudik (author) commented:
The reason does not have to do with the matches etc.; suffice it to say that my goal with this question is as I have stated.

The solution will likely involve parsing the regular expression and recognizing the differences between parens constructs: literal parens \( \), parens within character groups [ ], and the non-capture group types (?:) (?is-mx:) (?<=) (?<!) (?=) (?!=) (?!), which I want to ignore in order to focus on capture groups ( ) only (specifically nested capture groups).

The solution needs to do nothing more than evaluate true if nested capture groups exist, although in my testing that's easier said than done.
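
The parsing approach described above could be sketched in PHP roughly as follows. This is a single-pass scanner under stated assumptions, not a full regex parser: the function names hasNestedCaptureGroups and isCapturingOpener are hypothetical, the pattern is assumed to use the same delimiter character at both ends (e.g. /.../i), and constructs such as conditionals (?(1)...), \Q...\E quoting, (?#...) comments, extended-mode # comments, and POSIX classes like [[:alpha:]] are not handled.

```php
<?php
// Hypothetical helper: does the "(" at $pos open a capturing group?
function isCapturingOpener(string $body, int $pos): bool
{
    $next = $body[$pos + 1] ?? '';
    if ($next === '*') {
        return false;               // (*VERB) style construct
    }
    if ($next !== '?') {
        return true;                // plain ( ... ) capture group
    }
    // Named groups capture: (?P<name>...), (?<name>...), (?'name'...).
    // (?<= and (?<! are lookbehinds, so they are excluded.
    return preg_match('/\G\(\?(?:P<|<(?![=!])|\')/', $body, $m, 0, $pos) === 1;
}

// Hypothetical function: true if a capture group sits inside another one.
function hasNestedCaptureGroups(string $pattern): bool
{
    if ($pattern === '') {
        return false;
    }
    // Assumption: same delimiter at both ends; strip it and trailing flags.
    $end  = strrpos($pattern, $pattern[0]);
    $body = $end > 0 ? substr($pattern, 1, $end - 1) : substr($pattern, 1);

    $stack        = [];     // one bool per open group: true if it captures
    $captureDepth = 0;      // number of currently open capture groups
    $inClass      = false;  // inside a [...] character class
    $len          = strlen($body);

    for ($i = 0; $i < $len; $i++) {
        $ch = $body[$i];
        if ($ch === '\\') {         // escaped character such as \( or \)
            $i++;
            continue;
        }
        if ($inClass) {             // parens are literal inside [...]
            if ($ch === ']') {
                $inClass = false;
            }
            continue;
        }
        if ($ch === '[') {
            $inClass = true;
            // A leading ^ and a ] placed first belong to the class.
            if (($body[$i + 1] ?? '') === '^') {
                $i++;
            }
            if (($body[$i + 1] ?? '') === ']') {
                $i++;
            }
            continue;
        }
        if ($ch === '(') {
            $captures = isCapturingOpener($body, $i);
            if ($captures) {
                if ($captureDepth > 0) {
                    return true;    // capture group opened inside another
                }
                $captureDepth++;
            }
            $stack[] = $captures;
        } elseif ($ch === ')') {
            if (array_pop($stack) === true) {
                $captureDepth--;
            }
        }
    }
    return false;
}

var_dump(hasNestedCaptureGroups('/t(e(st)in)g/'));   // bool(true)
var_dump(hasNestedCaptureGroups('/t(?:e(st)in)g/')); // bool(false)
```

The stack records whether each open group captures, so a closing paren only lowers the capture depth when it closes a capture group; non-capturing groups and lookarounds in between do not affect the result.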
ahoffmann commented:
> .. nothing more than evaluate true if nested capture groups exist,
Then use the first pattern and check whether there is more than one match (as already explained).
ddrudik (author) commented:
ahoffmann, unfortunately I suspect you don't get what I am trying to do.

I am trying to check if a given regex pattern has nested capture groups defined within the regex pattern itself and not within a source string that the pattern would match against.

If you are referring to how to, with code, recognize that there are nested capture groups defined within a given regex pattern, then please explain your solution further, since your previous comments don't seem to be directed to that.

Multiple backreferences (or capture groups) within a pattern do not meet the criteria for this solution; I am only interested in recognizing "nested" capture groups (i.e. capture groups within other capture groups).
ahoffmann commented:
> .. trying to check if a given regex pattern has nested capture groups defined within the regex
OK, I misunderstood that.

To check the regex pattern for nested groups, you can use the following regex:
  m/\([^)]*\(/;
You may try it with Perl like this:
 perl -le '$_="/t(e(st)in)g/";print m/\([^)]*\(/;'
ddrudik (author) commented:
ahoffmann, you might not fully understand nested capture groups and what other regex constructs may incorrectly appear as nested capture groups.

Consider the following valid (although not necessarily efficient) Perl-compatible regex patterns:
/this is a test/ismx
/this (is) a test/ismx
/this (is) (a) test/
/this (?:is) (a) test/
/this (?:(?:is) (a)) (?=test)/
/this (?:(?-ismx:is) ((?!b)a)) (?=test)/i
/this ((?:(?m-isx:is) ((?!b)a))) test/i

Only the last pattern actually contains a nested capture group.  Matching a true nested capture group while not matching other similar-looking regex constructs will require a more complex pattern (or several patterns) than what you have started with.
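
To see concretely where the simple check \([^)]*\( goes wrong, the seven patterns above can be fed to it from PHP. This is only an illustrative sketch; the variable names and the "expected" flags (taken from the statement that only the last pattern is nested) are added here for the comparison.

```php
<?php
// The seven patterns from the comment above, each paired with whether it
// really contains a nested capture group.
$patterns = [
    '/this is a test/ismx'                      => false,
    '/this (is) a test/ismx'                    => false,
    '/this (is) (a) test/'                      => false,
    '/this (?:is) (a) test/'                    => false,
    '/this (?:(?:is) (a)) (?=test)/'            => false,
    '/this (?:(?-ismx:is) ((?!b)a)) (?=test)/i' => false,
    '/this ((?:(?m-isx:is) ((?!b)a))) test/i'   => true,
];

// The simple check: an opening paren followed by another opening paren
// with no closing paren in between.
$naive = '/\([^)]*\(/';

$results = [];
foreach ($patterns as $pattern => $expected) {
    $guess = preg_match($naive, $pattern) === 1;
    $results[$pattern] = $guess;
    printf(
        "%-45s naive=%-5s actual=%s\n",
        $pattern,
        var_export($guess, true),
        var_export($expected, true)
    );
}
```

The fifth and sixth patterns are reported as nested by the simple check even though their outer group is non-capturing, because the check cannot tell (?: from a plain opening paren.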
ddrudik (author) commented:
I have come to the conclusion that properly solving this issue would require writing a complete regex parser, something beyond the scope of a question here.  ahoffmann, thanks for the help.
ahoffmann commented:
I agree that you need a regex parser for a perfect solution.
E.g., you may detect simple cases with a regex too, for example
  /this is ((?:(?=invisible) ((?!b)a))) test/
but things get complicated if you have something like:
  /this is a ((?:(?more (invisible)?))) test/
  /this ((?:test) contains (\) braces))/