Recognizing nested regex match groups using PHP/Perl

Using PHP and/or Perl (although I would prefer a PHP-only solution) I would like to recognize that a given Perl-formatted regex pattern has nested capture groups within it.

Given this pattern:
/t(e(st)in)g/
I would like a function to evaluate as true since it has nested capture groups.

However this pattern:
/t(?:e(st)in)g/
Would evaluate as false since it does not have nested capture groups (the ?: denotes a non-capturing group).

Note that other constructs may appear within ( ), such as flags (?i-m:), lookaheads, lookbehinds, etc., but they should not count for the purposes of this function; it should only evaluate true for nested capture groups.

Consider that named capture groups in the PHP format of (?P<name>) may exist and would be considered capture groups, evaluating true if nested within another capture group.
ddrudik Asked:
 
ahoffmann Commented:
> .. trying to check if a given regex pattern has nested capture groups defined within the regex
OK, I misunderstood that.

To check the regex pattern for nested groups you can use the following regex:
  m/\([^)]*\(/;
You can try it with Perl like this:
 perl -le '$_="/t(e(st)in)g/";print m/\([^)]*\(/;'
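
As a rough illustration of how far this regex gets, here is the same check sketched in Python (chosen only for illustration; `naive_has_nested` is a hypothetical helper name, and the regex is the one from the post above). It flags the nested example, but it also flags a non-capturing outer group and escaped literal parens, because it only looks for two opening parens with no closing paren between them.

```python
import re

# Same idea as the Perl one-liner: an opening paren followed,
# before any closing paren, by another opening paren.
NAIVE = re.compile(r"\([^)]*\(")


def naive_has_nested(pattern: str) -> bool:
    """Return True if the naive regex thinks `pattern` has nested groups."""
    return NAIVE.search(pattern) is not None


if __name__ == "__main__":
    samples = [
        "/t(e(st)in)g/",        # genuinely nested capture groups
        "/t(?:e(st)in)g/",      # outer group is non-capturing
        r"/a(\(b\))c/",         # inner parens are escaped literals
        "/this (is) (a) test/", # sibling groups, not nested
    ]
    for s in samples:
        print(s, naive_has_nested(s))
```

Only the last sample comes out false; the second and third are false positives, which is the gap the rest of the thread discusses.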
 
ahoffmann Commented:
If /t(e(st)in)g/ matches, the result has more than one element (back references $1 and $2); a pattern with only one of those two groups yields exactly one element (back reference $1).

/t(?:e(st)in)g/ either matches or not. As you told the regex not to remember the outer group, you cannot distinguish in the result whether there was a nested or a simple match.

Conclusion: if you need to know whether it is a nested match, you have to use the first pattern, with the overhead of building back references and then selecting the one you need.
As a workaround you may check with the first pattern for nested matches and then use the second pattern to get the match you want.

Is this what you're asking for?
 
ahoffmann Commented:
Oops, my last description is not correct.

/t(?:e(st)in)g/ only matches if both the outer and the inner pattern match, hence it returns the inner pattern as a back reference if there is a complete match.

So, thinking about the difference again, I don't see the purpose of your question, as both regexes match the same thing (except for the difference in back references).
 
ddrudik (Author) Commented:
The reason does not have to do with the matches etc.; suffice it to say that my goal with this question is as I have stated.

The solution will likely involve parsing the regular expression and telling apart the different parens constructs: literal parens \( \), parens within character classes [ ], and the non-capturing group types (?:) (?is-mx:) (?<=) (?<!) (?=) (?!=) (?!). I want to ignore all of those and focus on capture groups ( ) only (specifically nested capture groups).
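
The parsing approach described above can be sketched as a single left-to-right scan. This is a minimal sketch in Python (the same loop ports directly to PHP), not a full regex parser: the function names are hypothetical, and it assumes the pattern does not use \Q...\E quoting, x-mode `#` comments, POSIX classes such as [[:alpha:]], conditional groups, or (*VERB) constructs, all of which would need extra handling.

```python
def opens_capture(pattern: str, i: int) -> bool:
    """pattern[i] is an unescaped '(' outside a character class."""
    rest = pattern[i + 1:i + 4]
    if not rest.startswith("?"):
        return True  # plain ( ... )
    if rest.startswith("?P<") or rest.startswith("?'"):
        return True  # (?P<name>...) or (?'name'...)
    # (?<name>...) captures; (?<=...) and (?<!...) are lookbehinds
    return rest.startswith("?<") and rest[2:3] not in ("=", "!")


def has_nested_capture(pattern: str) -> bool:
    """True if a capture group sits inside another capture group."""
    stack = []          # one bool per open group: True if capturing
    capture_depth = 0   # number of currently open capturing groups
    in_class = False    # inside [ ... ]
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\":
            i += 2      # skip the escaped character, e.g. \( or \]
            continue
        if in_class:
            in_class = c != "]"
        elif c == "[":
            in_class = True
            if pattern.startswith("^", i + 1):
                i += 1
            if pattern.startswith("]", i + 1):
                i += 1  # a leading ] is a literal, not the class end
        elif pattern.startswith("(?#", i):
            end = pattern.find(")", i)  # comment ends at the first ')'
            i = len(pattern) if end == -1 else end
        elif c == "(":
            capturing = opens_capture(pattern, i)
            if capturing:
                if capture_depth > 0:
                    return True
                capture_depth += 1
            stack.append(capturing)
        elif c == ")" and stack:
            if stack.pop():
                capture_depth -= 1
        i += 1
    return False


if __name__ == "__main__":
    print(has_nested_capture("/t(e(st)in)g/"))    # True
    print(has_nested_capture("/t(?:e(st)in)g/"))  # False
```

Tracking a capture depth separately from the group stack means a capture group nested through an intermediate non-capturing group, as in ((?:a(b))), still counts as nested. The delimiters and trailing flags can be left on the pattern, since they contain no parens.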

The solution needs to do nothing more than evaluate true if nested capture groups exist, although in my testing that's easier said than done.
 
ahoffmann Commented:
> .. nothing more than evaluate true if nested capture groups exist,
Then use the first pattern and check whether there is more than one match (as already explained).
 
ddrudik (Author) Commented:
ahoffmann, unfortunately I suspect you don't get what I am trying to do.

I am trying to check if a given regex pattern has nested capture groups defined within the regex pattern itself and not within a source string that the pattern would match against.

If you are referring to how to recognize, with code, that there are nested capture groups defined within a given regex pattern, then please explain your solution further, since your previous comments don't seem to be directed at that.

Multiple backreferences (or capture groups) within a pattern do not meet the criteria for this solution; I am only interested in recognizing "nested" capture groups (i.e. capture groups within other capture groups).
 
ddrudik (Author) Commented:
ahoffmann, you might not fully understand nested capture groups and what other regex constructs may incorrectly appear as nested capture groups.

Consider the following valid (although not necessarily efficient) Perl-compatible regex patterns:
/this is a test/ismx
/this (is) a test/ismx
/this (is) (a) test/
/this (?:is) (a) test/
/this (?:(?:is) (a)) (?=test)/
/this (?:(?-ismx:is) ((?!b)a)) (?=test)/i
/this ((?:(?m-isx:is) ((?!b)a))) test/i

Only the last pattern actually contains a nested capture group. Matching a true nested capture group, and not matching other similar-appearing regex constructs, will involve a more complex pattern (or patterns) than what you have started with.
 
ddrudik (Author) Commented:
I have come to the conclusion that properly solving this issue would require writing a complete regex parser, something beyond the scope of a question here. ahoffmann, thanks for the help.
 
ahoffmann Commented:
I agree that you need a regex parser for a perfect solution.
You may detect simple cases with a regex too, for example
  /this is ((?:(?=invisible) ((?!b)a))) test/
but things get complicated if you have something like:
  /this is a ((?:(?more (invisible)?))) test/
  /this ((?:test) contains (\) braces))/