ddrudik

asked on

Recognizing nested regex match groups using PHP/Perl

Using PHP and/or Perl (although I would prefer a PHP-only solution) I would like to recognize that a given Perl-formatted regex pattern has nested capture groups within it.

Given this pattern:
/t(e(st)in)g/
I would like a function to evaluate as true since it has nested capture groups.

However this pattern:
/t(?:e(st)in)g/
Would evaluate as false since it does not have nested capture groups (the ?: denotes a non-capturing group).

Note that other constructs also appear within ( ), such as flag groups (?i-m:), lookaheads, lookbehinds, etc., but they should not count as capture groups for the purposes of this function; it should only evaluate true for nested capture groups.

Consider that named capture groups in the PHP format of (?P<name>) may exist and would be considered capture groups, evaluating true if nested within another capture group.
SOLUTION
ahoffmann

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
This solution is only available to members.
ddrudik (ASKER)

The reason does not have to do with the matches etc., just suffice to say that my goal with this question is as I have stated.

The solution will likely involve parsing the regular expression and distinguishing between the different paren constructs: literal parens \( \), parens within character classes [ ], and the non-capturing group types (?:) (?is-mx:) (?<=) (?<!) (?=) (?!), all of which I want to ignore so that I can focus on capture groups ( ) only (specifically nested capture groups).

The solution needs to do nothing more than evaluate true if nested capture groups exist, although in my testing that's easier said than done.
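
The parsing described above can be sketched as a single left-to-right scan that skips escapes and character classes and tracks which open groups capture. This is only an illustrative sketch, written in Python rather than PHP (the loop translates almost line for line); the function and helper names are made up for this example. It assumes the delimiters are not brackets or parens, and it does not handle conditionals such as (?(1)...), \Q...\E literal spans, or # comments under the x flag.

```python
import re

# POSIX class such as [:alpha:] or [:^digit:]; only meaningful inside [...]
_POSIX_CLASS = re.compile(r"\[:\^?[a-z]+:\]")


def _skip_char_class(pattern, i):
    """Return the index just past the character class that opens at i."""
    n = len(pattern)
    j = i + 1
    if j < n and pattern[j] == "^":
        j += 1
    if j < n and pattern[j] == "]":  # a leading ] is a literal
        j += 1
    while j < n and pattern[j] != "]":
        posix = _POSIX_CLASS.match(pattern, j)
        if pattern[j] == "\\":       # escaped character inside the class
            j += 2
        elif posix:
            j = posix.end()
        else:
            j += 1
    return j + 1


def _opens_capture(pattern, i):
    """True if the '(' at index i opens a numbered or named capture group."""
    head = pattern[i + 1:i + 4]
    if head.startswith("*"):            # backtracking verb such as (*FAIL)
        return False
    if not head.startswith("?"):        # plain ( ... )
        return True
    if head.startswith(("?P<", "?'")):  # (?P<name>...) or (?'name'...)
        return True
    # (?<name>...) captures; (?<=...) and (?<!...) are lookbehinds
    return len(head) == 3 and head[1] == "<" and head[2] not in "=!"


def has_nested_capture_groups(pattern):
    """True if any capture group opens while another capture group is open."""
    stack = []         # one flag per open group: True if that group captures
    open_captures = 0  # number of capture groups currently open
    i, n = 0, len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\":                        # skip \( \) \[ and other escapes
            i += 2
            continue
        if ch == "[":                         # parens in a class are literals
            i = _skip_char_class(pattern, i)
            continue
        if ch == "(":
            if pattern.startswith("(?#", i):  # comment group ends at next )
                end = pattern.find(")", i)
                i = n if end == -1 else end + 1
                continue
            capturing = _opens_capture(pattern, i)
            if capturing:
                if open_captures > 0:
                    return True
                open_captures += 1
            stack.append(capturing)
        elif ch == ")" and stack:
            if stack.pop():
                open_captures -= 1
        i += 1
    return False


print(has_nested_capture_groups(r"/t(e(st)in)g/"))    # True
print(has_nested_capture_groups(r"/t(?:e(st)in)g/"))  # False
```

Non-capturing groups are still pushed on the stack so that their closing paren is matched to the right opener; only the capturing flag decides whether the nesting counter changes.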
SOLUTION
This solution is only available to members.
ddrudik (ASKER)

ahoffmann, unfortunately I suspect you don't get what I am trying to do.

I am trying to check if a given regex pattern has nested capture groups defined within the regex pattern itself and not within a source string that the pattern would match against.

If you are referring to how to, with code, recognize that there are nested capture groups defined within a given regex pattern, then please explain your solution further, since your previous comments don't seem to be directed to that.

Multiple backreferences (or capture groups) within a pattern do not meet the criteria for this solution; I am only interested in recognizing "nested" capture groups (i.e. capture groups within other capture groups).
ASKER CERTIFIED SOLUTION
This solution is only available to members.
ddrudik (ASKER)

ahoffmann, you might not fully understand nested capture groups and what other regex constructs may incorrectly appear as nested capture groups.

Consider the following valid (although not necessarily efficient) Perl-compatible regex patterns:
/this is a test/ismx
/this (is) a test/ismx
/this (is) (a) test/
/this (?:is) (a) test/
/this (?:(?:is) (a)) (?=test)/
/this (?:(?-ismx:is) ((?!b)a)) (?=test)/i
/this ((?:(?m-isx:is) ((?!b)a))) test/i

Only the last pattern actually contains a nested capture group.  Matching a true nested capture group, while not matching other similar-looking regex constructs, will require a more complex pattern (or set of patterns) than what you have started with.
ddrudik (ASKER)

I have come to the conclusion that properly solving this issue would require writing a complete regex parser, something beyond the scope of a question here.  ahoffmann, thanks for the help.
I agree that you need a regex parser for a perfect solution.
You may detect simple cases with a regex too, for example
  /this is ((?:(?=invisible) ((?!b)a))) test/
but things get complicated if you have something like:
  /this is a ((?:(?more (invisible)?))) test/
  /this ((?:test) contains (\) braces))/