DerkArts

Regex for nested tables

I'm looking for a regex that matches an entire table, which may itself contain nested tables.

See my code snippet for an example.

The idea is to match only the parent table, so the regex has to keep track of how 'deep' it is in the HTML code.



//some html syntax
<table> //match this table start tag
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>  //match this closing table tag


David S.

RegEx can't keep track of nesting like that.  It sounds like you may need an HTML parser.
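
To illustrate what keeping track of depth involves, here is a minimal sketch in PHP that scans for table tags and counts the nesting level by hand. The function name outermostTables is made up for this example. It is a tag counter rather than a real HTML parser: it assumes the tags are balanced and ignores comments, scripts, and attribute values that happen to contain table tags.

```php
<?php
// Hypothetical helper: returns every outermost <table>...</table> block in $html.
function outermostTables(string $html): array
{
    $tables = [];
    $depth  = 0;   // how many tables are currently open
    $start  = 0;   // byte offset of the current outermost opening tag

    // Find each opening or closing table tag together with its byte offset.
    preg_match_all(
        '~<(/?)table\b[^>]*>~i',
        $html,
        $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE
    );

    foreach ($tags as $tag) {
        [$text, $offset] = $tag[0];          // full tag text and where it starts
        $isClosing = $tag[1][0] === '/';     // group 1 is "/" for a closing tag

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset;            // remember where the parent table begins
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Back at the top level: the parent table is complete.
                $tables[] = substr($html, $start, $offset + strlen($text) - $start);
            }
        }
    }

    return $tables;
}
```

Calling outermostTables($html) on the snippet from the question should return a one-element array holding the parent table, with the nested table still inside it.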
webvogel

Is this the result you want?
<table> //match this table start tag
   <tr><td> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>

This is the result of my regex:
Array
(
    [0] => <table> //match this table start tag
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>
    [1] => <table> //match this table start tag
   <tr><td>
    [2] => <table><tr><td>more html</td></tr></table>

    [3] =>  </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>
)

But you have to know how many tables are nested, otherwise the result is wrong, because the function matches the first opening tag with the first closing tag. I don't know if there is any other solution. The + or ? is not working :-(
preg_match('#(<table>.*)(<table>.*</table>)+(.*</table>)#is', $str, $table);
 
// print out all:
print_r($table);
 
// print only the result you want (?)
print_r($table[1].$table[3]);

ddrudik

preg_match_all('~<table[^>]*>(?:(?>(?:(?!<(?:/table|table[^>]*)>).)+)|(?R))*</table>~is',$sourcestring,$matches);


DerkArts (ASKER)

ddrudik:
This matches the innermost table when I test it, and there is an error in the syntax:
<table[^>]*>(?:(?>(?:(?!<(?:/table|table[^>]*)>).)+)|(?R))*</table>

The (?R) is not right. What did you intend to write there? ?> and ?: both result in the innermost table.

Just to clarify, I only want to match the outermost table.
ASKER CERTIFIED SOLUTION
ddrudik

This solution is only available to members.

DerkArts (ASKER)

Thank you. Indeed, RegexBuddy didn't compile it, but PHP does. Would you care to explain this regex to me, especially the (?R) part? Thank you.

ddrudik

Thanks for the question and the points.

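(?R) is a recursive construct: at that point the engine re-applies the whole pattern, so a nested table is matched as a unit inside the outer one. See: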
http://us3.php.net/manual/en/regexp.reference.recursive.php
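
As a rough illustration of how the recursion works, the pattern posted earlier in the thread can be written out in extended mode and run against the sample from the question. This is only a sketch: the x flag is added here so the pattern can carry comments, and the variable names and the surrounding paragraph tags are made up for the example.

```php
<?php
// Same pattern as in the thread, spread over several lines with the x flag.
$pattern = '~
    <table[^>]*>                  # an opening table tag, with or without attributes
    (?:
        (?>                       # atomic group: do not backtrack into this run
            (?:
                (?!<(?:/table|table[^>]*)>)   # not at the start of a table tag
                .                             # so consume one character
            )+
        )
      |
        (?R)                      # or recurse: match one complete nested table
    )*
    </table>                      # the closing tag that pairs with the opening tag
~isx';

// Sample input based on the question, wrapped in made-up paragraphs.
$html = <<<'HTML'
<p>before</p>
<table>
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>
   <tr>etc..</tr>
</table>
<p>after</p>
HTML;

preg_match_all($pattern, $html, $matches);

// $matches[0] lists the full matches. The nested table is consumed by (?R)
// while the parent is being matched, so it is not reported separately.
echo count($matches[0]), "\n";
echo $matches[0][0], "\n";
```

Each time the engine meets another opening table tag inside the parent, the (?R) branch matches that whole inner table before the outer match resumes, which is why the final closing tag pairs with the first opening tag. A tool that does not support (?R) cannot run the pattern as written, which would fit the RegexBuddy behaviour described above.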