Regex for nested tables

I'm looking for a regex that matches an entire table, which may itself contain nested tables.

See my code snippet for an example.

The idea is to match only the parent table, so the regex has to keep track of how 'deep' it is in the HTML code.



//some html syntax
<table> //match this table start tag
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>  //match this closing table tag

DerkArts Asked:

David S. Commented:
RegEx can't keep track of nesting like that.  It sounds like you may need an HTML parser.
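
As a rough illustration of the parser route, here is a minimal sketch using PHP's bundled DOM extension. It assumes that extension is enabled; the sample markup and variable names are made up for the example and are not from the question. It selects only tables that have no table ancestor, i.e. the outermost ones.

```php
<?php
// Hypothetical sample markup: one outer table containing one nested table.
$html = '<div><table><tr><td><table><tr><td>more html</td></tr></table> </td></tr>'
      . '<tr><td>etc..</td></tr></table></div>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Tables with no <table> ancestor are the outermost tables.
$outer = $xpath->query('//table[not(ancestor::table)]');

foreach ($outer as $table) {
    // Serialize the outer table, nested table included.
    echo $doc->saveHTML($table), "\n";
}
```

The parser tracks the nesting itself, so the XPath expression only has to state which level is wanted.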
webvogel Commented:
Is this the result you want?
<table> //match this table start tag
   <tr><td> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>

This is the result of my regex:
Array
(
    [0] => <table> //match this table start tag
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>
    [1] => <table> //match this table start tag
   <tr><td>
    [2] => <table><tr><td>more html</td></tr></table>

    [3] =>  </td></tr>//dont match these table tags
   <tr>etc..</tr>
</table>
)

But you have to know how many tables are nested, otherwise the result is wrong, because the function matches the first opening tag with the first closing tag. I don't know if there is any other solution. Using + or ? does not work :-(
preg_match('#(<table>.*)(<table>.*</table>)+(.*</table>)#is', $str, $table);
 
// print out all:
print_r($table);
 
// print only the result you want (?)
print_r($table[1].$table[3]);

ddrudik Commented:

preg_match_all('~<table[^>]*>(?:(?>(?:(?!<(?:/table|table[^>]*)>).)+)|(?R))*</table>~is',$sourcestring,$matches);
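
To see what the pattern above does, here is a small self-contained run. The sample string is a cleaned-up version of the table from the question (the // comments are dropped and a paragraph is added in front); those details are assumptions made for the example.

```php
<?php
// Sample input: an outer table with one nested table, preceded by other HTML.
$sourcestring = <<<HTML
<p>some html</p>
<table>
   <tr><td><table><tr><td>more html</td></tr></table> </td></tr>
   <tr><td>etc..</td></tr>
</table>
HTML;

// The pattern from the post above, unchanged.
$pattern = '~<table[^>]*>(?:(?>(?:(?!<(?:/table|table[^>]*)>).)+)|(?R))*</table>~is';

$count = preg_match_all($pattern, $sourcestring, $matches);

// One match is found: the outer table, with the nested table inside it.
echo $count, "\n";
echo $matches[0][0], "\n";
```

The nested table is consumed as part of the outer match, so it does not show up as a separate result.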


DerkArts Author Commented:
ddrudik:
This matches the innermost table when I test it. But there is an error in the syntax:
<table[^>]*>(?:(?>(?:(?!<(?:/table|table[^>]*)>).)+)|(?R))*</table>

The ?R is not right. What did you intend to write there? Using ?> or ?: in its place both match the innermost table.

Just to clarify, I only want to match the outermost table.
ddrudikCommented:
(?R) is a valid PHP preg regex construct. I'm not sure what else is validating your pattern, but there is no error in mine.
My sample selects the outermost table; please provide the table sample you are using for testing.
DerkArts Author Commented:
Thank you. Indeed, RegexBuddy didn't compile it, but PHP does. Would you care to explain this regex to me, especially the ?R part? Thank you.
ddrudikCommented:
Thanks for the question and the points.

(?R) is a recursive construct, see:
http://us3.php.net/manual/en/regexp.reference.recursive.php
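
A short sketch of how the recursion can be read. The first part uses the balanced-parentheses idea as a smaller case; the second is the table pattern from this thread, rewritten with the x flag so each piece can carry a comment. The sample strings and variable names are made up for illustration.

```php
<?php
// Smaller case: match a balanced pair of parentheses.
// (?R) means "match the whole pattern again at this point".
preg_match('~\((?:[^()]++|(?R))*\)~', 'f(a(b)c) + g(d)', $m);
echo $m[0], "\n"; // prints "(a(b)c)"

// The table pattern, laid out with the x (extended) flag.
$pattern = '~
    <table[^>]*>                  # an opening table tag
    (?:
        (?>                       # atomic group: keep whatever was consumed
            (?:
                (?! < (?: /table | table[^>]* ) > )   # not at a table open or close tag
                .                 # so take one character
            )+
        )
      |
        (?R)                      # or recurse: a complete nested table goes here
    )*
    </table>                      # the closing tag that pairs with the opening one
~isx';

$html = '<table><tr><td><table><tr><td>more html</td></tr></table> </td></tr></table>';
preg_match($pattern, $html, $t);
echo $t[0], "\n"; // the whole outer table
```

Each time the scan reaches a nested opening tag, the first alternative stops and (?R) takes over, matching that inner table up to its own closing tag before the outer level continues.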