Randall-B

Extract Innermost Nested Table from HTML Source Code

I have a large collection of HTML pages, each containing tables that I need to extract. Some of the pages have a single table. Others have nested tables, up to four levels deep.
  1) If the file has only one table, I need to extract that table into a variable (I know I could do something like http:Q_21758443.html, or use preg_match); but
  2) If the file has nested tables, I need only the innermost table.
    In other words, whether the HTML file has a single table or nested tables, the code should always extract the innermost table. How is this done?
ASKER CERTIFIED SOLUTION
richdiesal
This solution is only available to members.
Randall-B

ASKER

richdiesal,
   Great. At first I couldn't figure out why it wasn't working with some of my real files. Then I discovered that some of them do not have plain <table> tags; their opening tags carry attributes, like <table width="100%">. So I started by normalizing every opening tag to a plain <table>:

    $fileContent = preg_replace('/<table(.*?)>/', '<table>', $fileContent);

It seems to work great.  I would call your code elegant.  Thanks.
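
For readers who cannot see the accepted solution: the following is a minimal sketch of one way to get an innermost table with preg_match. It is not richdiesal's code. The function name extractInnermostTable is hypothetical, and the sketch assumes reasonably well-formed markup in which every <table> has a matching </table>. The pattern accepts opening tags with attributes, so the normalizing preg_replace above is not required for it. It returns the first table that contains no other table; if a page has several such tables, or the first one is not the most deeply nested, a different selection rule would be needed.

```php
<?php
// Hypothetical helper (not the accepted solution from this thread).
// Returns the first <table>...</table> block that contains no nested
// <table>, or null when no such block is found.
function extractInnermostTable(string $html): ?string
{
    // <table\b[^>]*>       opening tag, with or without attributes
    // (?:(?!<table\b).)*?  any characters, as long as no new <table starts
    // </table>             the closing tag of that same table
    // Modifiers: i = case-insensitive, s = dot also matches newlines.
    $pattern = '~<table\b[^>]*>(?:(?!<table\b).)*?</table>~is';

    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[0];
    }

    // No match, or a PCRE error (for example a backtrack limit hit on
    // a very large input).
    return null;
}

// Usage with a two-level nested table: prints only the inner table.
$fileContent = '<table width="100%"><tr><td>'
    . '<table border="1"><tr><td>inner</td></tr></table>'
    . '</td></tr></table>';
echo extractInnermostTable($fileContent), "\n";
```

A regular expression is used here only because the thread already relies on the preg_* functions. For messy real-world markup, an HTML parser is generally the more robust choice.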