Solved

Extract Innermost Nested Table from HTML Source Code

Posted on 2007-03-18
Last Modified: 2013-12-13
I have a large collection of HTML pages, each containing tables that I need to extract.  Some pages have a single table; others have nested tables, up to four levels deep.
  1) If the file has only one table, I need to extract that table into a variable (I know I could do something like http:Q_21758443.html or use preg_match); but
  2) If the file has nested tables, I need only the innermost table.
    In other words, whether the HTML file has a single table or nested tables, the code should always extract the innermost table.  How is this done?
Question by:Randall-B

Accepted Solution

by:
richdiesal earned 500 total points
ID: 18744403
This seems a little simplistic, but it may get you where you want to go.  Keep in mind that it won't work properly if a page contains more than one group of nested tables; that's an altogether different beast.  In the example below, $content holds a table nested three levels deep, and the final value of $lasttable is '<tr><td>Hello PHP</td></tr>':

$content = "<html>
<head>
</head>
<body>
<table><tr><td>Hello World</td></tr>
<tr><td><table><tr><td>Hello c#</td></tr>
<tr><td><table><tr><td>Hello PHP</td></tr></table>
</td></tr></table>
</td></tr></table>
</body>
</html>";

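// Split on every opening <table> tag; the last piece begins inside the most recently opened table.
$array = explode("<table>", $content);
$lasttable_raw = end($array);
// Everything from the first </table> onward belongs to the enclosing markup, so cut it off.
$junk_data_starts = strpos($lasttable_raw, "</table>");
$lasttable = substr($lasttable_raw, 0, $junk_data_starts);

echo $lasttable;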
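
For pages where the opening tags carry attributes, or where several tables sit side by side, a DOM-based variant may be more robust than splitting strings. The following is only a sketch, not part of the accepted answer: the function name innermost_tables() is hypothetical, and it assumes PHP's bundled DOM extension is available. It treats a table as innermost when no other table appears among its descendants, and returns every such table, including the surrounding <table> tag itself.

```php
<?php
// Hypothetical helper: returns the outer HTML of every table
// that does not contain another table.
function innermost_tables(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tables = [];
    // Select tables that have no table among their descendants.
    foreach ($xpath->query('//table[not(.//table)]') as $node) {
        $tables[] = $doc->saveHTML($node);
    }
    return $tables;
}

$sample = '<html><body>
<table width="100%"><tr><td>Hello World</td></tr>
<tr><td><TABLE border="1"><tr><td>Hello PHP</td></tr></TABLE></td></tr>
</table>
</body></html>';

foreach (innermost_tables($sample) as $table) {
    echo $table, "\n";
}
```

If a page holds a single table, that table has no nested table inside it, so the same query returns it as well.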

Author Comment

by:Randall-B
ID: 18745160
richdiesal,
   Great.  At first I couldn't figure out why it wasn't working with some of my real files.  Then I discovered that not all of them use plain <table> tags; some carry attributes, like <table width="100%">.  So I normalize the opening tags first with:

    $fileContent = preg_replace('/<table(.*?)>/', '<table>', $fileContent);

It seems to work great.  I would call your code elegant.  Thanks.
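
For anyone reusing this, here is a hedged sketch that folds the normalization step and the explode() approach into one function. The name last_table_body() is hypothetical, and the pattern is a slightly stricter variant of the one above, not the code from this thread: it ignores case and uses \b so that other tag names starting with "table" are left alone. It assumes no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: returns the contents of the last-opened table,
// or null when no complete table is found.
function last_table_body(string $html): ?string
{
    // Collapse every opening table tag, whatever its attributes or case,
    // to a plain lowercase <table>.
    $html = preg_replace('/<table\b[^>]*>/i', '<table>', $html);
    // Normalize closing tags too, so strpos() can find them.
    $html = preg_replace('/<\/table\s*>/i', '</table>', $html);

    $parts = explode('<table>', $html);
    if (count($parts) < 2) {
        return null; // the page has no table at all
    }
    $last = end($parts);
    $end = strpos($last, '</table>');
    return $end === false ? null : substr($last, 0, $end);
}

$fileContent = '<TABLE width="100%"><tr><td>outer</td></tr>'
    . '<tr><td><table class="x"><tr><td>inner</td></tr></table></td></tr></TABLE>';

echo last_table_body($fileContent);
```

As with the accepted answer, this picks the table that was opened last, which is the innermost one only when the page has a single chain of nested tables.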