
Extract Innermost Nested Table from HTML Source Code

Posted on 2007-03-18
Last Modified: 2013-12-13
I have a large collection of HTML pages, each containing HTML tables that I need to extract. Some pages have a single table. Others have nested tables, up to four levels deep.
  1) If the file has only one table, I need to extract that table into a variable (I know I could do something like http:Q_21758443.html or use preg_match); but
  2) If the file has nested tables, I need only the innermost table.
    In other words, whether the HTML file has a single table or nested tables, the code should always extract the innermost table. How is this done?
Question by:Randall-B
Accepted Solution

by:
richdiesal earned 2000 total points
ID: 18744403
This seems a little simplistic, but it may get you where you want to go. Keep in mind that it won't work correctly if a page contains more than one group of nested tables; that's an altogether different beast. In the example below, the HTML inside $content is a table nested three deep, and the final value of $lasttable is '<tr><td>Hello PHP</td></tr>' (the contents of the innermost table, without the surrounding <table> tags):

$content = "<html>
<head>
</head>
<body>
<table><tr><td>Hello World</td></tr>
<tr><td><table><tr><td>Hello c#</td></tr>
<tr><td><table><tr><td>Hello PHP</td></tr></table>
</td></tr></table>
</td></tr></table>
</body>
</html>";

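// Split on each opening <table> tag; the last piece starts just after the table opened last.
$array = explode("<table>", $content);
$lasttable_raw = end($array);
// Everything from the first </table> onward is leftover markup from the outer tables.
$junk_data_starts = strpos($lasttable_raw, "</table>");
$lasttable = substr($lasttable_raw, 0, $junk_data_starts);

echo $lasttable; // <tr><td>Hello PHP</td></tr>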
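
For pages with several separate groups of nested tables (the case flagged above as not handled), one possible alternative is to parse the page and select every table that contains no other table. The following is only a sketch, not part of the original answer: the function name innermost_tables() is hypothetical, and it assumes PHP 7 or later with the bundled DOM extension enabled. The exact whitespace of the returned markup depends on the parser, so treat the output as equivalent HTML rather than a byte-for-byte copy of the source.

```php
<?php
// Hypothetical helper: returns the HTML of every table that has no table
// nested inside it, i.e. every innermost table on the page.
function innermost_tables(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tables = [];
    // XPath: any <table> element that has no <table> descendant.
    foreach ($xpath->query('//table[not(.//table)]') as $node) {
        $tables[] = $doc->saveHTML($node);
    }
    return $tables;
}
```

Calling innermost_tables($content) on the example above should return a one-element array holding the "Hello PHP" table, this time including its <table> tags. A page with only one table returns that table, which covers both cases in the question.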

Author Comment

by:Randall-B
ID: 18745160
richdiesal,
   Great. At first I couldn't figure out why it wasn't working with some of my real files. Then I discovered that some of them do not have plain <table> tags; they have tags with attributes, such as <table width="100%">. So I now normalize every opening tag first:

    $fileContent = preg_replace('/<table(.*?)>/', '<table>', $fileContent);

It seems to work great.  I would call your code elegant.  Thanks.
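
The normalization step and the explode() approach can be combined into one function. This is a rough sketch under some assumptions, not code from the thread: the name last_opened_table() is hypothetical, and the pattern is a slightly stricter variant of the one above that also matches uppercase tags such as <TABLE> and uses \b so that it does not touch other tag names beginning with "table". It still assumes that no attribute value contains a ">" character and that the page has only one group of nested tables.

```php
<?php
// Hypothetical helper: returns the contents of the table that was opened
// last in the document, or null when the document contains no table.
function last_opened_table(string $html): ?string
{
    // Collapse <table ...> (any attributes, any letter case) to plain <table>.
    $html = preg_replace('/<table\b[^>]*>/i', '<table>', $html);
    // Do the same for closing tags such as </TABLE> or </table >.
    $html = preg_replace('/<\/table\s*>/i', '</table>', $html);

    $parts = explode('<table>', $html);
    if (count($parts) < 2) {
        return null; // no opening <table> tag found
    }

    $tail = end($parts);
    $end = strpos($tail, '</table>');
    // If the closing tag is missing, return everything after the last opening tag.
    return $end === false ? $tail : substr($tail, 0, $end);
}
```

As with the original snippet, the result is the markup inside the innermost table without the enclosing <table> tags.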