Solved

Extract Innermost Nested Table from HTML Source Code

Posted on 2007-03-18
Last Modified: 2013-12-13
I have a large collection of HTML pages, each containing HTML tables that I need to extract. Some pages have a single table; others have nested tables, up to four levels deep.
  1) If the file has only one table, I need to extract that table into a variable (I know I could do something like http:Q_21758443.html or use preg_match); but
  2) If the file has nested tables, I need only the innermost table.
In other words, whether the HTML file has a single table or nested tables, the code should always extract the innermost table. How is this done?
Question by:Randall-B

Accepted Solution

by:
richdiesal earned 500 total points
ID: 18744403
This seems a little simplistic, but it may get you where you want to go. Keep in mind that it won't work correctly if a page contains more than one separate group of nested tables; that's an altogether different beast. In the example below, $content holds a table nested three deep, and the final value of $lasttable is the innermost table's rows, without the enclosing <table> tags: '<tr><td>Hello PHP</td></tr>':

$content = "<html>
<head>
</head>
<body>
<table><tr><td>Hello World</td></tr>
<tr><td><table><tr><td>Hello c#</td></tr>
<tr><td><table><tr><td>Hello PHP</td></tr></table>
</td></tr></table>
</td></tr></table>
</body>
</html>";

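// Split on every opening <table> tag; the last piece starts inside the innermost table.
$array = explode("<table>",$content);
$lasttable_raw = end($array);
// Everything from the first closing </table> onward belongs to the outer tables.
$junk_data_starts = strpos($lasttable_raw,"</table>");
// Keep only the text before that closing tag.
$lasttable = substr($lasttable_raw, 0, $junk_data_starts);

echo $lasttable;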
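
For pages that contain several separate groups of nested tables, a DOM-based variant is one possible way around the limitation mentioned above. The following is only a sketch and not part of the original answer: it assumes PHP's DOM extension is enabled, and the function name innermostTables is hypothetical. It treats a table as "innermost" when it contains no other table, and returns each such table including its <table> tags.

```php
<?php
// Hypothetical helper (not from the original answer): returns the outer HTML
// of every table that contains no other table. Assumes ext-dom is available.
function innermostTables(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is often malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $tables = [];
    // Select tables that have no descendant table.
    foreach ($xpath->query('//table[not(.//table)]') as $node) {
        $tables[] = $doc->saveHTML($node);
    }
    return $tables;
}

$content = '<table><tr><td>Hello World</td></tr>'
    . '<tr><td><table width="100%"><tr><td>Hello PHP</td></tr></table></td></tr></table>';

print_r(innermostTables($content));
```

Because the markup is parsed rather than split as text, attributes such as width="100%" need no special handling. The parser may normalize the markup slightly, so the returned strings are not guaranteed to match the source byte for byte.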

Author Comment

by:Randall-B
ID: 18745160
richdiesal,
   Great. At first I couldn't figure out why it wasn't working with some of my real files. Then I discovered that some of them do not use plain <table> tags; they have more complicated tags such as <table width="100%">. So I now normalize the opening tags first:

    $fileContent = preg_replace('/<table(.*?)>/', '<table>', $fileContent);

It seems to work great.  I would call your code elegant.  Thanks.
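
The normalization step and the split can also be folded into one pass. The sketch below is an assumption-laden variant, not code from either poster: the function name lastTableBody is hypothetical, and the pattern assumes that no attribute value contains a ">" character. It splits on any opening table tag regardless of attributes or letter case, then cuts at the first closing tag, following the same "last opening tag" idea as the accepted answer.

```php
<?php
// Hypothetical helper: same idea as the accepted answer, but tolerant of
// attributes and upper-case tags, so no separate preg_replace is needed.
function lastTableBody(string $html): ?string
{
    // Split on any opening <table ...> tag; \b keeps tags such as
    // <tablefoo> from matching.
    $parts = preg_split('/<table\b[^>]*>/i', $html);
    if ($parts === false || count($parts) < 2) {
        return null; // no opening table tag found
    }
    $tail = end($parts);
    // Case-insensitive search for the first closing tag after the last opening tag.
    $close = stripos($tail, '</table>');
    // If the closing tag is missing, return everything after the opening tag.
    return $close === false ? $tail : substr($tail, 0, $close);
}

echo lastTableBody(
    '<TABLE border="1"><tr><td><table width="100%"><tr><td>Hello PHP</td></tr></table></td></tr></TABLE>'
);
// prints: <tr><td>Hello PHP</td></tr>
```

Like the original, this returns only the contents of the last-opened table, so it shares the same limitation when a page holds more than one separate group of nested tables.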