Solved

Extract Innermost Nested Table from HTML Source Code

Posted on 2007-03-18
Last Modified: 2013-12-13
I have a large collection of HTML pages, all containing HTML tables that I need to extract.  Some of the pages have a single table; others have nested tables, up to four levels deep.
  1) If the file has only one table, I need to extract that table into a variable (I know I could do something like http:Q_21758443.html, or use preg_match); but
  2) If the file has nested tables, I need only the innermost table.
    In other words, whether the HTML file has a single table or nested tables, the code should always extract the innermost table.  How is this done?
Question by:Randall-B

Accepted Solution

by:
richdiesal earned 500 total points
ID: 18744403
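This is a little simplistic, but it may get you where you want to go.  Keep in mind that it always takes whatever follows the last <table> tag in the file, so it won't work properly if a page has more than one set of nested tables (for example, two sibling tables inside the same outer table): it only ever returns the last one.  That's an altogether different beast.  The HTML inside $content is a 3-deep nested table, and the final value of $lasttable is the contents of the innermost table, without the <table> tags themselves: '<tr><td>Hello PHP</td></tr>':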

$content = "<html>
<head>
</head>
<body>
<table><tr><td>Hello World</td></tr>
<tr><td><table><tr><td>Hello c#</td></tr>
<tr><td><table><tr><td>Hello PHP</td></tr></table>
</td></tr></table>
</td></tr></table>
</body>
</html>";

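// Everything after the last "<table>" is the innermost table's contents,
// followed by the closing tags of the tables that enclose it.
$array = explode("<table>", $content);
$lasttable_raw = end($array);

// Cut at the first "</table>"; what precedes it is the innermost table.
$junk_data_starts = strpos($lasttable_raw, "</table>");
$lasttable = ($junk_data_starts === false) ? "" : substr($lasttable_raw, 0, $junk_data_starts);

echo $lasttable;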
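
For the "multiple nested tables" case mentioned above, one possible approach is to parse the page and keep every table that contains no other table. The following is only a sketch under assumptions: the function name innermost_tables is made up for illustration, it relies on PHP's bundled DOM extension (DOMDocument and DOMXPath), and the parser may normalize the markup slightly, so the returned HTML is not guaranteed to match the source byte for byte.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the HTML of
// every <table> that has no <table> inside it, i.e. each innermost table.
function innermost_tables(string $html): array
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of printing them;
    // real-world pages are rarely valid HTML.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];

    // Select tables that have no <table> descendant.
    foreach ($xpath->query('//table[not(.//table)]') as $table) {
        $result[] = $doc->saveHTML($table);
    }
    return $result;
}
```

Calling innermost_tables($content) on the sample page above should give an array with one entry, the "Hello PHP" table including its <table> tags. A page with two separate nested tables would give two entries, and attributes such as width="100%" need no special handling.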

Author Comment

by:Randall-B
ID: 18745160
richdiesal,
   Great.  At first I couldn't figure out why it wasn't working with some of my real files.  Then I discovered that some of them do not have plain <table> tags; they have tags with attributes, such as <table width="100%">.  So I started out by normalizing every opening tag to a plain <table>:

    $fileContent = preg_replace('/<table(.*?)>/', '<table>', $fileContent);

It seems to work great.  I would call your code elegant.  Thanks.
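
A possible variation on the same idea, for anyone who would rather not rewrite the file contents first: look for the last opening tag directly with a pattern that accepts attributes and any letter case. This is only a sketch; the function name last_table_body is made up for illustration, and like the accepted solution it assumes the last opening <table ...> tag in the file belongs to the innermost table.

```php
<?php
// Hypothetical helper: same "last opening tag" idea as the accepted answer,
// but it matches <table ...> with attributes and in any letter case.
function last_table_body(string $html): ?string
{
    // \b keeps tags such as <tablet> from matching; [^>]* also spans newlines.
    if (!preg_match_all('/<table\b[^>]*>/i', $html, $m, PREG_OFFSET_CAPTURE)) {
        return null; // no opening table tag found
    }

    // Each match is [matched text, byte offset]; take the last one.
    [$tag, $offset] = end($m[0]);
    $start = $offset + strlen($tag);

    // The first closing tag after that point ends the innermost table.
    $close = stripos($html, '</table', $start);
    return $close === false ? null : substr($html, $start, $close - $start);
}
```

It returns the contents between the tags, as the original code does, or null when the page has no table or the last table is never closed.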
