Extract table from html file string - Regular Expression?

Posted on 2009-07-16
Last Modified: 2012-05-07
I'm opening an HTML file and reading it into a string using PHP. I don't excel at regular expressions, so I'm looking for some help. I need to extract one table from this HTML file. The table always comes immediately after the string "<p>Click on Course ID link in the first column to drop/change a class.</p>", followed by two newline characters and possibly a little whitespace (a single tab or similar). I'm fairly sure the table will always start on the same line, but it will not always end on the same line. Is this easily done with regular expressions?
Here's what I'm trying to do. The table contains some information I want to extract, but I plan to do the extraction in JavaScript. So I want a PHP script that pulls the table out of an uploaded file and prints it, so that when the page loads I can use JavaScript's DOM to gather the info. I'm open to alternative suggestions (e.g. a PHP table parser), but I'm on a shared server and don't want to deal with installing extra extensions.
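As a sketch of the "table parser" alternative: PHP's DOM extension ships with PHP and is enabled in most builds (check your host's phpinfo() to be sure), so it usually needs no extra installation. The function name below is hypothetical. The sketch assumes the table is a following sibling of the marker paragraph, that the marker text contains no double quotes, and that PHP is 7.1 or newer (for the `?string` return type).

```php
<?php
// Hypothetical helper: return the outer HTML of the first <table> that follows
// (as a sibling) the <p> whose whitespace-normalized text equals $marker.
// Assumes $marker contains no double quotes (it is spliced into an XPath literal).
function extractTableAfterParagraph(string $html, string $marker): ?string
{
    $doc = new DOMDocument();
    // Real-world HTML rarely validates; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//p[normalize-space(.) = "' . $marker . '"]/following-sibling::table[1]'
    );
    if ($nodes === false || $nodes->length === 0) {
        return null; // marker paragraph or table not found
    }
    // saveHTML() with a node argument serializes just that node.
    return $doc->saveHTML($nodes->item(0));
}
```

Because the parser understands nesting, this does not cut a nested table short the way a lazy `.*?</table>` regex would. The printed table can then be read with JavaScript's DOM as planned.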
Question by:khsater

Accepted Solution

by:
mrjoltcola earned 400 total points
ID: 24872034
Yes, probably something like this.

/<p>Click on Course ID link in the first column to drop/change a class.<\/p>.*?(<table>.*<\/table>)/


The table will be captured in the first capture group ($1).

Author Comment

by:khsater
ID: 24872965
You forgot to escape one of the slashes, which took me a while to notice. Besides that, I couldn't get the regular expression to match when I used fread to read the file. It works perfectly when I copy and paste the code, though. Any idea why?

Expert Comment

by:mrjoltcola
ID: 24873007
Not sure. The surrounding / / are just the usual notation for regular expressions; if I recall correctly, they are left off in PHP.

Author Comment

by:khsater
ID: 24873057
That wasn't it. The surrounding slashes are required in PHP's preg functions; they act as the pattern delimiters.
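Because "/" is the delimiter, any "/" inside the literal text (as in "drop/change" and "</p>") must be escaped, and the "." in "class." should be escaped too if it is meant literally. A minimal sketch, with a hypothetical function name: build the pattern with preg_quote() and pass "/" as its delimiter argument so it escapes those characters for you.

```php
<?php
// Hypothetical helper: escape the literal marker with preg_quote(), passing the
// delimiter '/' so that "/" is escaped along with regex metacharacters like ".".
// The "s" modifier lets ".*?" run across the newlines that follow the marker.
function buildTablePattern(string $marker): string
{
    return '/' . preg_quote($marker, '/') . '.*?(<table>.*?<\/table>)/s';
}

$pattern = buildTablePattern('<p>Click on Course ID link in the first column to drop/change a class.</p>');
echo $pattern, "\n";
```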

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 24874016
Expanding on mrjoltcola's code, it sounds like you'll want to allow the . regex wildcard to match newline characters. With preg_match, you need the "s" pattern modifier to do that:
$input = "blah
blah
asdfsdf  <p>Click on Course ID link in the first column to drop/change a class.</p>
  <table> table contents
more contents
more contents </table>
blah blah blah";
 
#with s pattern modifier to allow . to match newlines:
$pattern = "/<p>Click on Course ID link in the first column to drop\/change a class.<\/p>.*?(<table>.*?<\/table>)/s";  
preg_match($pattern, $input, $matches);
print "Result:{$matches[1]}";
 
Output:
Result:<table> table contents
more contents
more contents </table>
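This also explains why pasted text matched while the file did not: a file read with fread() (or file_get_contents()) keeps its newlines, and without "s" the "." stops at the first "\n". A minimal sketch comparing both patterns on an in-memory string that stands in for the file contents (the sample table is hypothetical):

```php
<?php
// Stand-in for file contents: newlines and a tab between the marker and the table,
// plus newlines inside the table itself.
$html = "<p>Click on Course ID link in the first column to drop/change a class.</p>\n\n\t<table>\n<tr><td>CS101</td></tr>\n</table>";

$withoutS = '/<p>Click on Course ID link in the first column to drop\/change a class\.<\/p>.*?(<table>.*?<\/table>)/';
$withS    = $withoutS . 's'; // same pattern with the "s" (dotall) modifier appended

var_dump(preg_match($withoutS, $html)); // "." cannot cross the newlines after </p>
var_dump(preg_match($withS, $html, $m));
echo $m[1], "\n";
```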


Expert Comment

by:Terry Woods
ID: 24874030
Pattern modifiers, etc., are documented in the PCRE cheat sheet, in case you haven't seen it, available here:
http://www.phpguru.org/article/pcre-cheat-sheet
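Modifiers can be combined. A sketch, assuming the uploaded file might use uppercase tags or put attributes on the table (neither is confirmed in the thread): "i" makes the match case-insensitive, `<table\b[^>]*>` accepts an opening tag with attributes, and `\s*` in place of the first `.*?` only allows whitespace between the paragraph and the table, matching the "two newlines and maybe a tab" description in the question.

```php
<?php
// "s": "." also matches newlines; "i": case-insensitive, so <TABLE> matches <table>.
// <table\b[^>]*> allows attributes such as border="1".
// Caveat: the lazy ".*?" stops at the first </table>, so a nested table would be cut short.
$html = "<p>Click on Course ID link in the first column to drop/change a class.</p>\n\n\t<TABLE border=\"1\">\n<tr><td>CS101</td></tr>\n</TABLE>";
$pattern = '/<p>Click on Course ID link in the first column to drop\/change a class\.<\/p>\s*(<table\b[^>]*>.*?<\/table>)/si';

if (preg_match($pattern, $html, $matches)) {
    echo $matches[1], "\n";
}
```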

Expert Comment

by:mrjoltcola
ID: 24874321
Thanks TerryAtOpus, I can't believe I blanked out on that one detail. I was a bit too distracted, I apologize.

Expert Comment

by:mrjoltcola
ID: 24874825
Terry did the heavy lifting on that one, I feel like I took too many points. :(
Glad to help.
