Solved

Extract table from html file string - Regular Expression?

Posted on 2009-07-16
Last Modified: 2012-05-07
I'm opening an HTML file and reading it into a string using PHP.  I'm not strong with regular expressions, so I'm looking for some help.  I need to extract one table from this HTML file.  The table always comes immediately after the string "<p>Click on Course ID link in the first column to drop/change a class.</p>", followed by two newline characters and possibly a little whitespace (a single tab, or whatever that translates into).  I'm pretty sure this table will always start on the same line, but it will not always end on the same line.  Is this easily done using regular expressions?
Here's what I'm trying to do.  The table holds some information I want to extract, but I plan to do the extraction with JavaScript.  So the PHP script should pull the table out of an uploaded file and print it, so that once the page loads I can use JavaScript's DOM to gather the info.  I'm open to alternative methods (e.g. a PHP table parser), but I'm on a shared server and don't want to deal with any added extensions.
Question by:khsater
8 Comments
 
LVL 40

Accepted Solution

by:
mrjoltcola earned 400 total points
ID: 24872034
Yes, probably something like this.

/<p>Click on Course ID link in the first column to drop/change a class.<\/p>.*?(<table>.*<\/table>)/


The table will be captured in group $1 by the regex engine.
 
LVL 4

Author Comment

by:khsater
ID: 24872965
You forgot to escape one of the slashes, which took me a while to notice. Besides that, I couldn't get the regular expression to match when I used fread to read the file.  It works perfectly when I copy and paste the code, though.  Any idea why?
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24873007
Not sure. The surrounding / / are just the usual notation for regular expressions. They are probably left off in PHP, if I recall.
LVL 4

Author Comment

by:khsater
ID: 24873057
That wasn't it.  The surrounding slashes are included in PHP pregs.
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 24874016
Expanding on mrjoltcola's code, you'll want to allow the . regex wildcard to match newline characters by the sound of it. With preg_match, you'll need the "s" pattern modifier to do that:
$input = "blah
blah
asdfsdf  <p>Click on Course ID link in the first column to drop/change a class.</p>
  <table> table contents
more contents
more contents </table>
blah blah blah";
 
// The "s" pattern modifier lets . match newlines. The slash in "drop/change" and
// the dot after "class" are escaped so both are matched literally.
$pattern = '/<p>Click on Course ID link in the first column to drop\/change a class\.<\/p>.*?(<table>.*?<\/table>)/s';
if (preg_match($pattern, $input, $matches)) {
    print "Result:{$matches[1]}";
}
 
Output:
Result:<table> table contents
more contents
more contents </table>
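
The pattern above matches only a bare <table> tag. If the real file's tag carries attributes, or the file uses \r\n line endings, a slightly looser pattern may help. The following is a minimal sketch under those assumptions; the function name extractCourseTable and the sample HTML are hypothetical and not from the original question, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function extractCourseTable(string $html): ?string
{
    // preg_quote() escapes the "/" and "." in the marker so it is matched literally.
    $marker = preg_quote('<p>Click on Course ID link in the first column to drop/change a class.</p>', '/');

    // \s* skips the newlines and tab after the marker (including \r\n endings).
    // <table\b[^>]*> also accepts an opening tag that has attributes.
    // The "s" modifier lets . span multiple lines.
    $pattern = '/' . $marker . '\s*(<table\b[^>]*>.*?<\/table>)/s';

    return preg_match($pattern, $html, $matches) === 1 ? $matches[1] : null;
}

// Made-up sample input with Windows line endings and a table attribute.
$html = "intro\r\n<p>Click on Course ID link in the first column to drop/change a class.</p>\r\n\r\n\t<table border=\"1\">\r\n<tr><td>CS101</td></tr>\r\n</table>\r\nfooter";
echo extractCourseTable($html);
```

Note that the lazy .*? stops at the first </table>, so this sketch would cut a table short if it contained nested tables. When reading from disk, file_get_contents() returns the whole file as one string, which avoids passing a partial read to preg_match.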

 
LVL 35

Expert Comment

by:Terry Woods
ID: 24874030
Pattern modifiers etc are documented in the PCRE cheat sheet, in case you haven't seen it, available from here:
http://www.phpguru.org/article/pcre-cheat-sheet
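
To see what the "s" modifier changes, here is a small sketch that runs the same pattern with and without it. The sample string is made up for illustration.

```php
<?php
// Made-up input: a table whose contents span several lines.
$input = "<table>\nrow one\nrow two\n</table>";

// Without "s", . does not match a newline, so the multi-line table is not found.
$withoutS = preg_match('/<table>.*?<\/table>/', $input);

// With "s" (PCRE_DOTALL), . matches newlines as well.
$withS = preg_match('/<table>.*?<\/table>/s', $input);

var_dump($withoutS, $withS); // int(0) then int(1)
```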
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24874321
Thanks TerryAtOpus, I can't believe I blanked out on that one detail. I was a bit too distracted, I apologize.
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24874825
Terry did the heavy lifting on that one, I feel like I took too many points. :(
Glad to help.
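
On the alternative raised in the question (a PHP table parser): PHP ships with a DOM extension that is enabled in most default builds, though a shared host could have disabled it, so treat its availability as an assumption. A minimal sketch follows; the function name firstTableAfterMarker and the sample HTML are hypothetical, and the marker text is assumed to contain no double quotes because it is embedded in the XPath query as-is.

```php
<?php
// Hypothetical helper; assumes PHP's bundled DOM extension is enabled.
function firstTableAfterMarker(string $html, string $markerText): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // First <table> in document order after a <p> that contains the marker text.
    $query = sprintf('//p[contains(., "%s")]/following::table[1]', $markerText);
    $nodes = $xpath->query($query);

    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // saveHTML() with a node argument serializes just that node.
    return $doc->saveHTML($nodes->item(0));
}

// Made-up sample: one table before the marker paragraph, one after it.
$html = '<html><body><table><tr><td>skip</td></tr></table>'
      . '<p>Click on Course ID link in the first column to drop/change a class.</p>'
      . '<table><tr><td>CS101</td></tr></table></body></html>';
echo firstTableAfterMarker($html, 'Click on Course ID link');
```

Unlike the regex, this approach does not depend on the exact whitespace between the paragraph and the table, and it handles nested tables and tag attributes through the parser.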
