Solved

Extract table from html file string - Regular Expression?

Posted on 2009-07-16
Last Modified: 2012-05-07
I'm opening and reading an HTML file into a string using PHP. I don't excel at regular expressions, so I'm looking for some help. I need to extract one table from this HTML file. I know the table will always come immediately after the string "<p>Click on Course ID link in the first column to drop/change a class.</p>", followed by two newline characters and maybe a little whitespace (a single tab or whatever that translates into). I'm pretty sure this table will always start at the same line, but it will not always end at the same line. Is this easily done using regular expressions?
Here's what I'm trying to do. The table has some information I want to extract, but I'm going to do that part with JavaScript. So I'm trying to pull the table out of an uploaded file in a PHP script that prints it, so that when the page loads I can use JavaScript's DOM to gather the info. I'm open to suggestions for an alternate method (e.g. a PHP table parser), but I'm on a shared server and I don't want to deal with any added extensions.
Question by:khsater
8 Comments
 
LVL 40

Accepted Solution

by:
mrjoltcola earned 400 total points
ID: 24872034
Yes, probably something like this.

/<p>Click on Course ID link in the first column to drop/change a class.<\/p>.*?(<table>.*<\/table>)/


Table will be captured in $1 of a regex engine
 
LVL 4

Author Comment

by:khsater
ID: 24872965
You forgot to escape one of the slashes, which took me a while to notice. Besides that, I couldn't get the regular expression to match when I used fread to read the file. It works perfectly when I copy and paste the code, though. Any idea why?
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24873007
Not sure. The surrounding / / are just the normal notation for regular expressions. Those are probably left off in PHP, if I recall.
 
LVL 4

Author Comment

by:khsater
ID: 24873057
That wasn't it.  The surrounding slashes are included in PHP pregs.
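
To illustrate the delimiter point, here is a minimal sketch. The one-line sample string and the variable names are assumptions made up for this example. PHP's preg functions need a delimiter around the pattern, but it does not have to be a slash; choosing another character such as # means the slashes in the HTML no longer need escaping.

```php
<?php
// Hypothetical one-line sample input, for illustration only.
$html = '<p>Click on Course ID link in the first column to drop/change a class.</p> '
      . '<table><tr><td>CS101</td></tr></table>';

// With "/" as the delimiter, every literal slash in the pattern must be escaped.
$withSlashes = '/drop\/change a class\.<\/p>\s*(<table>.*?<\/table>)/';

// With "#" as the delimiter, the slashes can stay as they are.
$withHashes = '#drop/change a class\.</p>\s*(<table>.*?</table>)#';

preg_match($withSlashes, $html, $a);
preg_match($withHashes, $html, $b);

// Both patterns capture the same table in group 1 ($matches[1] in PHP).
var_dump($a[1] === $b[1]);
print $b[1] . "\n";
```

Both patterns only match here because the sample input sits on a single line.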
 
LVL 35

Assisted Solution

by:Terry Woods
Terry Woods earned 100 total points
ID: 24874016
Expanding on mrjoltcola's code, you'll want to allow the . regex wildcard to match newline characters by the sound of it. With preg_match, you'll need the "s" pattern modifier to do that:
$input = "blah
blah
asdfsdf  <p>Click on Course ID link in the first column to drop/change a class.</p>
  <table> table contents
more contents
more contents </table>
blah blah blah";

#with s pattern modifier to allow . to match newlines:
$pattern = "/<p>Click on Course ID link in the first column to drop\/change a class.<\/p>.*?(<table>.*?<\/table>)/s";
preg_match($pattern, $input, $matches);
print "Result:{$matches[1]}";

Output:

Result:<table> table contents
more contents
more contents </table>
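
The effect of the "s" modifier can be checked side by side. The following is a sketch only: the sample input, the table attribute, and the variable names are assumptions, not taken from the real file. It also uses preg_quote() so the literal marker text does not have to be escaped by hand, and it allows attributes on the opening table tag.

```php
<?php
// Hypothetical sample input: the table spans several lines and carries an attribute.
$input = "<p>Click on Course ID link in the first column to drop/change a class.</p>\n\n\t"
       . "<table border=\"1\">\n<tr><td>CS101</td></tr>\n</table>\n<p>footer</p>";

$marker = '<p>Click on Course ID link in the first column to drop/change a class.</p>';

// preg_quote() escapes regex metacharacters in the literal marker; passing "/"
// as the second argument also escapes the delimiter.
$quoted = preg_quote($marker, '/');

// Without "s", "." stops at newlines, so the multi-line table is not matched.
$withoutS = preg_match('/' . $quoted . '\s*(<table\b[^>]*>.*?<\/table>)/', $input, $m1);

// With "s", "." also matches newlines; "i" tolerates <TABLE> in upper case.
$withS = preg_match('/' . $quoted . '\s*(<table\b[^>]*>.*?<\/table>)/si', $input, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
print $m2[1] . "\n";
```

Because the match is lazy, it ends at the first closing table tag, so a table nested inside the target table would cut the capture short.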

 
LVL 35

Expert Comment

by:Terry Woods
ID: 24874030
Pattern modifiers etc are documented in the PCRE cheat sheet, in case you haven't seen it, available from here:
http://www.phpguru.org/article/pcre-cheat-sheet
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24874321
Thanks TerryAtOpus, I can't believe I blanked out on that one detail. I was a bit too distracted, I apologize.
 
LVL 40

Expert Comment

by:mrjoltcola
ID: 24874825
Terry did the heavy lifting on that one, I feel like I took too many points. :(
Glad to help.
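
Since the question also asked about a parser-based alternative, here is a hedged sketch using PHP's bundled DOM classes. It assumes the DOM extension is available (it is normally enabled by default, but a shared host may differ) and PHP 5.3.6 or later for passing a node to saveHTML(). The sample HTML and variable names are made up for illustration.

```php
<?php
// Hypothetical sample input standing in for the uploaded file's contents.
$html = "<html><body>\n"
      . "<p>Click on Course ID link in the first column to drop/change a class.</p>\n\n\t"
      . "<table><tr><td>CS101</td></tr></table>\n"
      . "<table><tr><td>other</td></tr></table>\n"
      . "</body></html>";

$doc = new DOMDocument();
// Real-world HTML is often malformed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First <table> that follows the <p> containing the marker text, in document order.
$query = '//p[contains(., "Click on Course ID link")]/following::table[1]';
$nodes = $xpath->query($query);

// Serialize just that table, or fall back to null if the marker was not found.
$table = $nodes->length > 0 ? $doc->saveHTML($nodes->item(0)) : null;
print $table . "\n";
```

Unlike the regular expression, this approach does not depend on the exact whitespace between the paragraph and the table, and it copes with nested tables.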
