Solved

extracting data from HTML email

Posted on 2008-06-09
Last Modified: 2008-07-02
Hi,

I'm trying to "extract" HTML "blocks" from an HTML e-mail so I can add the data into a DB.

I've already managed to write a script that successfully polls my IMAP account for the message according to subject and then reads the body into $body.

The data I'm looking for is always "encapsulated" between <p style="width:800px;"> </p> tags, but there are other <p></p> tags in the HTML. Inside each <p style="width:800px;"> </p> block, I want to extract the title, the link of the title and the body as vars so I can write them to the DB.

So, a typical email body would look something like this:
<html><head></head><body><div>
<p><b>Welcome</b>
<p style="width:800px;">
<a href="somelink">title</a>
<font color="red">Some stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink2">title2</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink3">title3</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
...
...
...
</div></body></html>

Can someone help me with a script that can iteratively go through all the <p></p> blocks so I can get the link, title and body of each block read into a var?




Question by:psimation
5 Comments
 
LVL 48

Expert Comment

by:hernst42
ID: 21745787
If it's well-formed HTML you can use http://www.php.net/manual/en/domdocument.loadhtml.php to parse the HTML and then convert it to a SimpleXML object
http://www.php.net/manual/en/function.simplexml-import-dom.php
and then use xpath to search those nodes
http://www.php.net/manual/en/function.simplexml-element-xpath.php
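A minimal sketch of those three steps, assuming the markup looks like the sample in the question. The $body string here is a shortened stand-in for the real e-mail body, and the $rows array is only an illustrative name. libxml_use_internal_errors() is used so parser warnings are collected rather than printed.

```php
<?php
// Shortened stand-in for the e-mail body held in $body (assumed sample data).
$body = <<<'HTML'
<html><head></head><body><div>
<p><b>Welcome</b>
<p style="width:800px;">
<a href="somelink">title</a>
<font color="red">Some stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink2">title2</a>
<font color="red">Some more stuff</font>here
</p>
</div></body></html>
HTML;

// Step 1: parse the HTML; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
libxml_clear_errors();

// Step 2: convert the DOM tree to a SimpleXML object.
$xml = simplexml_import_dom($dom);

// Step 3: select only the <p> blocks carrying the marker style attribute.
$blocks = $xml->xpath('//p[@style="width:800px;"]');

$rows = array();
foreach ($blocks as $p) {
    $rows[] = array(
        'link'  => (string) $p->a['href'], // attribute of the first <a> child
        'title' => (string) $p->a,         // text of the first <a> child
    );
}

print_r($rows);
```

The XPath predicate matches the style attribute as an exact string, so any change in spacing inside the e-mail's style value would need a looser test such as contains().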
 
LVL 17

Author Comment

by:psimation
ID: 21746059
Thanks Hernst42

There will most probably be some elements I will have to manually remove to make it "well-formed HTML", but in the event one cannot "clean" it up, is there another (easy) way that you know of?

 
LVL 48

Expert Comment

by:hernst42
ID: 21746079
The other option is to write a parser, but such a parser can't handle wrongly nested tags either. Maybe you need to run http://www.php.net/tidy to make the HTML well formed.
So I don't know an easier solution.
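A rough sketch of the tidy step, assuming the tidy extension may or may not be installed. repair_html() is a hypothetical helper name, and the $broken string is made-up sample input with a wrongly closed <b> tag. If tidy is missing the input is passed through unchanged and it is left to DOMDocument::loadHTML() to cope with it.

```php
<?php
// Hypothetical helper: repair markup with tidy when available, else pass it through.
function repair_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not loaded; leave the input untouched
    }
    // Ask tidy for XHTML output and disable line wrapping.
    $config = array('output-xhtml' => true, 'wrap' => 0);
    $fixed  = tidy_repair_string($html, $config, 'utf8');
    return $fixed === false ? $html : $fixed;
}

// Made-up sample with an unclosed <b> inside the block.
$broken = '<div><p style="width:800px;"><a href="somelink">title</a>'
        . '<b>unclosed bold</p></div>';
$clean  = repair_html($broken);

// Parse the repaired markup; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean);
libxml_clear_errors();

// Query the cleaned document for links inside the marked blocks.
$xpath = new DOMXPath($dom);
$links = $xpath->query('//p[@style="width:800px;"]/a');

foreach ($links as $a) {
    echo $a->getAttribute('href'), "\n";
}
```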
 
LVL 17

Author Comment

by:psimation
ID: 21746123
Thx, I'll play with that tomorrow - one more thing:

How does this method handle the "attributes" of tags like <a href="link">xxx</a>

From the example, it looks like it can only "extract" content "between" tags, and not the "attributes" - so how would I get the link ( which is theoretically an attribute of the <a> tag )?
 
LVL 48

Accepted Solution

by:
hernst42 earned 1500 total points
ID: 21754450
You can convert it back to a DOMDocument and then output it with all nested tags and attributes.
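One way this could look, as a sketch rather than a definitive implementation: each SimpleXML match is handed to dom_import_simplexml(), the link is read with getAttribute(), and the remaining child nodes are serialised with saveHTML() so the inner tags survive. The $body sample and the $records array are assumptions for illustration, and passing a node to saveHTML() needs PHP 5.3.6 or later.

```php
<?php
// Assumed one-block sample in place of the real e-mail body.
$body = '<html><body><div><p style="width:800px;">'
      . '<a href="somelink">title</a>'
      . '<font color="red">Some stuff</font>here with <b>normal</b> tags'
      . '</p></div></body></html>';

libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($body);
libxml_clear_errors();

$xml     = simplexml_import_dom($dom);
$records = array();

foreach ($xml->xpath('//p[@style="width:800px;"]') as $p) {
    // Convert the SimpleXML node back to a DOMElement.
    $node   = dom_import_simplexml($p);
    $anchor = $node->getElementsByTagName('a')->item(0);

    // Serialise every child except the title link, keeping nested tags.
    $html = '';
    foreach ($node->childNodes as $child) {
        if ($anchor !== null && $child->isSameNode($anchor)) {
            continue;
        }
        $html .= $node->ownerDocument->saveHTML($child);
    }

    $records[] = array(
        'link'  => $anchor ? $anchor->getAttribute('href') : '',
        'title' => $anchor ? trim($anchor->textContent) : '',
        'body'  => trim($html),
    );
}

print_r($records);
```

Each entry in $records then holds the three values the question asks for, ready to be written to the DB with whatever escaping or prepared statements the DB layer requires.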
