Solved

extracting data from HTML email

Posted on 2008-06-09
5
259 Views
Last Modified: 2008-07-02
HI

I'm trying to "extract" HTML "blocks" from an HTML e-mail so I can add the data into a DB.

I've already managed to write a script that successfully polls my imap acount for the message according to subject, and it then reads the body into $body.

The data I'm looking for are always "encapsulated" between <p style="width:800px;"> </p> tags, BUT, there are other <p></p> tags in the HTML, and, inside the <p style="width:800px;"> </p> blocks, I want to extract the title, the link of the title and the body as vars so i can write them to DB.

So, a typical email body would look something like this:
<html><head></head><body><div>
<p><b>Welcome</b>
<p style="width:800px;">
<a href="somelink">title</a>
<font color="red">Some stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink2">title2</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink3">title3</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
...
...
...
</div></body></html>

Can someone help me with a script that can iteratively go through all the <p></p> blocks so I can get the link, title and body of each block read into a var?




0
Comment
Question by:psimation
  • 3
  • 2
5 Comments
 
LVL 48

Expert Comment

by:hernst42
ID: 21745787
If it's well form HTML you can use http://www.php.net/manual/en/domdocument.loadhtml.php to parse that html and then convert it to a simpleXML-Object
http://www.php.net/manual/en/function.simplexml-import-dom.php
and then use xpath to search those nodes
http://www.php.net/manual/en/function.simplexml-element-xpath.php
0
 
LVL 17

Author Comment

by:psimation
ID: 21746059
Thanks Hernst42

There will most probably going to be some elements I wil have to manaully remove to make it "well formed html" - but in the event one cannot "clean" it up, is there another way (easy) that you know of?

0
 
LVL 48

Expert Comment

by:hernst42
ID: 21746079
the other option is to write a parser, but that parser can't also handle wrong nested tags. Maybe you need to run http://www.php.net/tidy to make the HTML well formed.
So I don't know an easier solution.
0
 
LVL 17

Author Comment

by:psimation
ID: 21746123
Thx, I'll play with that tomorrow - one more thing:

How does this method handle the "attributes" of tags like <a href="link">xxx</a>

From the example, it looks like it can only "extract" content "between" tags, and not the "attributes" - so how would I get the link ( which is theoretically an attribute of the <a> tag )?
0
 
LVL 48

Accepted Solution

by:
hernst42 earned 500 total points
ID: 21754450
you can convert it back to a domDocument and then output that with all nested tags an attributes.
0

Featured Post

Free Tool: Postgres Monitoring System

A PHP and Perl based system to collect and display usage statistics from PostgreSQL databases.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
Learn how to match and substitute tagged data using PHP regular expressions. Demonstrated on Windows 7, but also applies to other operating systems. Demonstrated technique applies to PHP (all versions) and Firefox, but very similar techniques will w…
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

679 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question