Solved

extracting data from HTML email

Posted on 2008-06-09
Last Modified: 2008-07-02
Hi,

I'm trying to "extract" HTML "blocks" from an HTML e-mail so I can add the data to a DB.

I've already managed to write a script that successfully polls my IMAP account for the message according to its subject and then reads the body into $body.

The data I'm looking for is always "encapsulated" between <p style="width:800px;"> </p> tags, BUT there are other <p></p> tags in the HTML. Inside each <p style="width:800px;"> </p> block, I want to extract the title, the link of the title and the body as vars so I can write them to the DB.

So, a typical email body would look something like this:
<html><head></head><body><div>
<p><b>Welcome</b>
<p style="width:800px;">
<a href="somelink">title</a>
<font color="red">Some stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink2">title2</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
<p style="width:800px;">
<a href="somelink3">title3</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags inbetween
</p>
...
...
...
</div></body></html>

Can someone help me with a script that can iteratively go through all the <p></p> blocks so I can get the link, title and body of each block read into a var?




Question by:psimation

Expert Comment

by:hernst42
ID: 21745787
If it's well-formed HTML, you can use http://www.php.net/manual/en/domdocument.loadhtml.php to parse the HTML and then convert it to a SimpleXML object with
http://www.php.net/manual/en/function.simplexml-import-dom.php
and then use XPath to search for those nodes:
http://www.php.net/manual/en/function.simplexml-element-xpath.php
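
A minimal sketch of those three steps, assuming the e-mail body is already in $body (here replaced by a shortened sample modeled on the question). The XPath expression and the $titles variable are illustrative assumptions, not something taken from the asker's script:

```php
<?php
// Shortened sample body; in the real script this would come from the IMAP fetch.
$body = <<<HTML
<html><head></head><body><div>
<p><b>Welcome</b></p>
<p style="width:800px;">
<a href="somelink">title</a>
<font color="red">Some stuff</font>here with lots of <b>normal</b> html tags
</p>
<p style="width:800px;">
<a href="somelink2">title2</a>
<font color="red">Some more stuff</font>here with lots of <b>normal</b> html tags
</p>
</div></body></html>
HTML;

// Step 1: parse the HTML. Parser warnings are collected instead of printed.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
libxml_clear_errors();

// Step 2: convert the DOM tree to a SimpleXML object.
$xml = simplexml_import_dom($dom);

// Step 3: select only the <p> blocks that carry the marker style attribute.
$titles = [];
foreach ($xml->xpath('//p[@style="width:800px;"]') as $p) {
    // Text content of the first <a> inside the block.
    $titles[] = (string) $p->a;
}

print_r($titles);
```

The plain <p> holding "Welcome" is skipped because the XPath predicate matches the style attribute exactly, so any variation in that attribute's spelling or spacing would need a looser expression.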

Author Comment

by:psimation
ID: 21746059
Thanks Hernst42

There will most probably be some elements I will have to manually remove to make it "well-formed HTML", but in the event one cannot "clean" it up, is there another (easy) way that you know of?


Expert Comment

by:hernst42
ID: 21746079
The other option is to write your own parser, but such a parser can't handle wrongly nested tags either. You may need to run http://www.php.net/tidy first to make the HTML well-formed.
So I don't know of an easier solution.
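
A rough sketch of the tidy step, assuming the tidy extension may or may not be installed. The helper name repair_html() and the chosen config options are assumptions for illustration only; when tidy is missing, the input is passed through unchanged and DOMDocument::loadHTML() is left to cope with it:

```php
<?php
// Hypothetical helper: repair sloppy markup with tidy when it is available,
// otherwise hand the input back unchanged.
function repair_html($html)
{
    if (!function_exists('tidy_repair_string')) {
        return $html; // tidy extension not loaded
    }
    // 'output-xhtml' asks for well-formed output, 'wrap' => 0 disables line wrapping.
    $config = ['output-xhtml' => true, 'wrap' => 0];
    $fixed = tidy_repair_string($html, $config, 'utf8');
    return $fixed === false ? $html : $fixed;
}

// Unclosed and wrongly nested tags, similar to what an HTML e-mail may contain.
$body = '<p><b>Welcome<p style="width:800px;"><a href="somelink">title</a></b>';
$clean = repair_html($body);

// The repaired string can then go through the usual DOM parsing step.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean);
libxml_clear_errors();

echo $dom->getElementsByTagName('a')->length, "\n";
```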

Author Comment

by:psimation
ID: 21746123
Thx, I'll play with that tomorrow - one more thing:

How does this method handle the "attributes" of tags like <a href="link">xxx</a>

From the example, it looks like it can only "extract" content "between" tags, and not the "attributes", so how would I get the link (which is theoretically an attribute of the <a> tag)?

Accepted Solution

by:
hernst42 earned 500 total points
ID: 21754450
You can convert it back to a DOMDocument and then output it with all nested tags and attributes.
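
One way this could look in practice, as a sketch under stated assumptions rather than the exact code the answer had in mind. Attributes are read with array syntax on the SimpleXML element, and each block is converted back to a DOM node with dom_import_simplexml() so its children can be serialized with their tags intact. The $rows array and the rule "the first <a> is the title, everything else is the body" are assumptions based on the sample in the question:

```php
<?php
// Shortened sample body modeled on the question.
$body = '<html><body><div>'
      . '<p><b>Welcome</b></p>'
      . '<p style="width:800px;"><a href="somelink">title</a>'
      . '<font color="red">Some stuff</font>here with <b>normal</b> tags</p>'
      . '</div></body></html>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($body);
libxml_clear_errors();
$xml = simplexml_import_dom($dom);

$rows = [];
foreach ($xml->xpath('//p[@style="width:800px;"]') as $p) {
    // Attributes are reachable with array syntax on a SimpleXML element.
    $link  = (string) $p->a['href'];
    $title = (string) $p->a;

    // Convert the block back to a DOM node to serialize its children.
    $node = dom_import_simplexml($p);
    $inner = '';
    $titleSkipped = false;
    foreach ($node->childNodes as $child) {
        // Assumption: the first <a> child is the title link, not part of the body.
        if (!$titleSkipped && $child instanceof DOMElement && $child->tagName === 'a') {
            $titleSkipped = true;
            continue;
        }
        // saveXML() on a single node returns that node's markup, tags included.
        $inner .= $node->ownerDocument->saveXML($child);
    }

    $rows[] = ['link' => $link, 'title' => $title, 'body' => trim($inner)];
}

print_r($rows);
```

Each entry of $rows then holds the three values the question asks for and could be written to the DB in the same loop.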