An Intelligent Script

Hi Experts!

I'd like to create a PHP script that extracts relevant information from web pages, and I need your help.

The script captures the HTML code of various similar pages from a site (for example, all article pages). I define which parts of the body (e.g. the article titles) should be extracted, and I'd like to create an algorithm that automatically defines regular expressions for those parts.

I need your suggestions.
ttiero Asked:

aolXFT Commented:
Unless you show us a sample of the body you want to extract information from, we can't really help.

From your question, though, I'd consider using the XML functions. You might have to run your document through tidy first to make it XHTML (and therefore XML) compliant. You can install tidy as a PHP/PECL extension ( http://pecl.php.net/package/tidy ).

Then use XML Functions to get what you want.
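
As a rough illustration of the "XML functions" step, here is a minimal sketch. It assumes PHP's bundled DOM extension and swaps in DOMDocument::loadHTML, which tolerates sloppy HTML, in place of the tidy pass. The helper name extractByXPath, the sample markup, and the XPath expression are all made up for the example; the real expression depends on the pages being scraped.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matching $xpath.
function extractByXPath(string $html, string $xpath): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them,
    // since real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $results = [];
    $nodes = (new DOMXPath($doc))->query($xpath);
    if ($nodes === false) {
        return $results; // the XPath expression was invalid
    }
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Made-up sample page; note the unclosed <p>, which the parser repairs.
$sample = '<html><body><div class="article">'
        . '<h1 class="title">First article</h1><p>Body text</div></body></html>';

$titles = extractByXPath($sample, '//h1[@class="title"]');
print_r($titles);
```

An XPath expression per field (title, author, date, ...) is usually less brittle than a regular expression, because it follows the document structure rather than the exact characters around the value.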
ttiero Author Commented:
Are there PHP classes that transform HTML to XML?
aolXFT Commented:
There is a PHP extension for Tidy. You can check out Tidy at http://www.w3.org/People/Raggett/tidy/. John Coggeshall wrote a PHP extension for libTidy (or TidyLib, or whatever it is called) and submitted it to PECL; see the URL above or http://www.coggeshall.org/tidy.php

The command-line tidy client allows the switch -asxml, so I'm sure you can do the same with the PHP extension.
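
A minimal sketch of what that might look like, assuming the tidy extension is installed. It uses tidy_repair_string with the 'output-xhtml' option, which as far as I know is the configuration counterpart of the -asxml switch. The helper name htmlToXhtml and the sample markup are made up for the example.

```php
<?php
// Hypothetical helper: repairs HTML into XHTML, or returns null when the
// tidy extension is not available or the repair fails.
function htmlToXhtml(string $html): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    $config = [
        'output-xhtml'     => true, // assumed equivalent of the -asxml switch
        'numeric-entities' => true, // write &nbsp; as &#160; so XML parsers accept it
        'wrap'             => 0,    // do not re-wrap long lines
    ];
    $xhtml = tidy_repair_string($html, $config, 'utf8');
    return $xhtml === false ? null : $xhtml;
}

// Made-up sample with an unclosed <p> for tidy to repair.
$xhtml = htmlToXhtml('<title>Demo</title><h1>First article</h1><p>Unclosed paragraph');

$headings = [];
if ($xhtml !== null && ($xml = simplexml_load_string($xhtml)) !== false) {
    // XHTML elements live in a namespace, so XPath needs a registered prefix.
    $xml->registerXPathNamespace('x', 'http://www.w3.org/1999/xhtml');
    foreach ($xml->xpath('//x:h1') ?: [] as $h1) {
        $headings[] = trim((string) $h1);
    }
}
print_r($headings);
```

The namespace registration is the step that is easy to miss: without it, an XPath such as //h1 matches nothing in tidy's XHTML output.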

That may, however, be overkill depending on how complex your document is. Basically, I need a sample.