ttiero
asked on
An Intelligent Script
Hi Experts!
I'd like to create a PHP script that extracts relevant information from web pages, and I need your help.
The script captures the HTML code of various similar pages from a site (for example, all article pages). I define which parts of the body (e.g. the article titles) should be extracted, and I'd like to create an algorithm that automatically defines regular expressions for those parts.
I need your suggestions.
ASKER
Are there PHP classes that transform HTML to XML?
ASKER CERTIFIED SOLUTION
From your question, though, I'd consider using XML functions. You might have to run your document through tidy first to make it XHTML (and therefore XML) compliant. You can install tidy as a PHP/PECL extension ( http://pecl.php.net/package/tidy ).
Then use the XML functions to get what you want.
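As a rough sketch of that approach (not a definitive implementation): the helper name extract_by_xpath(), the sample markup and the XPath query below are all made up for illustration. It assumes PHP's DOM extension is available and only runs the tidy cleanup step if the tidy extension happens to be installed. An XPath query stands in for the hand-written regular expression for each part you want to pull out.

```php
<?php
// Sketch only: extract_by_xpath() and the sample markup are hypothetical.
function extract_by_xpath(string $html, string $query): array
{
    // Optional cleanup step: only runs if the tidy extension is installed.
    if (function_exists('tidy_repair_string')) {
        $clean = tidy_repair_string($html, ['output-xhtml' => true], 'utf8');
        if (is_string($clean) && $clean !== '') {
            $html = $clean;
        }
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Run the XPath query and return the trimmed text of each matching node.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // the query itself was not valid XPath
    }

    $results = [];
    foreach ($nodes as $node) {
        $results[] = trim($node->textContent);
    }
    return $results;
}

// Assumed page layout: each article title sits in <h1 class="title">.
$page = '<html><body><div class="article">'
      . '<h1 class="title">First article</h1><p>Body text</div></body></html>';

$titles = extract_by_xpath($page, '//h1[@class="title"]');
print_r($titles);
```

For a set of similar pages you would keep one query per part (title, author, date and so on) and run the same queries against every page, rather than generating regular expressions.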