Solved

An Intelligent Script

Posted on 2003-12-05
384 Views
Last Modified: 2008-03-06
Hi Experts!

I'd like to create a PHP script that extracts relevant information from web pages, and I need your help.

The script captures the HTML code of several similar pages from a site (for example, all of its article pages). I define which parts of the body should be extracted (e.g. the article titles), and I'd like to create an algorithm that automatically derives a regular expression for those parts.

I need your suggestions.
Question by:ttiero
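A minimal sketch of the idea in the question, written in Python for brevity (the same steps map onto PHP's string functions plus `preg_quote`/`preg_match_all`). It assumes you can supply a few (page HTML, wanted text) pairs by hand. It takes the text just before and just after each wanted string, keeps only the context that all examples share, and turns that into a pattern with one capture group. This is the simplest form of what is usually called wrapper induction (left/right delimiter wrappers). `induce_pattern`, `common_suffix` and `max_context` are hypothetical names. The approach will break if the markup around the field varies between pages, or if the wanted text first appears somewhere else on the page (for instance inside `<title>`).

```python
import re
from os.path import commonprefix


def common_suffix(strings):
    """Longest string that every item in `strings` ends with."""
    return commonprefix([s[::-1] for s in strings])[::-1]


def induce_pattern(examples, max_context=30):
    """Build a regex from (page_html, wanted_text) pairs.

    Uses the text immediately before and after the first occurrence of
    each wanted string; the context shared by all examples becomes the
    left and right delimiters of a single capture group.
    """
    lefts, rights = [], []
    for page, wanted in examples:
        start = page.index(wanted)  # raises ValueError if it is missing
        lefts.append(page[:start])
        rights.append(page[start + len(wanted):])

    # Keep only a short stretch of shared context on each side, so the
    # pattern does not depend on the whole page header or footer.
    left = common_suffix(lefts)[-max_context:]
    right = commonprefix(rights)[:max_context]
    if not left or not right:
        raise ValueError("examples share no context on one side")

    # Non-greedy group: stop at the first occurrence of the right delimiter.
    return re.compile(re.escape(left) + r"(.*?)" + re.escape(right), re.DOTALL)


if __name__ == "__main__":
    examples = [
        ('<div class="nav">Home</div><h2 class="title">First post</h2><p>one</p>',
         "First post"),
        ('<div class="nav">Home</div><h2 class="title">Second article</h2><p>two</p>',
         "Second article"),
    ]
    pattern = induce_pattern(examples)
    new_page = '<div class="nav">Home</div><h2 class="title">Third one</h2><p>three</p>'
    print(pattern.findall(new_page))  # ['Third one']
```

Here the learned pattern is `">Home</div><h2 class="title">(.*?)</h2><p>` (escaped), which then applies unchanged to every other page built from the same template.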
5 Comments
 
LVL 6

Expert Comment

by:aolXFT
ID: 9883170
Unless you show us a sample of the body you want to extract information from, we can't really help.

From your question, though, I'd consider using the XML functions. You might have to run your document through tidy first to make it XHTML (and therefore XML) compliant. You can install tidy as a PHP/PECL extension ( http://pecl.php.net/package/tidy ).

Then use the XML functions to get what you want.
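A sketch of the "tidy first, then XML functions" route, again in Python for brevity, assuming the page has already been converted to well-formed XHTML. The tag `h2` and class `title` are placeholder assumptions for wherever the article titles actually live. The path expression selects those elements and collects their text, including text inside nested inline tags; PHP's DOM extension (`DOMXPath`) supports the same kind of query.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def extract_text(xhtml, tag="h2", css_class="title"):
    """Return the text of every <tag class="css_class"> in an XHTML string."""
    root = ET.fromstring(xhtml)  # raises ParseError if not well-formed XML
    # XHTML output usually puts every element in the XHTML namespace;
    # fall back to plain tag names if the document declares none.
    prefix = "x:" if root.tag.startswith("{" + XHTML_NS + "}") else ""
    path = f".//{prefix}{tag}[@class='{css_class}']"
    return ["".join(el.itertext()).strip()
            for el in root.iterfind(path, {"x": XHTML_NS})]
```

Unbalanced tags would make `ET.fromstring` raise `ParseError`, which is why the tidy conversion has to come first on real-world pages.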
 

Author Comment

by:ttiero
ID: 9883317
Are there PHP classes that transform HTML to XML?
 
LVL 6

Accepted Solution

by:
aolXFT earned 200 total points
ID: 9883378
There is a PHP extension for Tidy. You can check out Tidy itself at http://www.w3.org/People/Raggett/tidy/. John Coggeshall wrote a PHP extension for the Tidy library (libtidy/TidyLib) and submitted it to PECL; check out the URL above, or http://www.coggeshall.org/tidy.php

The command-line tidy client accepts the -asxml switch, so I'm sure you can do the same with the PHP extension.

That may, however, be overkill depending on how complex your document is. Basically, I need a sample.
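For a simple page, a tolerant HTML parser can stand in for the tidy step. Python's standard `html.parser` accepts unclosed and unquoted markup, so this hypothetical `TagTextCollector` (again assuming the titles sit in `<h2 class="title">`) collects element text straight from the raw page. It assumes each matching element is eventually closed; a missing end tag would make it keep collecting text.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text of every <target_tag> whose class list contains target_class."""

    def __init__(self, target_tag="h2", target_class="title"):
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.target_tag = target_tag
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching element
        self.buffer = []    # text pieces of the current match
        self.results = []   # one string per matching element

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if self.depth:
            if tag == self.target_tag:
                self.depth += 1  # same tag nested inside a match
        elif tag == self.target_tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1
                self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)
```

Feeding it `'<H2 class="title">Tag &amp; soup<br></h2><p>unclosed<h2 class=title>Second</h2>'` and then calling `close()` leaves `['Tag & soup', 'Second']` in `results`, despite the uppercase tag, the unclosed `<p>` and the unquoted attribute.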