PHP Crawler Sourceforge

Hello,

I am attempting to implement an open source PHP application available from SourceForge (http://sourceforge.net/projects/php-crawler).

Unfortunately, there is not much documentation. However, I have managed to get the crawler working to a certain extent: I can target a web page, retrieve all of its content, and write it to a new HTML file.

However, the MySQL database table used by the crawler is empty at the end of processing. I do not need the whole HTML file; I just want one <div> section. I thought I could get this from the database contents, but I am open to other suggestions.

I have tried using a string function, but it returns a blank file, so either the function is written incorrectly or I cannot use it on the HTML file.

I have attached a copy of pro.php, which successfully returns the whole page, and pro1.php, which returns nothing. I have also attached a copy of index2.php, which calls pro.php and pro1.php with the crawl address.

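The difference between the files (where I have attempted to strip out the div) is shown below:

$data = $usendid;
$string = between('<div id=section c1>', '</div>', $data);

// Returns the text between the first $start and the next $end, or '' if either is missing.
function between($start, $end, $source) {
    $s = strpos($source, $start);
    if ($s === false) return '';   // start marker not found
    $s += strlen($start);
    $e = strpos($source, $end, $s);
    return $e === false ? '' : substr($source, $s, $e - $s);
}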

Can anyone advise me either how to fix my string function to pull out the required section, or how I could use the database table to achieve the same thing? Does anyone have more detailed documentation for PHP Crawler?

Thanks
index2.php
pro.php
pro1.php
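
For the string-function approach described in the question, one alternative is to parse the page and select the div by its id, which also copes with nested divs (a plain search for '</div>' stops at the first closing tag). The following is only a minimal sketch using PHP's built-in DOM extension; the helper name extractDivById, the sample markup, and the id "section" are assumptions for illustration and are not taken from the attached files.

```php
<?php
// Hypothetical helper: returns the outer HTML of the first <div> with the
// given id, or null if the page is empty or no such div exists.
function extractDivById(string $html, string $id): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }

    $doc = new DOMDocument();
    // Crawled pages are rarely valid HTML, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $id contains no double quotes (it is placed inside the XPath string).
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    // saveHTML() with a node argument serialises just that node and its children.
    return $doc->saveHTML($nodes->item(0));
}

// Sample markup standing in for the crawled page (an assumption for illustration).
$page = '<html><body><div id="header">Top</div>'
      . '<div id="section"><div class="inner">Wanted</div> text</div></body></html>';

echo extractDivById($page, 'section');
```

In this sketch, $page would be replaced by whatever variable holds the fetched HTML, and the returned string could be written to the output file in place of the full page.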

ASKER CERTIFIED SOLUTION

mpickreign

javaftper

ASKER

Thanks. preg_match works better; however, I am having to take the whole file and re-create it, rather than extracting the div in one pass, possibly using the MySQL DB.
Does anyone have any docs for PHP Crawler?
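
For readers following the preg_match approach mentioned above, a minimal sketch of pulling the div's contents out in a single pass might look like the following. It is not the code from the accepted solution (which is not shown here); the helper name matchDiv and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: returns the inner HTML of the first <div> whose id
// attribute equals $id, or null if there is no match.
function matchDiv(string $html, string $id): ?string
{
    // ~ is the delimiter; "i" = case-insensitive, "s" = dot matches newlines.
    // (.*?) is lazy, so the match ends at the FIRST </div> after the opening tag.
    $pattern = '~<div\b[^>]*\bid\s*=\s*["\']?' . preg_quote($id, '~')
             . '\b[^>]*>(.*?)</div>~is';

    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample markup standing in for the crawled page (an assumption for illustration).
$html = '<p>x</p><div id="section" class="c1">Hello <b>world</b></div><div>other</div>';

echo matchDiv($html, 'section');
```

Because the pattern stops at the first closing tag, a target div that contains other divs would be cut short; in that case a DOM parser is likely the safer choice.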

SOLUTION

ahmad_alinat