Creating a site crawler

Hey guys,

I'm trying to add a search feature to my site, and I've been looking around. It seems to me that the best way to do this is to write a crawler that updates a MySQL table every now and then. There is an ASP version of this (linked below), but I have yet to find a PHP one, so if anybody could help me build or find one in PHP, that would be great.

http://www.webwizguide.com/asp/sample_scripts/site_search_script.asp
Asked by: drakkarnoir
 
frugle Commented:
In fact, why use fread() at all?

Have you tried something like this?

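// $array holds the product IDs, as in the question
$basic = array();

foreach ($array as $key) {
    $url = "http://www.products.com/index.php?product_id=" . urlencode($key);

    // file() returns the page as an array of lines, or false on failure
    $lines = @file($url);
    if ($lines === false) {
        continue; // skip pages that could not be fetched
    }

    $basic[] = strip_tags(implode("", $lines));
}

// $basic now holds the basic (text-only) content of each page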

Mike
 
drakkarnoir (Author) Commented:
I don't think you understood. I want a crawler that will store my pages' contents in an independent table, which will then be searched. I know how to do it directly if the content is in MySQL, but what if it's the content of the generated pages that I want stored?
 
drakkarnoir (Author) Commented:
Like Google or other web crawlers.
 
venkateshwarr Commented:

Sorry, disregard my earlier post
 
lozloz Commented:
Maybe have a look at phpDig: www.phpdig.net

cheers,

loz
 
drakkarnoir (Author) Commented:
I guess the best solution is to code this myself. Can anybody get me started on a PHP script that will open each of my pages with dynamic IDs, index it, and store the text in a variable?
 
frugle Commented:
How are your pages created? Why don't you store the actual pages in the database and create free-text indexes on them?
Displaying the page from the database would be quicker, and there wouldn't be any gap between publishing and spidering during which the page is unindexed.
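
To make the free-text index idea a bit more concrete, here is a minimal sketch. It assumes a hypothetical `pages` table with `url` and `body` columns and an existing PDO connection to MySQL; none of these names come from the thread. Note that FULLTEXT indexes need MyISAM, or InnoDB on MySQL 5.6 and later.

```php
<?php
// Hypothetical schema: the table and column names are assumptions for illustration.
const CREATE_PAGES_SQL = <<<'SQL'
CREATE TABLE IF NOT EXISTS pages (
    id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    url  VARCHAR(255) NOT NULL UNIQUE,
    body TEXT NOT NULL,
    FULLTEXT KEY ft_body (body)
)
SQL;

// Two separate placeholders are used because native prepared statements
// do not allow the same named placeholder to appear twice.
const SEARCH_PAGES_SQL =
    'SELECT url, MATCH(body) AGAINST (:q1) AS score
       FROM pages
      WHERE MATCH(body) AGAINST (:q2)
      ORDER BY score DESC
      LIMIT 20';

// Returns the best-matching pages for a search phrase, most relevant first.
function searchPages(PDO $pdo, string $query): array
{
    $stmt = $pdo->prepare(SEARCH_PAGES_SQL);
    $stmt->execute([':q1' => $query, ':q2' => $query]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

With a table like this, the search page would only need something like `searchPages($pdo, $_GET['q'])`, where `q` is an assumed form field name.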

For what it's worth, spiders are probably better written in Perl (ducks and runs for cover)

Mike
 
lozloz Commented:
Why don't you have a look at the source for phpDig if you want to get started? The features list shows what you can learn from it:

http://www.phpdig.net/navigation.php?action=doc#toc3

cheers,

loz
 
drakkarnoir (Author) Commented:
You better run frugle!! Tehehe

My pages are generated from a MySQL db, but I want to do something like the following:

$array = array("my product id's");
foreach ($array as $key)
fopen("http://www.products.com/index.php?product_id=$key");
fread (?) all the HTML
Get rid of HTML tags
Store only the plain text into a table.
When user searches, well I can do this part.
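
The steps above could be sketched roughly as follows. This is only a starting point under some assumptions: `allow_url_fopen` is enabled, a PDO connection to MySQL already exists, and there is a hypothetical `search_index` table with a unique `url` column and a `content` text column. The function names are made up for illustration.

```php
<?php
// Builds the page URL for one product ID (URL pattern taken from the thread).
function productUrl(string $id): string
{
    return 'http://www.products.com/index.php?product_id=' . urlencode($id);
}

// Fetches a page and returns its plain text, or null if the request fails.
function fetchPlainText(string $url): ?string
{
    $html = @file_get_contents($url); // needs allow_url_fopen = On for http URLs
    if ($html === false) {
        return null;
    }
    return trim(strip_tags($html));
}

// Stores one row per page in the hypothetical `search_index` table
// and returns how many pages were stored.
function indexProducts(PDO $pdo, array $productIds): int
{
    // REPLACE INTO is MySQL-specific: it overwrites the row for a URL already indexed.
    $stmt = $pdo->prepare(
        'REPLACE INTO search_index (url, content) VALUES (:url, :content)'
    );
    $stored = 0;
    foreach ($productIds as $id) {
        $url  = productUrl((string) $id);
        $text = fetchPlainText($url);
        if ($text === null) {
            continue; // skip pages that could not be fetched
        }
        $stmt->execute([':url' => $url, ':content' => $text]);
        $stored++;
    }
    return $stored;
}
```

A script like this could be run from cron every so often, so the table is refreshed without slowing down normal page views.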
 
frugle Commented:
The strip_tags() function will get rid of the HTML:

http://uk.php.net/manual/en/function.strip-tags.php
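
One thing to watch for: strip_tags() removes the tags themselves but keeps the text inside `<script>` and `<style>` blocks, so JavaScript and CSS can end up in the index. A minimal sketch of a cleanup helper follows; the name `htmlToText` is made up for illustration, and the regex is a rough approach, not a full HTML parser.

```php
<?php
// Converts an HTML page to plain text that is suitable for indexing.
function htmlToText(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Add a space before every tag so that neighbouring blocks
    // such as "<p>a</p><p>b</p>" do not run together as "ab".
    $html = str_replace('<', ' <', $html);

    $text = strip_tags($html);

    // Turn entities such as &amp; back into plain characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}

echo htmlToText('<p>Hello</p><script>var x = 1;</script><p>World &amp; co</p>');
// prints "Hello World & co"
```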

Mike
 
drakkarnoir (Author) Commented:
Thanks, worked great.
Question has a verified solution.
