
Creating site crawler

Posted on 2004-03-21
Last Modified: 2008-01-16
Hey guys,

I'm trying to add a search feature to my site, and I've been doing some looking around. It seems to me that the best way to do this is to write a crawler that updates a MySQL table every now and then. There is an ASP version of one, but I have yet to find a PHP version, so if anybody could help me build or find one in PHP, that would be great.

http://www.webwizguide.com/asp/sample_scripts/site_search_script.asp
Question by:drakkarnoir
13 Comments
 
LVL 12

Expert Comment

by:venkateshwarr
ID: 10646905
 

Author Comment

by:drakkarnoir
ID: 10646931
I don't think you understood. I want a crawler that will store my pages' contents in an independent table, which will then be searched. I know how to search directly if the content is in MySQL, but what if it's the content of the generated pages that I want stored?
 

Author Comment

by:drakkarnoir
ID: 10646932
Like Google or other web crawlers.
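A crawler "like Google" starts from one page, stores its text, pulls the links out of it, and repeats for every new link that stays on the same site. The following is a minimal breadth-first sketch of that idea in PHP, not a finished spider. The function name `crawl_site`, its parameters, and the 100-page limit are hypothetical. Link extraction is a naive regular expression over `href` attributes, and only absolute links under `$base` and root-relative links such as `/index.php?product_id=1` are followed; other relative links are ignored to keep the sketch short. `$fetch` can simply be `'file_get_contents'` (which needs `allow_url_fopen` enabled); it is a parameter so the crawl can be tested without a network.

```php
<?php
// Hypothetical sketch: breadth-first crawl limited to a single site.
// $base must end with "/" (e.g. "http://www.products.com/").
// $fetch takes a URL and returns the page HTML, or false on failure.
function crawl_site(string $base, string $start, callable $fetch, int $maxPages = 100): array
{
    $queue = [$start];          // URLs waiting to be fetched
    $seen  = [$start => true];  // URLs already queued, so none is fetched twice
    $pages = [];                // URL => plain text of that page

    while ($queue && count($pages) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === false) {
            continue;           // could not fetch this page; skip it
        }
        // Store the page text with its whitespace collapsed.
        $pages[$url] = trim(preg_replace('/\s+/', ' ', strip_tags($html)));

        // Naive link extraction: the value of every href="..." or href='...',
        // cut off at any "#fragment".
        preg_match_all('/href\s*=\s*["\']([^"\'#]+)/i', $html, $m);
        foreach ($m[1] as $link) {
            if (substr($link, 0, 2) === '//') {
                continue;       // protocol-relative link: treated as off-site
            }
            if ($link[0] === '/') {
                $link = rtrim($base, '/') . $link;  // root-relative -> absolute
            }
            if (strpos($link, $base) !== 0 || isset($seen[$link])) {
                continue;       // off-site, other relative form, or already queued
            }
            $seen[$link] = true;
            $queue[] = $link;
        }
    }
    return $pages;
}
```

For example, `crawl_site('http://www.products.com/', 'http://www.products.com/', 'file_get_contents')` would return an array mapping each reachable page URL to its text, ready to be written to a MySQL table.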
 
LVL 12

Expert Comment

by:venkateshwarr
ID: 10646936
 
LVL 12

Expert Comment

by:venkateshwarr
ID: 10646942

Sorry, disregard my earlier post
 
LVL 13

Expert Comment

by:lozloz
ID: 10648537
Maybe have a look at phpdig: www.phpdig.net

cheers,

loz
 

Author Comment

by:drakkarnoir
ID: 10648852
I guess the best solution is to code this myself. Can anybody get me started on a PHP script that will open each one of my pages (they use dynamic IDs), index them, and store the text in a variable?
 
LVL 10

Expert Comment

by:frugle
ID: 10648937
How are your pages created? Why don't you store the actual pages in the database and create FULLTEXT indexes on them?
Displaying the page from the database would be quicker, and there would be no window between publishing a page and spidering it during which the page isn't indexed.

For what it's worth, spiders are probably better written in Perl (ducks and runs for cover)

Mike
 
LVL 13

Expert Comment

by:lozloz
ID: 10648953
Why don't you have a look at the source for phpdig if you want to get started? Have a look at the features list to see what you can learn from it:

http://www.phpdig.net/navigation.php?action=doc#toc3

cheers,

loz
 

Author Comment

by:drakkarnoir
ID: 10649649
You better run frugle!! Tehehe

My pages are being generated from a MySQL database, but I want to do something like the following:

$array = array(/* my product IDs */);
foreach ($array as $key) {
    $handle = fopen("http://www.products.com/index.php?product_id=$key", "r");
    # fread() all the HTML
    # get rid of the HTML tags
    # store only the plain text in a table
}

When a user searches... well, I can do that part.
 
LVL 10

Expert Comment

by:frugle
ID: 10649917
The strip_tags() function will get rid of the HTML.

http://uk.php.net/manual/en/function.strip-tags.php

Mike
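One caveat when using strip_tags() to build a search index: it removes the tags themselves but keeps the text between them, so JavaScript and CSS inside `<script>` and `<style>` blocks would end up in the index, and words from adjacent elements can run together (`<p>a</p><p>b</p>` becomes `ab`). The following is a minimal sketch of a cleanup step under those assumptions; the helper name `html_to_text` is hypothetical, and the regular expression is a pragmatic approximation rather than a full HTML parser.

```php
<?php
// Hypothetical helper: turn a fetched HTML page into plain text for indexing.
function html_to_text(string $html): string
{
    // strip_tags() keeps the text between tags, so drop <script> and <style>
    // blocks (tags and contents) first.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html);
    // Put a space before every tag so "<p>a</p><p>b</p>" ends up as "a b"
    // rather than "ab" once the tags are removed.
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    // Turn entities such as &amp; back into characters.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace (spaces, tabs, newlines) into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

Entities are decoded only after strip_tags() has run, so an encoded `&lt;` in the page text is never mistaken for the start of a tag.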
 
LVL 10

Accepted Solution

by:frugle (earned 500 total points)
ID: 10649959
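In fact, why use fread?

Have you tried...

$basic = array();

# $array holds your product IDs, as in your example above
foreach ($array as $key){

      $url = "http://www.products.com/index.php?product_id=".$key;

      # file() reads the page into an array of lines, implode() joins them
      # back into one string, and strip_tags() removes the HTML tags
      $basic[] = strip_tags(implode("",file($url)));

}

# should return an array of basic (text only) content.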

Mike
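A slightly hardened variant of the loop above is sketched below under a few assumptions: it uses file_get_contents() (which, like file(), needs `allow_url_fopen` enabled to open URLs) instead of implode("", file($url)), skips pages that fail to load instead of passing `false` on to implode(), and keys the results by product ID so each text can later be stored against the right row. The function name `fetch_product_texts` and its `$fetch` parameter are hypothetical; `$fetch` defaults to file_get_contents and exists only so the loop can be tested without a network.

```php
<?php
// Hypothetical sketch: fetch each product page and return its plain text,
// keyed by product ID. Pages that cannot be fetched are left out.
function fetch_product_texts(array $ids, string $baseUrl, ?callable $fetch = null): array
{
    $fetch = $fetch ?? 'file_get_contents';
    $texts = [];
    foreach ($ids as $id) {
        // urlencode() keeps unusual IDs from breaking the query string.
        $url  = $baseUrl . urlencode((string) $id);
        $html = $fetch($url);
        if ($html === false) {
            continue; // page could not be fetched; leave it out of the index
        }
        // Remove the tags, then collapse the whitespace strip_tags() leaves behind.
        $texts[$id] = trim(preg_replace('/\s+/', ' ', strip_tags($html)));
    }
    return $texts;
}
```

Usage would look like `fetch_product_texts($array, 'http://www.products.com/index.php?product_id=')`, after which each `$id => $text` pair can be written to the search table.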
 

Author Comment

by:drakkarnoir
ID: 10674682
Thanks, worked great.