valleytech


Massive URL crawling with PHP

Hi,
I'm currently working on a personal project that requires collecting hundreds of map server (WMS server) links from sources all around the internet. I have found most of the sources, and the next task is to crawl all of the links on each website and write them to a text file, one link per line.

I've been looking around for hints and information, but without luck.

Can you please point me in the right direction, i.e. which PHP functions, tricks, or techniques would work in this case? I'm not a web designer or anything; I'm still at the very beginning of self-study ;)

Thanks so much!
ASKER CERTIFIED SOLUTION
Richard Quadling

ddrudik
>the next task is to crawl all of the links on each website and parse them a text file, one line for each links.
Please be a little more specific about your goal (output, etc.) and where this list of links would come from (a text file, etc.). For example, do you just want to get the page output of each link in your list? If so, what do you want to do with that page output afterwards? Also, it would help if you supplied some actual links to test with.
valleytech

ASKER

Thanks so much!
For example:
http://www.skylab-mobilesystems.com/en/wms_serverlist.html
As you can see, this site contains lots of links, and I'd like to crawl all of those links and write them to a text file in which each line has one link.
Would that be possible?

Thanks!
To be exact:
the page above has a lot of links to various URLs, and I'd like to crawl those URLs and write all of them to a .txt file.

Thanks
I'd probably recommend something like Teleport Pro from Tennyson Maxwell

http://www.tenmax.com/teleport/pro/home.htm

This deals with a LOT of the things you are asking for. Also, as it is threaded, you can process more than one URL at a time.
Thanks!
But really, I'd like to use PHP ;)
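
Staying in PHP, here is a minimal sketch of the task described above: pull the links out of a page's HTML and write them to a text file, one per line. It relies on PHP's built-in DOMDocument class; the helper names extractLinks() and saveLinks(), the sample HTML, and the output path are hypothetical choices for illustration and are not taken from any answer in this thread. It assumes only absolute http(s) links are wanted, so relative links are skipped.

```php
<?php
// Hypothetical helper: return the unique absolute http(s) links
// found in the <a href="..."> attributes of an HTML string.
function extractLinks(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Assumption: keep absolute links only. Relative links would
        // have to be resolved against the page URL first.
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Hypothetical helper: write one link per line to $path.
function saveLinks(array $links, string $path): void
{
    file_put_contents($path, implode(PHP_EOL, $links) . PHP_EOL);
}

// Offline demo on a small inline page (made-up example URLs).
$html = '<html><body>
<a href="http://example.com/wms?SERVICE=WMS">Server A</a>
<a href="/relative/page.html">Relative link, skipped</a>
<a href="http://example.com/wms?SERVICE=WMS">Server A again</a>
<a href="https://example.org/ows">Server B</a>
</body></html>';

$links   = extractLinks($html);
$outFile = sys_get_temp_dir() . '/links.txt';
saveLinks($links, $outFile);
```

For a live page, the inline sample could be replaced with the result of file_get_contents() on the page URL (for example the server list linked above), assuming allow_url_fopen is enabled on your PHP installation. Note that file_get_contents() returns false on failure, so check the result before passing it to extractLinks(). To cover hundreds of source pages, loop over the list of URLs and merge the arrays before saving.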
SOLUTION
Thanks!
You guys are great!