birwin (Canada) asked:

Web Spider or Scraper that will follow links

I am looking for a web spider or scraper that will allow me to pull data from another site. I need a program that will progressively follow links within a site.

I am hoping there is a PHP application that will do this. Free would be nice, but I am willing to pay for a commercial application if it has the functionality I need.

Thank you.
Tags: PHP, JavaScript


8/22/2022 - Mon
IconMan7

HTTrack does it. It is not PHP, but it is free.
hielo

ASKER
birwin

Thank you, IconMan7 and hielo, for the suggestions. Unfortunately, neither of those options will help me.

HTTrack is a Windows application. I need a web-based application that I can program to parse data from the sites it visits. My intention is to scrape supplier sites for images and copy for a web store, so we don't have to populate it manually. Therefore, I need to be able to use logic to determine the model number, title, description, images, and price.

http://www.phpclasses.org/browse/package/4514.html seems to only parse the links on the pages it visits. I need to parse the entire page's data.

Brian
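
The kind of per-page extraction described above (model number, title, description, images, price) could be sketched roughly as follows. This is only an illustration, written in Python rather than PHP and using nothing beyond the standard library. The class names in FIELD_CLASSES (product-title, product-model, and so on) and the names ProductParser and extract_product are made up for the example; real supplier pages will mark up these fields differently, so the mapping would need to be adapted per site.

```python
from html.parser import HTMLParser

# Hypothetical mapping from CSS class names to product fields.
# Every supplier site uses its own markup, so this table is an assumption.
FIELD_CLASSES = {
    "product-title": "title",
    "product-model": "model",
    "product-description": "description",
    "product-price": "price",
}

# Elements that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr", "area", "source"}


class ProductParser(HTMLParser):
    """Collects the text of elements whose class is listed in FIELD_CLASSES,
    plus the src of every <img> on the page."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self.images = []
        self._current = None  # name of the field being captured, if any
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1  # nested element inside a captured field
            return
        for cls in (attrs.get("class") or "").split():
            if cls in FIELD_CLASSES:
                self._current = FIELD_CLASSES[cls]
                self._depth = 1
                self.fields[self._current] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:  # the captured element has closed
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def extract_product(html):
    """Return a dict of the captured fields plus an 'images' list."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    return {**parser.fields, "images": parser.images}
```

Calling extract_product(page_html) on a fetched page returns a dictionary that could then be written into the store's database. Image paths come back exactly as they appear in the page, so relative ones still need to be resolved against the page URL.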
ASKER CERTIFIED SOLUTION
hielo

ASKER
birwin

I downloaded your suggested class. It only parses the URLs on the page, not the page's data.

I need something that can parse the body of the page and follow the links on the page.
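
For what it's worth, the requirement stated here (hand the whole page body to your own parsing logic, then follow the links found on that page) can be sketched as a small breadth-first crawler. This is a minimal illustration in Python rather than PHP, using only the standard library, and not any particular product's method. The names LinkParser, crawl, and handle_page are hypothetical. handle_page is a callback you would supply to do the actual data extraction, and max_pages is an arbitrary safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, handle_page, fetch=fetch, max_pages=50):
    """Visit pages breadth-first, staying on the start URL's host.

    handle_page(url, html) receives the full page body, so it can parse
    whatever data it needs. Returns the list of URLs actually visited.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}   # URLs already queued, to avoid loops
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # covers urllib's URLError and HTTPError
            continue
        visited.append(url)
        handle_page(url, html)  # parse the entire page here
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

A call such as crawl("http://supplier.example/", my_handler) would pass each page's HTML to my_handler (both names are placeholders). Making fetch a parameter lets you swap in a version that adds delays, custom headers, or caching without touching the crawl loop.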