blueantz_cpw

Scrape data from a website using php

I need to scrape data from websiteA and then upload it to websiteB's homepage.
websiteA URLs: http://www.joecigar.com and http://dailycigardeal.com

The data being scraped are daily deals and the content changes daily. This website currently does the exact same thing I need: http://www.cigarstash.com/cigar-deals.php

Please help. I hope that you can provide me with the code, since I have only a rough idea of web scraping, and especially of the regex patterns used for selective scraping.

Thank you!!!
Ray Paseur

The general design for something like this is to read the web page with file_get_contents() or cURL, then tease the HTML apart. You would start your work with "view source" and build your scripts based on what you see in the HTML. Every such application is a bespoke script, and it can break without notice whenever the site changes its markup, so good error checking is vital.
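
As a rough illustration of "teasing the HTML apart", here is a minimal sketch that uses PHP's DOMDocument and DOMXPath instead of regular expressions. The function name extract_deals() and the class name "deal" are assumptions made up for this example; the real markup has to be read from "view source" on the target page, and the XPath query adjusted to match it.

```php
<?php
// Hypothetical helper: pull link titles and URLs out of a page's HTML.
// The class name "deal" is an assumption, not the real markup of any site.
function extract_deals(string $html): array
{
    // An empty body usually means the fetch failed; nothing to parse.
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    // Links inside any element whose class list contains the word "deal".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " deal ")]//a[@href]';

    $deals = [];
    foreach ($xpath->query($query) as $link) {
        $deals[] = [
            'title' => trim($link->textContent),
            'url'   => $link->getAttribute('href'),
        ];
    }
    return $deals;
}

// Made-up sample markup, standing in for what file_get_contents() would return.
$sample = '<html><body><div class="deal"><a href="/deal/1">Sample Robusto 5-pack</a></div>'
        . '<p>Unrelated text</p></body></html>';
$deals = extract_deals($sample);
print_r($deals);
```

An empty result is the signal that either the fetch failed or the site changed its markup, which is the point where the error checking mentioned above has to take over.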

Just looking at the site, I found this: http://www.joecigar.com/jcRss.asp and if that has the information you need, it will be much more stable and easier to process.  You can use the SimpleXML class to process RSS feeds.
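
For the RSS route, a minimal SimpleXML sketch might look like the following. It assumes a standard RSS 2.0 layout (channel, then item elements with title, link and description); whether the feed above actually follows that layout has not been checked here. The function name parse_rss_items() and the sample feed are invented for the example.

```php
<?php
// Hypothetical helper: turn an RSS 2.0 document into a plain array of items.
function parse_rss_items(string $xml): array
{
    if (trim($xml) === '') {
        return [];
    }

    // Keep XML parse errors out of the page output.
    $previous = libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Not XML at all, or XML without the channel/item structure we expect.
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }

    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title'       => trim((string) $item->title),
            'link'        => trim((string) $item->link),
            'description' => trim((string) $item->description),
        ];
    }
    return $items;
}

// Made-up feed, standing in for the body fetched from the real URL.
$sample = <<<XML
<rss version="2.0">
  <channel>
    <title>Sample deals</title>
    <item>
      <title>Deal one</title>
      <link>http://example.com/deal-one</link>
      <description>First sample deal</description>
    </item>
    <item>
      <title>Deal two</title>
      <link>http://example.com/deal-two</link>
      <description>Second sample deal</description>
    </item>
  </channel>
</rss>
XML;

$items = parse_rss_items($sample);
foreach ($items as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```

Because the feed is structured data, this tends to survive site redesigns better than a scraper tied to the page's HTML.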

Best regards, ~Ray
blueantz_cpw

ASKER

The RSS feed didn't seem to work. I'm hoping that someone can provide me with the code, since I have only a rough idea of web scraping. I'm a quick learner, though.

Thanks.
Well, as I wrote, this is a bespoke web application script, and it can break without notice at any time, so good error checking is vital. You might want to consider hiring a professional developer to help you with this project. It is not easy, and it is not a project for a quick study. I would be glad to help you if I could do it in a few hundred lines of code, but that is not an option here. You have to study string manipulation in some detail to be successful. Or simply acknowledge that time is money, and trade some money for the time by hiring a developer.

Please consider the resources you can find when you make a Google search for "PHP developer."

Good luck, ~Ray
ASKER CERTIFIED SOLUTION
Ray Paseur

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Ray,

That's exactly what I need. How do I incorporate this into my WordPress website? I use the Thesis theme, and I put the code into the custom_functions.php file, but it didn't work.

Any ideas??
Yes, I do not know WordPress well enough to advise you, but I have an idea. There is a WordPress Zone here at EE. Use the Request Attention link up near the original question and ask a moderator to add this question to that zone. Since it is the weekend, it may take a day or two to get that part of the question answered. Or you could just ask a new question in the WordPress Zone and include a link to this one. You can ask in up to three zones at a time.

BTW, I didn't reply to your "hire me" email because we can use the Atom feed and frankly, this is just not something that I would expect to get paid for.  Too small a job!

Best regards, ~Ray
I'm getting the dreaded message: Warning: file_get_contents(http://www.joecigar.com/jcRss.asp) [function.file-get-contents]: failed to open stream: HTTP request failed!

Is there a way to test the file_get_contents() function first and then scrape the data?
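
One way to do this, sketched here under some assumptions, is to check the return value: file_get_contents() returns false on failure, so the warning can be suppressed and the result tested before any parsing starts. The helper name fetch_or_null(), the timeout, and the user-agent string below are all made up for the example. The demo reads local files so that it runs without a network connection.

```php
<?php
// Hypothetical helper: return the body on success, or null on any failure.
function fetch_or_null(string $source, int $timeoutSeconds = 10): ?string
{
    // These options only apply to http:// URLs; local paths ignore them.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'deal-reader/0.1 (example)', // assumed value
        ],
    ]);

    // @ hides the warning; the return value reports the failure instead.
    $body = @file_get_contents($source, false, $context);

    // Treat an empty body as a failure too: there is nothing to scrape.
    if ($body === false || $body === '') {
        return null;
    }
    return $body;
}

// Demo: a path that does not exist fails cleanly instead of raising a warning.
$missing = fetch_or_null('/no/such/file/for/this/example');
echo $missing === null ? "Fetch failed, skipping the scrape.\n" : "Got data.\n";

// Demo: a readable source returns its contents unchanged.
$tmp = tempnam(sys_get_temp_dir(), 'feed');
file_put_contents($tmp, '<rss></rss>');
$found = fetch_or_null($tmp);
unlink($tmp);
echo $found, "\n";
```

In live use the call would be fetch_or_null('http://www.joecigar.com/jcRss.asp'), with the scraping step skipped whenever it returns null. The warning alone does not say why the request failed; possible causes include allow_url_fopen being disabled on the host or the remote server refusing the request, so those are worth checking separately.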