SheppardDigital
asked on
Pulling product data from ANY website
A client has approached me and wants to create a tool similar to Amazon's wish list.
Basically, you browse to a website page, add the URL to the tool, and the tool scrapes the website to extract the product name, image, price etc.
Scraping itself isn't really the issue; however, the client wants the tool to pull product data from almost any website, and I'm struggling to figure out how to do that because each website structures its HTML differently.
I'm guessing Amazon's wish list uses some kind of AI that's been trained to accurately determine which data on the page it needs.
I'm thinking this is a huge task for a single developer, but wondered if I was maybe overthinking this and if there was already a solution available?
Some easy options:
You could use the Pinterest API and capture people's pins by having your app send them to a special user's pinboard called "wish list". You can then retrieve that wish list through XML using the user ID and board ID.
Alternatively, limit the tool to whatever is available through a single API, such as Amazon's (http://docs.aws.amazon.com/AWSECommerceService/latest/DG/CHAP_FindingItemstoBuy.html), and build your app around that.
ASKER CERTIFIED SOLUTION
If your client is happy, we're happy. But I would test that service very, very carefully (and check the terms of use) before I relied on it for more than hobby applications.
ASKER
This was the most suitable answer.
Try this Google search: https://www.google.com/?q=wish+list+maker
One problem you may encounter (and it will become more prevalent over time) is that many sites no longer publish product information in their HTML. Instead, they serve a placeholder and use jQuery/AJAX to load the information directly into the DOM. The reason for doing this is to prevent bots from scraping data out of their HTML.
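To illustrate the point about static HTML, here is a minimal sketch of one way a tool might look for product data without site-specific selectors. It is not something proposed in this thread: it assumes the target page embeds schema.org JSON-LD or Open Graph meta tags, which many shops do but by no means all. The names `ProductMetaParser`, `extract_product`, and `SAMPLE` are hypothetical, and the sample markup is invented for the demonstration. If a site injects its product data with AJAX after load, this approach sees only the placeholder and returns empty fields.

```python
import json
from html.parser import HTMLParser


class ProductMetaParser(HTMLParser):
    """Collects Open Graph <meta> tags and raw JSON-LD <script> bodies."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.jsonld_blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith(("og:", "product:")) and attrs.get("content"):
                self.og[prop] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.jsonld_blocks.append("".join(self._buffer))


def extract_product(html):
    """Return a dict with name, image and price; missing fields are None."""
    parser = ProductMetaParser()
    parser.feed(html)
    product = {"name": None, "image": None, "price": None}

    # Prefer schema.org Product data when the page provides it.
    for raw in parser.jsonld_blocks:
        try:
            data = json.loads(raw)
        except ValueError:
            continue  # Skip malformed JSON-LD rather than fail the whole page.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if not isinstance(item, dict) or item.get("@type") != "Product":
                continue
            offers = item.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            image = item.get("image")
            if isinstance(image, list):
                image = image[0] if image else None
            product["name"] = item.get("name")
            product["image"] = image
            product["price"] = offers.get("price") if isinstance(offers, dict) else None

    # Fall back to Open Graph tags for anything still missing.
    product["name"] = product["name"] or parser.og.get("og:title")
    product["image"] = product["image"] or parser.og.get("og:image")
    product["price"] = product["price"] or parser.og.get("product:price:amount")
    return product


# Invented sample page: JSON-LD supplies name and price, Open Graph the image.
SAMPLE = """
<html><head>
<meta property="og:title" content="Example Kettle">
<meta property="og:image" content="https://example.com/kettle.jpg">
<script type="application/ld+json">
{"@type": "Product", "name": "Example Kettle 1.7L",
 "offers": {"@type": "Offer", "price": "24.99", "priceCurrency": "GBP"}}
</script>
</head><body><div id="product"></div></body></html>
"""

if __name__ == "__main__":
    print(extract_product(SAMPLE))
    # A page with only a placeholder yields no data at all.
    print(extract_product("<html><body><div id='product'></div></body></html>"))
```

Fetching the page is left out on purpose. For sites that load everything client-side, the HTML would have to be rendered in a headless browser first, and the terms-of-use caveat mentioned earlier in the thread still applies.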