pkrish80

asked on

How to extract links from Amazon.com search results in Java

Hello All,

I am trying to extract URLs for products of a certain brand, say "Toshiba", after searching for them on Amazon.com.

Steps:

1. Go to Amazon.com.
2. Click on Electronics, search by brand, and click on "Toshiba". This lists all Toshiba products.
3. Enter a specific TV model.
4. Extract the URL.
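For steps 3 and 4, once the crawler has collected the brand's product links, picking out one model can be as simple as a normalized substring match on the link text. This is a minimal sketch, assuming each link is stored as a (title, URL) pair. The ProductLink type, the model numbers and the example.com URLs are all made up for illustration:

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

public class ModelMatcher {

    /** Hypothetical shape of a crawled link: visible link text plus URL. */
    public record ProductLink(String title, String url) {}

    /** Lower-cases and strips whitespace and hyphens, so "40-cd200" and "40CD200" compare equal. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[\\s-]", "");
    }

    /** Step 4: returns the URL of the first link whose title contains the model, if any. */
    public static Optional<String> findUrlForModel(List<ProductLink> links, String model) {
        String wanted = normalize(model);
        return links.stream()
                .filter(link -> normalize(link.title()).contains(wanted))
                .map(ProductLink::url)
                .findFirst();
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the crawler's output from steps 1 and 2.
        List<ProductLink> links = List.of(
                new ProductLink("Toshiba 32AB100 32-inch LCD TV", "https://example.com/p/1"),
                new ProductLink("Toshiba 40CD200 40-inch LCD TV", "https://example.com/p/2"));
        // Step 3: the model the user typed in.
        System.out.println(findUrlForModel(links, "40-cd200").orElse("not found"));
    }
}
```

Dropping spaces and hyphens lets "40-cd200" match "40CD200", but a very short model string could also match unrelated titles, so it is worth checking the result.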

I am using a crawler for steps 1 and 2 to gather all the URLs. For steps 3 and 4, I am thinking about using Lucene or a data structure to pick out a specific URL from them.
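Lucene makes sense if the link set gets large, but for a few thousand links a plain in-memory inverted index (token to URLs) may already be enough to grab a specific URL. Note this is a swapped-in technique, not Lucene's API; LinkIndex and its methods are hypothetical names, and the titles and URLs are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Minimal in-memory inverted index: token -> URLs whose link text contains that token. */
public class LinkIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Splits text into lower-case alphanumeric tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Records every token of the link text as pointing at this URL. */
    public void add(String linkText, String url) {
        for (String token : tokenize(linkText)) {
            postings.computeIfAbsent(token, k -> new LinkedHashSet<>()).add(url);
        }
    }

    /** Returns the URLs whose link text contains all query tokens (AND semantics). */
    public Set<String> search(String query) {
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new LinkedHashSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    public static void main(String[] args) {
        LinkIndex index = new LinkIndex();
        index.add("Toshiba 32AB100 32-inch LCD TV", "https://example.com/p/1");
        index.add("Toshiba 40CD200 40-inch LCD TV", "https://example.com/p/2");
        System.out.println(index.search("40cd200 lcd")); // [https://example.com/p/2]
    }
}
```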

Any suggestions on which GPL-licensed crawler to use, and on a technique for parsing the search results?
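Whichever crawler is used, parsing a results page mostly comes down to pulling out the <a> tags with their href and link text. This is a minimal sketch using the HTML parser bundled with the JDK (javax.swing.text.html.parser.ParserDelegator), assuming the page HTML has already been fetched into a String. The sample markup is made up; real Amazon result pages are far messier, so expect to filter the extracted links afterwards:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkExtractor {

    /** One extracted link: its href attribute and its visible text. */
    public record Link(String href, String text) {}

    /** Collects every <a href="..."> in the page together with the text inside it. */
    public static List<Link> extractLinks(String html) {
        List<Link> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private String currentHref;                       // href of the open <a>, or null
            private final StringBuilder text = new StringBuilder();

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    currentHref = (href == null) ? null : href.toString();
                    text.setLength(0);
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (currentHref != null) {
                    text.append(data);                        // text inside the current link
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.A && currentHref != null) {
                    links.add(new Link(currentHref, text.toString().trim()));
                    currentHref = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<a href=\"/p/1\">Toshiba 32AB100 LCD TV</a> "
                + "<a href=\"/p/2\">Toshiba 40CD200 LCD TV</a>"
                + "</body></html>";
        for (Link link : extractLinks(html)) {
            System.out.println(link.href() + " -> " + link.text());
        }
    }
}
```

The JDK parser only understands old HTML 3.2, which is enough for grabbing links; for anything heavier a dedicated HTML parser library is the usual choice.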

Please let me know. Sample code would be helpful as well.
CEHJ

I'm pretty sure Amazon has an API you can use to avoid scraping.
pkrish80

ASKER

I want to add the ability to search buy.com, eBay and other sites as well.
ASKER CERTIFIED SOLUTION
CEHJ

This solution is only available to Experts Exchange members.
OK. What if I need to do this for buy.com? I don't think there are any APIs available for buy.com.
Well, then you'd need to scrape it. Try HttpUnit.
Yeah, or www.screen-scraper.com is my favorite. It lets you do everything within the tool itself, but I'm not sure what your end goal is, so it may not suit your needs. :-)
Using a combination of APIs and scraping.
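Since the plan is to mix APIs (where a site offers one, as with Amazon) and scraping (for sites like buy.com) across several stores, one way to keep that manageable is to hide each site behind a common interface. This is a sketch under that assumption: ProductSource and FixedSource are hypothetical names, and FixedSource only returns canned URLs where a real implementation would call a site's API or run a scraper:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiSiteSearch {

    /** Hypothetical common interface: one implementation per site (API-backed or scraper-backed). */
    public interface ProductSource {
        String siteName();
        List<String> findProductUrls(String brand, String model);
    }

    /** Hypothetical stand-in returning canned URLs; a real source would call an API or a scraper. */
    public record FixedSource(String siteName, List<String> urls) implements ProductSource {
        @Override
        public List<String> findProductUrls(String brand, String model) {
            return urls;
        }
    }

    /** Queries every source in order and groups the returned URLs by site name. */
    public static Map<String, List<String>> searchAll(List<ProductSource> sources,
                                                      String brand, String model) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (ProductSource source : sources) {
            try {
                results.put(source.siteName(), source.findProductUrls(brand, model));
            } catch (RuntimeException e) {
                // One failing site should not abort the whole search; record no results for it.
                results.put(source.siteName(), List.of());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<ProductSource> sources = List.of(
                new FixedSource("amazon (API)", List.of("https://example.com/a/1")),
                new FixedSource("buy.com (scraped)", List.of()));
        System.out.println(searchAll(sources, "Toshiba", "40CD200"));
    }
}
```

Adding eBay or another store then means writing one more ProductSource, without touching the search loop.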