How to extract links from Amazon.com search results in Java

Hello All,

I am trying to extract the URLs of products from a certain brand, say "Toshiba", after searching for them on Amazon.com.

Steps:

1. Go to Amazon.com.
2. Click on Electronics, search by brand, and click on "Toshiba". This lists all Toshiba products.
3. Enter a specific TV model.
4. Extract the URL.

I am using a crawler for steps 1 and 2 to gather all the URLs. For steps 3 and 4, I am thinking about using Lucene or an in-memory data structure to pick out the URL for a specific model.
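For an exact model lookup, a full-text engine like Lucene may be more than you need. A minimal sketch, assuming the crawler from steps 1 and 2 yields (product title, URL) pairs: the class `ProductIndex`, its methods, the sample titles, and the `example.com` URLs are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory index of crawled product titles to their URLs. */
final class ProductIndex {
    // Insertion-ordered, so lookups return the earliest-crawled match.
    private final Map<String, String> titleToUrl = new LinkedHashMap<>();

    void add(String title, String url) {
        titleToUrl.put(normalize(title), url);
    }

    /** Returns the URL of the first product whose title contains the model string. */
    Optional<String> findByModel(String model) {
        String needle = normalize(model);
        return titleToUrl.entrySet().stream()
                .filter(e -> e.getKey().contains(needle))
                .map(Map.Entry::getValue)
                .findFirst();
    }

    // Lower-case and keep only letters and digits, so "tv 2000" and "TV-2000"
    // normalize to the same string.
    private static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    public static void main(String[] args) {
        ProductIndex index = new ProductIndex();
        // Made-up placeholder data, not real products.
        index.add("Toshiba TV-1000 32-inch LCD TV", "https://www.example.com/dp/AAA");
        index.add("Toshiba TV-2000 40-inch LCD TV", "https://www.example.com/dp/BBB");
        System.out.println(index.findByModel("TV-2000").orElse("not found"));
    }
}
```

Because punctuation and spaces are stripped, a short query can match across word boundaries. For a few thousand crawled links, this trade-off is usually acceptable before reaching for a real search index.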

Any suggestions on which GPL-licensed crawler to use, and on a technique for parsing the search results?

Please let me know. Sample code would be helpful as well.
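One way to parse a results page, using only the JDK, is sketched below. It assumes product links appear as ordinary `<a href=...>` tags whose visible text contains the brand name. The class name, regex, and sample markup are illustrative assumptions, not Amazon's actual page structure. Regular expressions only cope with simple, predictable HTML; a real HTML parser (for example jsoup) is more robust when the markup is messy.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls brand-matching product links out of a results page. */
final class LinkExtractor {
    // <a ... href="..." ...>text</a> with single or double quotes; DOTALL lets text span lines.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns absolute URLs of links whose visible text mentions the brand (case-insensitive). */
    static List<String> extract(String html, URI pageUri, String brand) {
        List<String> urls = new ArrayList<>();
        String needle = brand.toLowerCase(Locale.ROOT);
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Drop nested tags such as <span> to get the visible link text.
            String text = TAG.matcher(m.group(2)).replaceAll("").trim();
            if (!text.toLowerCase(Locale.ROOT).contains(needle)) {
                continue;
            }
            // Undo the most common HTML entity found inside href values.
            String href = m.group(1).replace("&amp;", "&");
            try {
                // Turn relative links such as "/dp/AAA" into absolute URLs.
                urls.add(pageUri.resolve(href).toString());
            } catch (IllegalArgumentException badHref) {
                // Skip hrefs that are not valid URIs (e.g. ones containing spaces).
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a downloaded results page.
        String html = "<div><a href=\"/dp/AAA\"><span>Toshiba TV-1000</span></a>"
                + "<a href='/dp/BBB'>Other Brand TV</a></div>";
        URI page = URI.create("https://www.example.com/s?k=toshiba");
        System.out.println(extract(html, page, "Toshiba"));
    }
}
```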
Asked by: pkrish80
 
CEHJ commented:
I'm pretty sure Amazon has an API that you can use instead of scraping.
 
pkrish80 (author) commented:
I also want to be able to search buy.com, eBay, and other sites.
 
CEHJ commented:
Use the available APIs wherever possible.
pkrish80 (author) commented:
OK. What if I need to do this for buy.com? I don't think buy.com offers any APIs.
 
CEHJ commented:
Then you'd need to scrape it. Try HttpUnit.
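CEHJ suggests HttpUnit; the sketch below swaps in the JDK's own `java.net.http.HttpClient` (Java 11+) instead, so it needs no extra libraries. It only downloads a page. The search URL format (`https://www.example.com/search` with a `q` parameter) is a placeholder assumption, not buy.com's real URL scheme, and the User-Agent string is made up.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

/** Hypothetical fetcher for a search-results page (JDK 11+, no third-party libraries). */
final class PageFetcher {
    private final HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    /** Builds a search URL; the base URL and "q" parameter are placeholders. */
    static URI searchUri(String base, String query) {
        return URI.create(base + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8));
    }

    /** Downloads the page body as a string, failing on any non-200 status. */
    String fetch(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("User-Agent", "example-product-crawler/0.1")
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " for " + uri);
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        URI uri = searchUri("https://www.example.com/search", "toshiba 40 inch tv");
        System.out.println(uri);
        // Only touch the network when "--fetch" is passed, so the sketch also runs offline.
        if (args.length > 0 && args[0].equals("--fetch")) {
            System.out.println(new PageFetcher().fetch(uri).length() + " characters downloaded");
        }
    }
}
```

The returned HTML still has to be parsed for product links, and pages that build their results with JavaScript will not show them in this raw response.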
 
Derek Jensen commented:
Yeah, or www.screen-scraper.com is my favorite. It lets you do everything within the tool itself, but I'm not sure what your end goal is, so it may not suit your needs. :-)
 
pkrish80 (author) commented:
I'm using a combination of APIs and scraping.
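A minimal sketch of how such a mix might be organized, assuming each site gets its own implementation of a common interface: an API client where the site offers one, a scraper elsewhere. `MultiSiteSearch`, its nested `ProductSource` interface, and the stub sources are hypothetical; the stubs stand in for real API or scraping code.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dispatcher that searches several sites through one interface. */
final class MultiSiteSearch {
    /** Implemented by an API client for some sites and by a scraper for others. */
    interface ProductSource {
        List<String> findProductUrls(String brand, String model) throws Exception;
    }

    // Keyed by site name; insertion order decides the order results are reported in.
    private final Map<String, ProductSource> sources = new LinkedHashMap<>();

    MultiSiteSearch register(String site, ProductSource source) {
        sources.put(site, source);
        return this;
    }

    /** Queries every registered site; a failing site yields an empty list instead of aborting. */
    Map<String, List<String>> search(String brand, String model) {
        Map<String, List<String>> results = new LinkedHashMap<>();
        for (Map.Entry<String, ProductSource> e : sources.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().findProductUrls(brand, model));
            } catch (Exception ex) {
                System.err.println(e.getKey() + " failed: " + ex.getMessage());
                results.put(e.getKey(), List.of());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stubs standing in for a real API client and a real scraper.
        ProductSource apiStub = (brand, model) -> List.of("https://api-site.example/item/" + model);
        ProductSource scraperStub = (brand, model) -> {
            throw new IOException("site unreachable");
        };
        MultiSiteSearch search = new MultiSiteSearch()
                .register("site-with-api", apiStub)
                .register("site-without-api", scraperStub);
        System.out.println(search.search("Toshiba", "TV-2000"));
    }
}
```

Catching failures per site keeps one broken scraper (for example after a site redesign) from hiding the results of the others.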