I am trying to extract the URLs of products from a certain brand, say "Toshiba", after searching for them on Amazon.com. The steps are:
1. Go to Amazon.com.
2. Click on Electronics, search by brand, and click on "Toshiba". This lists all Toshiba products.
3. Enter a specific TV model.
4. Extract the URL.
I am using a crawler for steps 1 and 2 to gather all the URLs. For steps 3 and 4, I am thinking about using Lucene or an in-memory data structure to pick out the URL for a specific model from the gathered set.
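
To make the data-structure option for steps 3 and 4 concrete, here is a rough sketch of what I have in mind. It assumes the crawler has already produced (title, URL) pairs, and it builds a small inverted index from title tokens to URLs. The function names, the sample titles, the model numbers, and the example.com URLs are all made up for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(products):
    """Map each title token to the set of URLs whose title contains it."""
    index = defaultdict(set)
    for title, url in products:
        for token in tokenize(title):
            index[token].add(url)
    return index

def find_urls(index, query):
    """Return the URLs whose titles contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical crawl output: (product title, product URL) pairs.
products = [
    ("Toshiba 32AB100 32-Inch TV", "https://example.com/dp/AAA111"),
    ("Toshiba 40CD200 40-Inch TV", "https://example.com/dp/BBB222"),
]
index = build_index(products)
print(find_urls(index, "40CD200"))
```

This only does exact token matching, so it would miss partial model numbers or misspellings. That is the part where I suspect Lucene might be the better fit.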
Any suggestions about which GPL-licensed crawler to use, and which technique to use for parsing the search results?
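
For the parsing part, this is roughly what I would try if no better technique is suggested: a minimal sketch using only Python's standard `html.parser`. It assumes product links can be recognized by a `/dp/` fragment in the `href`, which is a guess about the page markup and not something I have verified. The class name, the sample HTML, and the example.com base URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ProductLinkParser(HTMLParser):
    """Collect (link text, absolute URL) pairs for anchors that look like product links."""

    def __init__(self, base_url, marker="/dp/"):
        super().__init__()
        self.base_url = base_url
        self.marker = marker   # assumed pattern that identifies a product link
        self.links = []
        self._href = None      # URL of the product anchor currently open, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and self.marker in href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the collected link text.
            title = " ".join("".join(self._text).split())
            self.links.append((title, self._href))
            self._href = None

# Hypothetical snippet standing in for a crawled results page.
sample_html = """
<ul>
  <li><a href="/dp/AAA111">Toshiba 32AB100 32-Inch TV</a></li>
  <li><a href="/help">Help</a></li>
  <li><a href="/dp/BBB222">Toshiba <b>40CD200</b> 40-Inch TV</a></li>
</ul>
"""
parser = ProductLinkParser("https://example.com")
parser.feed(sample_html)
parser.close()
for title, url in parser.links:
    print(title, url)
```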
Please let me know. Sample code would be helpful as well.