<div>
Pretty sure Amazon has an API that you can use to avoid scraping.
</div>



<div>
I wanted to add the ability to search buy.com, eBay, and other sites as well.
</div>



<div>
OK. What if I need to do this for buy.com? I don't think there are any APIs available for buy.com.
</div>



<div>
Well then you'd need to scrape it. Try HttpUnit
</div>



<div>
Yeah, or <a href="http://www.screen-scraper.com" rel="ugc">www.screen-scraper.com</a>&nbsp;is my fav... it lets you do everything all within itself, but I'm not sure what your end goal is, so it may not suit your needs. :-)
</div>



<div>
Using a combination of APIs and scraping
</div>


<div>
<div class="content wysiwyg-content">
Hello All,<br />
<br />
I am trying to extract the URLs of products from a certain brand, say &quot;Toshiba&quot;, after searching for them on Amazon.com.<br />
<br />
Steps:<br />
<br />
1. Go to Amazon.com<br />
2. Click on Electronics, search by brand, and click on &quot;Toshiba&quot;. This lists all the Toshiba products<br />
3. Enter a specific TV model<br />
4. Extract the URL<br />
<br />
I am using a crawler for steps 1 and 2 to gather all the URLs. For steps 3 and 4, I am thinking about using Lucene or a data structure to pick the specific URL out of the crawled results.<br />
<br />
Any suggestions about which GPL-licensed crawler to use, and which technique to use to parse the search results?<br />
<br />
Please let me know. Sample code would be helpful as well.
</div>
</div>
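For steps 3 and 4, here is a minimal sketch of one way to pull a specific product URL out of a results page that has already been downloaded. It uses only java.util.regex from the standard library. The class name LinkExtractor, the method findLinks, the sample markup, and the model names TV-100 and TV-200 are all made up for illustration; real Amazon result pages use different and more complex markup, so treat the pattern as a starting point rather than something that works on the live site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the href of every anchor whose visible text
// contains a keyword (for example a TV model number).
class LinkExtractor {

    // Group 1 is the href value, group 2 is everything between <a ...> and </a>.
    // Assumes double-quoted href attributes; other quoting styles are not handled.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<String> findLinks(String html, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<String> urls = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Drop nested tags such as <span> so only the visible text is compared.
            String text = m.group(2).replaceAll("<[^>]+>", "");
            if (text.toLowerCase(Locale.ROOT).contains(needle)) {
                urls.add(m.group(1));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up markup standing in for a downloaded search results page.
        String html = "<ul>"
                + "<li><a href=\"/dp/EXAMPLE1\">Toshiba TV-100</a></li>"
                + "<li><a class=\"title\" href=\"/dp/EXAMPLE2\"><span>Toshiba</span> TV-200</a></li>"
                + "</ul>";
        System.out.println(findLinks(html, "TV-200")); // prints [/dp/EXAMPLE2]
    }
}
```

Regular expressions are fragile against real-world HTML (single-quoted attributes, script blocks, broken nesting), so a proper HTML parser is usually the safer choice once this goes beyond a quick experiment. Lucene only becomes worthwhile if you need to search a large set of crawled pages repeatedly; for picking one model out of a single results page, a simple text match like this is probably enough.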



How to extract links from Amazon.com search results in Java
