turbojournal asked:
Extension for downloading all search results images?
Is there an extension or application for Windows or Mac that lets a user download all images from a web store's search results, including all the product images on each product page linked from those results?
An example would be:
A user is planning to purchase a few rugs but doesn't want to open each product in the search results, right-click every image for that product, and save it to their computer. Doing this by hand takes hours and becomes daunting.
I have found quite a few image downloaders for websites on Mac, PC, Firefox, Chrome, and a few others, but the most I have gotten out of any of them is downloading all images from a single product page, not all the product images across the full set of search results.
Try the Firefox browser with the DownThemAll extension.
ASKER
I've already tried that, and it doesn't do what I described above.
I doubt that you can do this with search results, because the URL associated with the search results is typically unrelated to the items that the search found, although that could vary depending on the tool used to build the website. For example, here's a Drupal-based Oriental rug site:
http://www.peerrugs.com/
If you do a search for, let's say, Sarouk, you get this URL:
http://www.peerrugs.com/search/node/sarouk
However, if you use a website download tool, it won't find the content related to the URL above, as it is actually in locations such as:
http://www.peerrugs.com/rug/sarouk-rug
http://www.peerrugs.com/rug/sarouk-feraghan-carpet-0
http://www.peerrugs.com/rug/sarouk-mahajiran-carpet
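A possible workaround can be sketched in a few lines of Python, assuming the search results page contains ordinary links to the product pages (not verified for this site). The function names and the `/rug/` path prefix are illustrative choices based on the example URLs above, not part of any tool mentioned in this thread. The sketch only extracts URLs from HTML that has already been fetched; the downloading itself is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkAndImageCollector(HTMLParser):
    """Collects absolute URLs of <a href> links and <img src> images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def product_links(search_html, search_url, path_prefix="/rug/"):
    """Return unique product page URLs linked from a search results page.

    path_prefix is an assumption taken from the example URLs; other
    sites will use a different layout.
    """
    parser = LinkAndImageCollector(search_url)
    parser.feed(search_html)
    unique = []
    for link in parser.links:
        if urlparse(link).path.startswith(path_prefix) and link not in unique:
            unique.append(link)
    return unique


def image_urls(page_html, page_url):
    """Return absolute URLs of every <img> on a product page."""
    parser = LinkAndImageCollector(page_url)
    parser.feed(page_html)
    return parser.images
```

Each URL returned by `product_links` would then be fetched and passed to `image_urls` to get the images for that product. Sites that build their results with JavaScript would not work with this approach.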
If you want to get all images (not results of a search), that's doable. I use HTTrack (free!) to download websites:
http://www.httrack.com/
I've never tried to get just images, but I think that would be possible by using its include filter (a plus sign) to include image file types and its exclude filter (a minus sign) to exclude everything else.
The Scan Rules tab of the options dialog has a pre-configured check-box for gif, jpg, png, tif, and bmp files; include those.
I recently used HTTrack to download the Oriental rugs site mentioned above. It worked well and I got the rug images (mostly JPGs, some PNGs). I suppose I could have used the include/exclude feature in the Scan Rules tab to get just the rug images, but I didn't try that. I ran with the default, which is this:
+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
I didn't realize until just now that the default doesn't include bmp and tif files, but for most websites, png/gif/jpg will get the images, which is no doubt why HTTrack made them the default. Regards, Joe
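To make the plus/minus rules easier to reason about, here is a simplified Python model of include/exclude filtering. It is not HTTrack's actual matcher: it assumes that a later rule overrides an earlier one when both match, it uses Python's `fnmatch` wildcards as a stand-in for HTTrack's pattern syntax, it ignores `mime:` rules, and it writes URLs without the scheme. The function name `is_accepted` is made up for this sketch.

```python
from fnmatch import fnmatchcase


def is_accepted(url, rules, default=True):
    """Apply '+pattern' / '-pattern' rules in order; the last match wins.

    This is a simplified model of scan rules, not HTTrack's own logic.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if pattern.startswith("mime:"):
            continue  # MIME-type rules are outside this URL-only sketch
        if fnmatchcase(url, pattern):
            verdict = sign == "+"
    return verdict


# The default rules quoted above, split into a list.
DEFAULT_RULES = [
    "+*.png", "+*.gif", "+*.jpg", "+*.css", "+*.js",
    "-ad.doubleclick.net/*", "-mime:application/foobar",
]
```

Under this model, a `.jpg` URL is accepted, the same file on `ad.doubleclick.net` is rejected by the later minus rule, and a `.tif` URL matches no rule at all, so its fate depends on the `default` argument.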
ASKER
I tested the app with http://www.rugstudio.com, but I quickly get errors and the process halts without downloading any photos. I'm trying to get all images for each item for sale.
When I saw your last post, I kicked off an HTTrack run on <http://www.rugstudio.com/>. It has been running for 28 minutes. So far, it has downloaded 584 JPGs. Here's just the first page of hits on a search for <*.jpg>:
I set the Scan Rules to:
+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
In the Limits tab, I left the "Maximum mirroring depth" blank and set the "Maximum external depth" to 0. Regards, Joe
ASKER CERTIFIED SOLUTION
ASKER
Bingo, that was it. Thank you!
ASKER
All images started downloading.
You're welcome! Glad to hear that it's working for you. Regards, Joe
ASKER
Thanks Joe. One last thing. I saw how to set filters for what gets downloaded, but what rule do I type in to "exclude everything but .jpgs over 300x300", as an example?
There's no way that I'm aware of to exclude files based on resolution, such as those over 300x300. You may be able to achieve what you want by utilizing the option "Max size of any non-HTML file" in the Limits tab. Regards, Joe
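If filtering by resolution matters, one option is to do it after the download rather than during it. The Python sketch below reads the width and height from a JPEG's frame header and lists files under a given size. It is a minimal illustration using only the standard library; the function names are made up here, "over 300x300" is interpreted as at least 300 pixels on each side, and unusual or damaged files may simply be reported as unreadable (`None`).

```python
import struct
from pathlib import Path

# Start-of-frame markers, which carry the image dimensions.
# 0xC4, 0xC8 and 0xCC sit in the same range but are not frame headers.
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}


def jpeg_size(data):
    """Return (width, height) of a JPEG byte string, or None if not found."""
    if data[:2] != b"\xff\xd8":  # every JPEG starts with the SOI marker
        return None
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return None  # lost sync with the segment structure
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD9:
            i += 2  # standalone marker with no length field
            continue
        if marker in SOF_MARKERS:
            if i + 9 > len(data):
                return None
            # Layout: FF Cx, length (2), precision (1), height (2), width (2)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        (seg_len,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + seg_len  # skip the marker plus the whole segment
    return None


def is_large_enough(data, min_width=300, min_height=300):
    """True if the JPEG is at least min_width x min_height pixels."""
    size = jpeg_size(data)
    return size is not None and size[0] >= min_width and size[1] >= min_height


def small_jpegs(root, min_width=300, min_height=300):
    """Yield paths of .jpg files under root that are too small or unreadable."""
    for path in Path(root).rglob("*.jpg"):
        if not is_large_enough(path.read_bytes(), min_width, min_height):
            yield path
```

Running `small_jpegs` on the folder HTTrack downloaded into would list candidates for deletion; reviewing that list before removing anything is a sensible precaution, since thumbnails and product photos can share a directory.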
ASKER
Thanks Joe.
You're welcome.