Solved

wget syntax

Posted on 2010-11-11
774 Views
Last Modified: 2012-05-10
I'm trying to use wget to get all the .jpg files that are in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way?
0
Comment
Question by:N R
20 Comments
 
LVL 19

Expert Comment

by:jools
ID: 34112868
There is, but you have to get the directory listing first.

Try wget http://domain.com/pictures/

It'll probably create an index.html file, which you'll need to parse. You can use awk/sed to strip out the filenames you want, then just write a for loop to wget the image files.
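Something along these lines might work (a rough sketch: domain.com/pictures/ is a placeholder, the links are assumed to be relative, and the grep/sed patterns depend on how the index page is formatted):

# fetch the directory index page
wget -q -O index.html http://domain.com/pictures/

# pull the .jpg hrefs out of the HTML and strip the surrounding attribute syntax
grep -o 'href="[^"]*\.jpg"' index.html | sed 's/^href="//;s/"$//' > jpglist.txt

# loop over the list and fetch each image
while read -r f; do
    wget "http://domain.com/pictures/$f"
done < jpglist.txt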

0
 
LVL 11

Author Comment

by:N R
ID: 34112881
I get a 403 Forbidden error.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34112912
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray
0
 
LVL 11

Author Comment

by:N R
ID: 34112947
I take that back; it only returns the index.html page.  Within that index file I can find the links to the pictures, so is there a way to have it parse the file and find the links that end with .jpg to download?
0
 
LVL 19

Expert Comment

by:jools
ID: 34112982
Yes, you can strip out the .jpg filenames with awk and sed and then just loop through them, running wget on each one (along the lines of the sketch above).
0
 
LVL 11

Author Comment

by:N R
ID: 34113080
@Ray Paseur
There are a lot of them that I would like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.


@jools
I'm not familiar with awk and sed. How would I do this?
0
 
LVL 19

Expert Comment

by:jools
ID: 34113223
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump; I'll see if I can find my old script.
0
 
LVL 19

Expert Comment

by:jools
ID: 34113321
I'm not sure whether helping you do this would infringe copyright. Can you post something that says you're allowed to rip the content from this site?
0
 
LVL 11

Author Comment

by:N R
ID: 34113357
Looking, but I don't see any policies anywhere on that site, not even at the bottom.  The pictures I can go and find on Google; they are all over the net.
0
 
LVL 19

Expert Comment

by:jools
ID: 34113438
OK, I can see what you're saying, but they will be copyrighted to someone, and EE policy says we can't help if that is the case.

I'll post a request for a moderator to check this out. If they say OK, then we can knock something up pretty quickly; if they say no, then you'll have to sort something out yourself.
0
 
LVL 11

Author Comment

by:N R
ID: 34113450
ok thanks
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34113768
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing,0,3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);

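Once you have a plain list of image URLs (say one per line in a file, perhaps written out by a script like the one above), wget can fetch the whole list in one go. A minimal sketch, with urls.txt and images/ as placeholder names:

# read URLs from urls.txt and save the downloads under images/
wget -i urls.txt -P images/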

0
 
LVL 11

Author Comment

by:N R
ID: 34113872
@Ray, thanks. What would the syntax be to execute this .php file with wget?
0
 
LVL 78

Expert Comment

by:arnold
ID: 34114254
You can use wget to retrieve the complete site, which will include the image files.

wget -r http://www.domain.com
This will download the site into www.domain.com/
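If you only want the .jpg files rather than the whole site, wget's accept filter can restrict the recursive crawl. A rough sketch (the URL and directory names are placeholders):

# recursive fetch; don't ascend to the parent, flatten the directory tree, keep only .jpg files
wget -r -np -nd -A "*.jpg" -P images/ http://www.domain.com/pictures/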

0
 
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 34115573
I would not execute the PHP file with wget.  I would install that script on my server and visit it with a click - like a regular web page.  Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
0
 
LVL 19

Expert Comment

by:jools
ID: 34115671
Are we all now contributing to possibly infringing the copyright of this site (or others)?

I thought I'd raised a valid point with a concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems the question has now been answered regardless.
0
 
LVL 78

Expert Comment

by:arnold
ID: 34116080
A publicly accessible site is open to some knucklehead infringing on its copyright by reusing the images, whether they download one image at a time or download all of them.

IMHO, give the asker the benefit of the doubt that they want to get the images from a site to which they have access, without the need to use FTP or other options. Or perhaps they only want the images/data that is actually in use on the site, versus keeping every image that has accumulated since the beginning of the site's existence and is no longer referenced in any document on the site.
0
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34116196
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
0
 
LVL 11

Author Comment

by:N R
ID: 34116397
Thanks for your help, Ray.  This really wasn't a malicious request; I was only trying to learn how it could be done, and I could use this script for many other things.
0
 
LVL 19

Expert Comment

by:jools
ID: 34116584
Gallitin, I never thought your request was malicious. I do believe, though, that we should be careful: some sites have T's & C's which prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.
0
