wget syntax

I'm trying to use wget to download all the .jpg files in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way to do it?
Nathan Riley (Founder) asked:
Ray Paseur commented:
I would not execute the PHP file with wget. I would install the script on my server and visit it with a click, like a regular web page. Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
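If a browser isn't handy, the script can also be run from the command line with PHP's CLI binary; a minimal sketch, assuming the php command-line binary is installed and the script is saved locally:

php RAY_temp_gallitin.php

The print_r() output (the scraped image tags) then goes to the terminal instead of a browser.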
jools commented:
There is, but you have to get the directory listing first.

Try: wget http://domain.com/pictures/

It will probably save an index.html file, which you'll need to parse. You can use awk/sed to strip out the filenames you want, then just create a for loop that runs wget on each image file.
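A minimal sketch of that pipeline, assuming the index page links to the images with href="...jpg" attributes and relative paths (the URL is a placeholder; adjust the patterns to the actual markup):

#!/bin/bash
base="http://domain.com/pictures/"

# Fetch the directory listing.
wget -O index.html "$base"

# Extract every href that ends in .jpg, then fetch each image.
grep -o 'href="[^"]*\.jpg"' index.html \
  | cut -d'"' -f2 \
  | while read -r img; do
      wget "$base$img"
    done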

Nathan Riley (Author) commented:
Error 403: Forbidden.
Ray Paseur commented:
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray
Nathan Riley (Author) commented:
I take that back; it only returns the index.html page. Within that index file I can find the links to the pictures, so is there a way to have it parse the file, find the links that end in .jpg, and download them?
jools commented:
Yes, you can strip out the .jpg filenames with awk and sed and then just loop through them, running wget on each one (as in the sketch above).
Nathan Riley (Author) commented:
@Ray Paseur
There are a lot of sites I would like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.


@jools
I'm not familiar with awk and sed. How would I do this?
jools commented:
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump. I'll see if I can find my old script.
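In the meantime, a rough reconstruction of the idea (a sketch, assuming elinks is installed and its text dump includes the absolute image URLs; lynx -dump behaves similarly):

#!/bin/bash
url="http://domain.com/pictures/"

# elinks -dump renders the page as plain text and by default appends
# a numbered "References" list of absolute link URLs. Keep only the
# .jpg URLs and fetch each one.
elinks -dump "$url" \
  | grep -oi 'http[^ ]*\.jpg' \
  | sort -u \
  | while read -r img; do
      wget "$img"
    done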
jools commented:
I'm not sure whether our helping you do this will infringe copyright. Can you post something that says you're allowed to pull the content from this site?
Nathan Riley (Author) commented:
I've looked, but I don't see any policies anywhere on that site, not even at the bottom. The pictures I can find on Google; they are all over the net.
jools commented:
OK, I can see what you're saying, but the images will be copyrighted to someone, and EE policy says we can't help if that's the case.

I'll post a request for a moderator to check this out. If they say OK, we can knock something up pretty quickly; if they say no, you'll have to sort something out yourself.
Nathan Riley (Author) commented:
OK, thanks.
Ray Paseur commented:
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing,0,3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);

Nathan Riley (Author) commented:
@Ray, thanks. What would the syntax be to execute this .php file with wget?
arnold commented:
You can use wget to retrieve the complete site, which will include the image files:

wget -r http://www.domain.com

This will download the site into a local www.domain.com/ directory.
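If you only want the images, wget's accept filter can narrow the recursive fetch (a sketch; the URL is a placeholder):

# Recurse, keep only .jpg files, don't ascend to the parent
# directory, and flatten everything into the current directory.
wget -r -np -nd -A '*.jpg' http://www.domain.com/pictures/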

jools commented:
Are we all now contributing to possibly infringing the copyright of this (or other) sites?

I thought I'd raised a valid concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems the question has now been answered regardless.
arnold commented:
A publicly accessible site is open to some knucklehead infringing its copyright by reusing the images, whether they download one image at a time or all of them.

IMHO, give the asker the benefit of the doubt: they may want to get the images from a site to which they have access without needing FTP or other options, or they may want only the images/data actually in use on the site, rather than every image accumulated since the beginning of the site's existence that is no longer referenced in any document.
Ray Paseur commented:
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
Nathan Riley (Author) commented:
Thanks for your help, Ray. This really wasn't a malicious request; I was only trying to learn how it could be done, and I could use this script for many other things.
jools commented:
Gallitin, I never thought your request was malicious. I do believe we should be careful, though; some sites have Ts & Cs that prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.