Solved

wget syntax

Posted on 2010-11-11
771 Views
Last Modified: 2012-05-10
I'm trying to use wget to get all the .jpg files in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way?
Question by:N R
20 Comments
 
LVL 19

Expert Comment

by:jools
ID: 34112868
There is, but you have to get the directory listing first.

Try wget http://domain.com/pictures/

It'll probably create an index.html file, which you'll need to parse; you can use awk/sed to strip out the filenames you want, then run a for loop to wget the image files.
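
Something along these lines should do it (an untested sketch; it assumes the index page lists the pictures as relative href links):

# fetch the directory listing (wget saves it as index.html)
wget http://domain.com/pictures/

# pull the .jpg hrefs out of the listing and fetch each one
grep -o 'href="[^"]*\.jpg"' index.html | sed 's/^href="//; s/"$//' |
while read -r f; do
    wget "http://domain.com/pictures/$f"
done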

 
LVL 11

Author Comment

by:N R
ID: 34112881
I get a 403 Forbidden error.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 34112912
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray

 
LVL 11

Author Comment

by:N R
ID: 34112947
I take that back; it only returns the index.html page. Within that index file I can find the links to the pictures, so is there a way to parse the file, find the links that end with .jpg, and download them?
 
LVL 19

Expert Comment

by:jools
ID: 34112982
Yes, you can strip out the .jpg filenames with awk and sed and then loop through them, running wget on each one.
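
For example, with awk alone (a rough sketch; it assumes the links are quoted attribute values in index.html, and you may need to prepend the site URL if they are relative):

# print every quoted field that ends in .jpg, then fetch each one
awk -F'"' '{ for (i = 1; i <= NF; i++) if ($i ~ /\.jpg$/) print $i }' index.html |
while read -r url; do
    wget "$url"
done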
 
LVL 11

Author Comment

by:N R
ID: 34113080
@Ray Paseur
There are a lot of them I would like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.


@jools
I'm not familiar with awk and sed; how would I do this?
 
LVL 19

Expert Comment

by:jools
ID: 34113223
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump; I'll see if I can find my old script.
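
If memory serves, it was roughly along these lines (untested, from memory; lynx -dump should behave much the same if elinks isn't installed):

# dump the rendered page, keep anything that looks like a .jpg URL, fetch each one
elinks -dump 'http://domain.com/pictures/' |
grep -o 'http[^ ]*\.jpg' |
while read -r url; do
    wget "$url"
done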
 
LVL 19

Expert Comment

by:jools
ID: 34113321
I'm not sure whether helping you do this would infringe copyright. Can you post something that says you're allowed to pull the content from this site?
 
LVL 11

Author Comment

by:N R
ID: 34113357
I've looked, but I don't see any policies anywhere on that site, not even at the bottom. The pictures can be found on Google; they are all over the net.
 
LVL 19

Expert Comment

by:jools
ID: 34113438
OK, I can see what you're saying, but they will be copyrighted to someone, and EE policy says we can't help if that's the case.

I'll post a request for a moderator to check this out. If they say OK, we can knock something up pretty quickly; if they say no, you'll have to sort out something yourself.
 
LVL 11

Author Comment

by:N R
ID: 34113450
OK, thanks.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 34113768
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing,0,3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);

 
LVL 11

Author Comment

by:N R
ID: 34113872
@Ray, thanks. What would the syntax be to execute this .php file with wget?
 
LVL 77

Expert Comment

by:arnold
ID: 34114254
You can use wget to retrieve the complete site, which will include the image files.

wget -r http://www.domain.com
This will download the site into www.domain.com/
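
If you only want the JPGs rather than the whole site, wget's accept list should narrow it down; a sketch (untested, and the -np/-nd choices are just one reasonable setup):

# -r recurse, -np don't ascend to the parent directory,
# -nd don't recreate the directory tree locally, -A keep only .jpg/.jpeg files
wget -r -np -nd -A jpg,jpeg http://www.domain.com/pictures/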

 
LVL 109

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 34115573
I would not execute the PHP file with wget.  I would install that script on my server and visit it with a click - like a regular web page.  Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
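
For what it's worth, if the PHP CLI is installed you could also run it from a shell prompt (assuming you're in the directory that holds the file):

php RAY_temp_gallitin.php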
 
LVL 19

Expert Comment

by:jools
ID: 34115671
Are we all now contributing to possibly infringing the copyright of this (or other) sites?

I thought I'd raised a valid point with a concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems the question has now been answered regardless.
 
LVL 77

Expert Comment

by:arnold
ID: 34116080
A publicly accessible site is open to some knucklehead infringing on its copyright by reusing the images, whether they download one image at a time or all of them.

IMHO, give the asker the benefit of the doubt: they may want to get the images from a site to which they have access without needing to use FTP or other options, or they may want only the images/data actually in use on the site, rather than every image accumulated since the beginning of the site's existence that is no longer referenced in any document on the site.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 34116196
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
 
LVL 11

Author Comment

by:N R
ID: 34116397
Thanks for your help, Ray. This really wasn't a malicious request; I was only trying to learn how it could be done, and I could use this script for many other things.
 
LVL 19

Expert Comment

by:jools
ID: 34116584
Gallitin, I never thought your request was malicious. I do believe we should be careful, though; some sites have Ts & Cs that prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.
