Solved

wget syntax

Posted on 2010-11-11
Medium Priority
775 Views
Last Modified: 2012-05-10
I'm trying to use wget to get all the .jpg files that are in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way?
Question by:Nathan Riley
20 Comments
 
LVL 19

Expert Comment

by:jools
ID: 34112868
There is, but you have to get the directory list first.

Try: wget http://domain.com/pictures/

It'll probably create an index.html file, which you'll need to edit; you can use awk/sed to strip out the files you want, then just create a for loop to wget the image files.
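A minimal sketch of that loop, using the placeholder URL from the question and grep/sed in place of awk, assuming the index page lists the images as href="...jpg" links:

```shell
# 1) Fetch the directory index page (placeholder URL from the question):
wget -O index.html http://domain.com/pictures/

# 2) Strip out the .jpg links: grep -o keeps only the matching href
#    attributes, sed trims the href="..." wrapper off each one.
grep -o 'href="[^"]*\.jpg"' index.html | sed 's/href="//; s/"$//' > jpgs.txt

# 3) Loop over the list and fetch each image:
while read -r f; do
    wget "http://domain.com/pictures/$f"
done < jpgs.txt
```

The grep/sed pipeline only handles relative links on the same page; absolute URLs or links on other pages would need a different pattern.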

 
LVL 12

Author Comment

by:Nathan Riley
ID: 34112881
Error 403 Forbidden.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 34112912
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray
 
LVL 12

Author Comment

by:Nathan Riley
ID: 34112947
I take that back; it only returns the index.html page. Within that index file I can find the links to the pictures, so is there a way to have it parse the file and find the links that end in .jpg to download?
 
LVL 19

Expert Comment

by:jools
ID: 34112982
Yes, you can strip out the .jpg filenames with awk and sed and then just loop through them, running wget on each one.
 
LVL 12

Author Comment

by:Nathan Riley
ID: 34113080
@Ray Paseur
There are a lot of them that I would like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.

@jools
I'm not familiar with awk and sed. How would I do this?
 
LVL 19

Expert Comment

by:jools
ID: 34113223
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump; I'll see if I can find my old script.
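For reference, the elinks -dump route looks roughly like this (a sketch, not jools's actual script): -dump renders the page as plain text with each link expanded to a full URL, so the .jpg URLs can be grepped straight out.

```shell
# Dump the page as text (elinks expands links to absolute URLs),
# keep only URLs ending in .jpg, and wget each one.
elinks -dump 'http://domain.com/pictures/' \
  | grep -oE 'https?://[^ ]+\.jpg' \
  | while read -r url; do
        wget "$url"
    done
```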
 
LVL 19

Expert Comment

by:jools
ID: 34113321
I'm not sure whether our helping you do this would infringe copyright. Can you post something that says you're allowed to rip the content from this site?
 
LVL 12

Author Comment

by:Nathan Riley
ID: 34113357
Looking, but I don't see any policies anywhere on that site, not even at the bottom. The pictures I can go and find on Google; they are all over the net.
 
LVL 19

Expert Comment

by:jools
ID: 34113438
OK, I can see what you're saying, but they will be copyrighted to someone, and EE policy says we can't help if that's the case.

I'll post a request for a moderator to check this out. If they say OK, we can knock something up pretty quickly; if they say no, you'll have to sort something out yourself.
 
LVL 12

Author Comment

by:Nathan Riley
ID: 34113450
OK, thanks.
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 34113768
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing,0,3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);

 
LVL 12

Author Comment

by:Nathan Riley
ID: 34113872
@Ray, thanks. What would the syntax be to execute this .php file with wget?
 
LVL 79

Expert Comment

by:arnold
ID: 34114254
You can use wget to retrieve the complete site, which will include the image files.

wget -r http://www.domain.com

This will download the site into www.domain.com/
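Worth adding: wget can also do the .jpg filtering itself during a recursive fetch, via its accept list. A sketch against the question's placeholder URL:

```shell
# Recursive fetch that keeps only .jpg files:
#   -r   recurse through links found on each page
#   -np  "no parent": don't climb above /pictures/
#   -nd  don't recreate the remote directory tree locally
#   -A   accept list: keep only files matching the pattern
#        (the HTML index pages are parsed, then deleted)
wget -r -np -nd -A '*.jpg' http://domain.com/pictures/
```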

 
LVL 111

Accepted Solution

by:
Ray Paseur earned 2000 total points
ID: 34115573
I would not execute the PHP file with wget.  I would install that script on my server and visit it with a click - like a regular web page.  Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
 
LVL 19

Expert Comment

by:jools
ID: 34115671
Are we all now contributing to possibly infringing the copyright of this (or other) sites?

I thought I'd raised a valid point with a concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems that the question has now been answered regardless.
 
LVL 79

Expert Comment

by:arnold
ID: 34116080
A publicly accessible site is open to some knucklehead infringing on their copyright by reusing the images, whether they download one image at a time or download all of them.

IMHO, give the asker the benefit of the doubt that they want to get the images from a site to which they have access, without the need to use FTP or other options. Or they want to use only the images/data actually in use on the site, versus having all the images from the beginning of the site's existence that are no longer in use (referenced in any document on the site).
 
LVL 111

Expert Comment

by:Ray Paseur
ID: 34116196
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
 
LVL 12

Author Comment

by:Nathan Riley
ID: 34116397
Thanks for your help, Ray. This really wasn't a malicious request; I was only trying to learn how it was possible to do, and I could use this script for many other things.
 
LVL 19

Expert Comment

by:jools
ID: 34116584
Gallitin, I never thought your request was malicious. I do believe, though, that we should be careful; some sites have Ts & Cs which prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.
