Solved

wget syntax

Posted on 2010-11-11
773 Views
Last Modified: 2012-05-10
I'm trying to use wget to download all the .jpg files in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way?
Question by:N R
20 Comments
 
LVL 19

Expert Comment

by:jools
ID: 34112868
There is, but you have to get the directory listing first.

Try wget http://domain.com/pictures/

It'll probably create an index.html file, which you'll need to parse. You can use awk/sed to strip out the filenames you want, then just run a for loop to wget each image file.
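
Something along these lines should work. It's a rough, untested sketch: the sed pattern assumes each image link in index.html looks like <a href="file.jpg">, so adjust it to the site's actual markup.

# fetch the directory listing (wget saves it as index.html)
wget http://domain.com/pictures/

# pull the .jpg hrefs out of the listing and download each one
for f in $(sed -n 's/.*href="\([^"]*\.jpg\)".*/\1/p' index.html)
do
    wget "http://domain.com/pictures/$f"
done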

 
LVL 11

Author Comment

by:N R
ID: 34112881
I get a 403 Forbidden error.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34112912
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray
 
LVL 11

Author Comment

by:N R
ID: 34112947
I take that back; it only returns the index.html page. Within that index file I can find the links to the pictures, so is there a way to have it parse the file, find the links that end with .jpg, and download them?
 
LVL 19

Expert Comment

by:jools
ID: 34112982
Yes, you can strip out the .jpg filenames with awk and sed, then just loop through them, running wget on each one.
 
LVL 11

Author Comment

by:N R
ID: 34113080
@Ray Paseur
There are a lot of them that I'd like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.


@jools
I'm not familiar with awk and sed. How would I do this?
 
LVL 19

Expert Comment

by:jools
ID: 34113223
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump; I'll see if I can find my old script.
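
In the meantime, here is the same idea sketched with lynx, since I can't vouch for the exact elinks options from memory: lynx -dump -listonly prints just the URLs referenced by a page, and grep narrows them to the JPEGs.

lynx -dump -listonly http://domain.com/pictures/ | grep -o 'http[^ ]*\.jpg' | while read url
do
    wget "$url"
done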
 
LVL 19

Expert Comment

by:jools
ID: 34113321
I'm not sure whether our helping you do this would infringe copyright. Can you post something that says you're allowed to rip the content from this site?
 
LVL 11

Author Comment

by:N R
ID: 34113357
I've looked, but I don't see any policies anywhere on that site, not even at the bottom. The pictures themselves I can find on Google; they are all over the net.
 
LVL 19

Expert Comment

by:jools
ID: 34113438
OK, I can see what you're saying, but the images will be copyrighted to someone, and EE policy says we can't help if that's the case.

I'll post a request for a moderator to check this out. If they say OK, we can knock something up pretty quickly; if they say no, you'll have to sort something out yourself.
 
LVL 11

Author Comment

by:N R
ID: 34113450
OK, thanks.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34113768
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing,0,3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);
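
Once you have the bare image URLs, the rest is easy on the wget side. For example, if you adapt the loop above to write just the src values, one per line, into a file (urls.txt is only an example name), wget can fetch them all in one pass:

wget -i urls.txt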


 
LVL 11

Author Comment

by:N R
ID: 34113872
@Ray, thanks. What would the syntax be to execute this .php file with wget?
0
 
LVL 78

Expert Comment

by:arnold
ID: 34114254
You can use wget to retrieve the complete site, which will include the image files.

wget -r http://www.domain.com
This will download the site into www.domain.com/
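
If you only want the JPEGs, wget's accept list can narrow the download. A variant along these lines (the -l1, -nd, and -np flags are my additions; tune them to taste):

wget -r -l1 -nd -np -A jpg,jpeg http://www.domain.com/pictures/

Here -A restricts downloads to the listed suffixes, -nd keeps wget from recreating the remote directory tree locally, -np stops it from climbing to the parent directory, and -l1 limits recursion to one level.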

 
LVL 110

Accepted Solution

by:Ray Paseur (earned 500 total points)
ID: 34115573
I would not execute the PHP file with wget.  I would install that script on my server and visit it with a click - like a regular web page.  Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
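
(If you would rather run it from the command line, php -f RAY_temp_gallitin.php will execute the script directly, assuming the PHP CLI is installed; wget doesn't need to be involved at all.)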
 
LVL 19

Expert Comment

by:jools
ID: 34115671
Are we all now contributing to possibly infringing the copyright of this (or another) site?

I thought I'd raised a valid point with a concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems the question has now been answered regardless.
 
LVL 78

Expert Comment

by:arnold
ID: 34116080
A publicly accessible site is open to some knucklehead infringing its copyright by reusing the images, whether they download one image at a time or download all of them.

IMHO, give the asker the benefit of the doubt: they may want to get the images from a site to which they have access without needing to use FTP or other options, or they may want only the images/data actually in use on the site, rather than every image accumulated since the beginning of the site's existence that is no longer referenced in any document there.
 
LVL 110

Expert Comment

by:Ray Paseur
ID: 34116196
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
 
LVL 11

Author Comment

by:N R
ID: 34116397
Thanks for your help, Ray. This really wasn't a malicious request; I was only trying to learn how it was possible to do, and I could use this script for many other things.
 
LVL 19

Expert Comment

by:jools
ID: 34116584
Gallitin, I never thought your request was malicious. I do believe, though, that we should be careful: some sites have Ts & Cs that prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.
