Solved

wget syntax

Posted on 2010-11-11
765 Views
Last Modified: 2012-05-10
I'm trying to use wget to get all the .jpg files in a directory.

Example:

wget http://domain.com/pictures/*.jpg

That didn't work. Is there a different way?
Question by: N R
20 Comments
 
LVL 19

Expert Comment

by: jools
There is, but you have to get the directory listing first.

Try wget http://domain.com/pictures/

It'll probably create an index.html file which you'll need to edit; you can use awk/sed to strip out the files you want, then just create a for loop to wget the image files.
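Something like this might do it (a rough sketch; the grep/sed patterns and base URL are placeholders and assume the index page uses plain href links to relative .jpg filenames):

# fetch the directory index (assumes the server allows a listing)
wget -O index.html http://domain.com/pictures/

# pull out anything that looks like a .jpg link, then fetch each one
grep -o 'href="[^"]*\.jpg"' index.html | sed 's/^href="//; s/"$//' |
while read -r img; do
    wget "http://domain.com/pictures/$img"
done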

 
LVL 11

Author Comment

by: N R
That gives error 403 Forbidden.
 
LVL 108

Expert Comment

by: Ray Paseur
Please post the actual URL and directory you want to access.  When you say you want to "get" the JPG files, do you mean that you want to copy them to your server?

Thanks, ~Ray
 
LVL 11

Author Comment

by: N R
I take that back; it only returns the index.html page. Within that index file I can find the links to the pictures, so is there a way to have it parse the file and find the links that end with .jpg to download?
 
LVL 19

Expert Comment

by: jools
Yes, you can strip out the .jpg filenames with awk and sed and then just loop through them, running wget on each one, as in the sketch above.
 
LVL 11

Author Comment

by: N R
@Ray Paseur
There are a lot of them that I would like to use this with; here is an example:
http://www.caught.com/index/sexy-and-funny-gallery-xxiv/

Yes, wget to my server.


@jools
I'm not familiar with awk and sed; how would I do this?
 
LVL 19

Expert Comment

by: jools
Come to think of it, the last time I did this I may have used a combination of elinks and wget. I think I used elinks -dump; I'll see if I can find my old script.
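From memory it was something along these lines (untested; assumes elinks is installed and that its -dump output lists the page's link URLs in a references section):

# dump the rendered page plus its link references, keep only the .jpg URLs
elinks -dump http://domain.com/pictures/ | grep -o 'http[^ ]*\.jpg' | sort -u |
while read -r url; do
    wget "$url"
done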
 
LVL 19

Expert Comment

by: jools
I'm not sure whether us helping you do this would infringe copyright. Can you post something that says you're allowed to rip the content from this site?
 
LVL 11

Author Comment

by: N R
Looking, but I don't see any policies anywhere on that site, not even at the bottom. The pictures I can find on Google; they're all over the net.
 
LVL 19

Expert Comment

by: jools
OK, I can see what you're saying, but they will be copyrighted to someone, and EE policy says we can't help if that's the case.

I'll post a request for a moderator to check this out. If they say OK then we can knock something up pretty quickly; if they say no then you'll have to sort something out yourself.
 
LVL 11

Author Comment

by: N R
OK, thanks.
 
LVL 108

Expert Comment

by: Ray Paseur
This is an example of a "scraper" script.  Perhaps you can extend the principles shown here to your specific application requirements.  Please get permission for whatever use you plan to make of the images.  Best regards, ~Ray
<?php // RAY_temp_gallitin.php
error_reporting(E_ALL);

// READ THE FOREIGN PAGE
$htm  = file_get_contents('http://www.caught.com/index/sexy-and-funny-gallery-xxiv/');

// SCRAPE THE PAGE DATA WITH THESE KEY ELEMENTS
$dlm1 = "<div id='gallery-1'";
$dlm2 = "<br style='clear: both;' />";
$dlm3 = '<img ';

// REMOVE FRONT AND BACK OF HTML
$arr  = explode($dlm1, $htm);
$htm  = $dlm1 . $arr[1];
$arr  = explode($dlm2, $htm);
$htm  = $arr[0];

// REMOVE WHITESPACE AND COMPRESS
$str  = preg_replace('/\s\s+/', ' ', $htm);

// CREATE AN ITERATOR
$new  = array();
$arr  = explode($dlm3, $str);
foreach ($arr as $thing)
{
    // IMAGE TAGS HAVE WIDTH IN A CONSISTENT PLACE
    if (substr($thing, 0, 3) != 'wid') continue;

    // REMOVE EXTRANEOUS STUFF - SAVE THE IMAGE URL
    $poz   = strpos($thing, '>');
    $thing = substr($thing, 0, $poz+1);
    $new[] = $dlm3 . $thing;
}

// SHOW THE WORK PRODUCT
print_r($new);

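If you then want to turn that array of image tags into actual downloads, one rough follow-up (untested; assumes the PHP CLI is available and that the src attributes use single quotes, as the gallery markup above seems to) would be to run the script from the command line and feed the URLs to wget:

php RAY_temp_gallitin.php | grep -o "src='[^']*\.jpg'" | cut -d"'" -f2 |
while read -r url; do
    wget "$url"
done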

 
LVL 11

Author Comment

by: N R
@Ray, thanks. What would the syntax be to execute this .php file with wget?
 
LVL 76

Expert Comment

by: arnold
You can use wget to retrieve the complete site, which will include the image files.

wget -r http://www.domain.com
This will download the site into a local www.domain.com/ directory.
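If you only want the .jpg files rather than the entire site, you can restrict the recursion with wget's accept list, something like this (the URL is a placeholder; -l1 limits the recursion depth, -nd avoids recreating the directory tree, -np keeps wget from climbing to the parent directory):

wget -r -l1 -nd -np -A '*.jpg' http://www.domain.com/pictures/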

 
LVL 108

Accepted Solution

by: Ray Paseur (earned 500 total points)
I would not execute the PHP file with wget.  I would install that script on my server and visit it with a click - like a regular web page.  Example here:

http://www.laprbass.com/RAY_temp_gallitin.php

I'll leave that online for a little while so you can see what it does.
 
LVL 19

Expert Comment

by: jools
Are we all now contributing to possibly infringing the copyright of this (or other) sites?

I thought I'd raised a valid point with a concern about the copyright of the other site, but it seems that points rule.

This was raised with the moderators, but it seems the question has now been answered regardless.
 
LVL 76

Expert Comment

by: arnold
A publicly accessible site is open to some knucklehead infringing its copyright by reusing the images, whether they download one image at a time or download all of them.

IMHO, give the asker the benefit of the doubt: they may want to get the images from a site to which they have access without needing to use FTP or other options, or they may want to use only the images/data actually in use on the site, rather than every image kept since the beginning of the site's existence that is no longer referenced in any document on the site.
 
LVL 108

Expert Comment

by: Ray Paseur
Agree with arnold.  If you simply visit the site, you can sit there at your computer and click to save the images.  Automation does not trump intent.  It's still wise to get permission if you want to do something like harvest images in this automated manner.
 
LVL 11

Author Comment

by: N R
Thanks for your help, Ray. This really wasn't a malicious request; I was only trying to learn how it could be done, and I could use this script for many other things.
 
LVL 19

Expert Comment

by: jools
Gallitin, I never thought your request was malicious. I do believe, though, that we should be careful; some sites have Ts & Cs which prohibit scraping their data, and generally it's not really the done thing to rip a site like that.

However, it's a bit of a pointless argument now.

That said, I thought Ray's code was up to its usual high standard.
