Solved

Using WGET

Posted on 2011-02-15
Last Modified: 2012-05-11
I want to retrieve all *.csv files from a folder on a website, e.g.
http://www.somedomain.com/my_folder/, and copy all the files into a folder named somedomain.com on a Windows machine.

The names of the CSV files vary all the time.

I want to do this with wget if possible; I have not been able to figure it out myself.
Question by:weegiraffe
10 Comments
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 250 total points
ID: 34902180
wget -r -P /path/to/somedomain.com -nd -A.csv http://www.somedomain.com/my_folder/

wmp
 
LVL 6

Assisted Solution

by:Bxoz
Bxoz earned 250 total points
ID: 34902252

wget -r -nd -A.csv <url>

-r makes it recursive
-nd creates no directories (all files land in one folder)
-A.csv accepts only .csv files
 
LVL 6

Expert Comment

by:t-max
ID: 34902284
Since you can't do 'wget http://www.somedomain.com/my_folder/*.csv', the approach I would take is:
* wget an HTML file (e.g. index.html) that contains the CSV names in that folder (if directory listing is enabled, this shouldn't be hard to do)
* grep that file for the lines containing .csv names, and parse those lines down to the bare file names
* loop through those names with wget to download them
* copy all the files to the folder somedomain.com, previously mounted via a Samba client
Best regards
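These steps can be sketched in shell as below. The `extract_csv_names` helper and its grep pattern are assumptions — server index formats vary, so the pattern may need adjusting — and the download loop is commented out so the sketch stays a dry run:

```shell
#!/bin/sh
# Sketch of the manual approach, assuming directory listing is enabled
# on the server and the index page uses href="name.csv" links.

extract_csv_names() {
    # Pull the bare .csv file names out of an index page read from stdin.
    grep -o 'href="[^"]*\.csv"' | sed 's/^href="//; s/"$//'
}

BASE="http://www.somedomain.com/my_folder/"

# Uncomment to run against a real server:
# wget -q -O - "$BASE" | extract_csv_names | while read -r name; do
#     wget -q -P somedomain.com "$BASE$name"   # -P creates the folder if needed
# done
```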
 

Author Comment

by:weegiraffe
ID: 34904420
Thank you all for your suggestions.

wget -r -P /path/to/somedomain.com -nd -A.csv http://www.somedomain.com/my_folder/

is not working.

Neither is wget -r -nd -A.csv <url>

t-max's suggestion is far too complicated; the job could be done more efficiently with FTP.

I will attempt to clarify my requirements:
I need to pull down only CSV files from a specific directory on a website to a Windows File Server.

The website directory name containing the files is always the same, so there is no need to search other directories on the website.

The files are always CSVs, but the names vary.

These steps have to be repeated on 50 sites, and this number will grow.

Effectively, I need to download all CSVs from URLs such as:

http://www.somedomain.com/my_folder/
http://www.somedomain1.com/my_folder/
http://www.somedomain2.com/my_folder/
etc.
to a Windows folder such as somedomain.com, somedomain1.com, somedomain2.com, etc.

It would be ideal if wget could also create these target folders if they do not exist.

I know I can do all of this with the Windows command-line (or any) FTP client, but a bunch of wget calls in a batch file would make it easier.

If wget is not the correct tool, then please just say so.
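Since the same pattern repeats across 50 sites, the whole job can be driven by a loop, one wget call per domain. The sketch below is POSIX shell (the same structure translates to a Windows .bat `for` loop); the domain list is illustrative, and `build_wget_cmd` is a hypothetical helper that only prints each command so the loop can be dry-run before piping it to a shell. Note that wget's `-P` option creates the target folder if it does not exist.

```shell
#!/bin/sh
# One wget call per site: -r -l1 --no-parent limits recursion to
# my_folder, -nd flattens the directory tree, -A.csv keeps only CSV
# files, and -P names the per-domain target folder (created on demand).

build_wget_cmd() {
    # Print the wget command line for one domain (dry-run helper).
    domain="$1"
    printf 'wget -r -l1 -nd --no-parent -A.csv -P %s http://%s/my_folder/\n' \
        "$domain" "$domain"
}

# Illustrative domain list; replace with the real 50 sites.
for d in www.somedomain.com www.somedomain1.com www.somedomain2.com; do
    build_wget_cmd "$d"        # pipe the output to sh to actually download
done
```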




 
LVL 68

Expert Comment

by:woolmilkporc
ID: 34904428
What do you mean by "is not working"? Error messages? Undesired results? Please clarify!

wmp
 
LVL 6

Expert Comment

by:Bxoz
ID: 34905202
I think you can use the GnuWin32 tools; there is a wget build for Windows:

http://gnuwin32.sourceforge.net/packages/wget.htm
 
LVL 12

Expert Comment

by:mccracky
ID: 34907077
The other obvious option is to use rsync.  
 

Author Comment

by:weegiraffe
ID: 34908308
This seems to be the closest I can get:

wget -r -l1 --no-parent -A.csv http://somedomain.com/my_folder/



 

Author Closing Comment

by:weegiraffe
ID: 34908331
I have found the solution
 

Author Comment

by:weegiraffe
ID: 34908416
************** CORRECT SOLUTION IS HERE **************

I am not sure what I did wrong earlier, but this is actually the correct solution:

wget -r -l1 --no-parent -A.csv http://somedomain.com/my_folder/

Not the accepted solution... I appear to have made a mistake, but I am happy to assign the points as shown, as the suggestions set me on the correct path.

******** CORRECT SOLUTION IS HERE **************
