B R (Ireland) asked:

Using WGET

I want to be able to retrieve all *.csv files from a folder on a website like this:
http://www.somedomain.com/my_folder/
and copy all the files into a folder named somedomain.com on a Windows machine.

The names of the CSV files vary all the time.

I want to do this with Wget if possible; I have not been able to figure it out myself.
ASKER CERTIFIED SOLUTION from woolmilkporc (Germany) (solution text available to members only)
SOLUTION (solution text available to members only)
t-max suggested:

Since you can't do 'wget http://www.somedomain.com/my_folder/*.csv', the approach I would take is this (see the sketch below):
* wget an HTML file (e.g. index.html) that contains the csv names in that folder (if directory listing is allowed, this shouldn't be hard to do)
* grep that file to get all the lines with .csv names, and parse those lines to leave just the file names
* loop through those names with wget to download them
* copy all the files to the folder somedomain.com, which was previously mounted using a Samba client

Best regards
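A minimal shell sketch of this listing-and-grep approach, assuming directory indexing is enabled and the index page links each file with a relative href; the domain, paths, and file names below are placeholders:

#!/bin/sh
# Fetch the directory listing page (assumes directory indexing is enabled).
wget -q -O listing.html http://www.somedomain.com/my_folder/

# Extract the .csv file names from the href attributes (assumes relative links).
grep -o 'href="[^"]*\.csv"' listing.html | sed 's/^href="//; s/"$//' > csvlist.txt

# Download each file into a folder named after the domain.
mkdir -p somedomain.com
while read -r f; do
    wget -q -P somedomain.com "http://www.somedomain.com/my_folder/$f"
done < csvlist.txt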
B R (Asker):

Thank you all for your suggestions.

wget -r -P /path/to/somedomain.com -nd -A.csv http://www.somedomain.com/my_folder/

is not working.  

Neither is wget -r -nd -A.csv <url>.

t-max's suggestion is far too complicated; the job could be done more efficiently with FTP.

I will attempt to clarify my requirements:
I need to pull down only CSV files from a specific directory on a website to a Windows File Server.

The website directory name containing the files is always the same, so there is no need to search other directories on the website.

The files are always CSVs, but the names vary.

These steps have to be repeated on 50 sites, and this number will grow.

Effectively, I need to download all CSVs from URLs such as:

http://www.somedomain.com/my_folder/ 
http://www.somedomain1.com/my_folder/ 
http://www.somedomain2.com/my_folder/
etc.
to Windows folders such as somedomain.com, somedomain1.com, somedomain2.com, etc.

It would be ideal if Wget could also create these target folders if they do not exist.

I know I can do all of this with the Windows command-line (or any other) FTP client, but a bunch of WGET calls in a batch file would make it easier.

If WGET is not the correct tool, then please just say so.
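For illustration, a rough sketch of such a loop in a Unix-style shell (the GnuWin32 port mentioned below supplies wget on Windows); the domain list is a placeholder, and -P names the target folder, which wget creates if it does not already exist:

#!/bin/sh
# Placeholder list of site domains; extend as new sites are added.
for d in somedomain.com somedomain1.com somedomain2.com; do
    # -r -l1   recurse exactly one level below my_folder/
    # -np      never ascend to the parent directory
    # -nd      do not recreate the server's directory tree locally
    # -A.csv   accept only files ending in .csv
    # -P "$d"  save into a folder named after the domain (created if missing)
    wget -r -l1 -np -nd -A.csv -P "$d" "http://www.$d/my_folder/"
done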




What do you mean by "is not working"? Error messages? Undesired results? Please clarify!

wmp
I think you can use the GnuWin32 tools; there is a wget build for Windows:

http://gnuwin32.sourceforge.net/packages/wget.htm
The other obvious option is to use rsync.  
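Note that rsync cannot talk to a plain HTTP server, so this only applies if the sites also offer SSH or rsync access. A hypothetical invocation under that assumption (user, host, and remote path are placeholders):

# Copy only the top-level CSV files from the remote folder over SSH.
rsync -av --include='*.csv' --exclude='*' user@www.somedomain.com:/path/to/my_folder/ somedomain.com/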
B R (Asker):
This seems to be the closest I can get:

wget -r -l1 --no-parent -A.csv http://somedomain.com/my_folder/

(-r -l1 recurses just one level, --no-parent keeps it from climbing out of my_folder, and -A.csv accepts only the CSV files.)



B R (Asker):

I have found the solution.
B R (Asker):
************** CORRECT SOLUTION IS HERE **************

I am not sure what I did wrong earlier, but this is actually the correct solution:

wget -r -l1 --no-parent -A.csv http://somedomain.com/my_folder/

This was not the accepted solution; I appear to have made a mistake, but I am happy to assign the points as shown, since the suggestions set me on the correct path.