  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 202

FTP Script help

Hi, I need a script that can be run from cron to connect to an FTP server and download the most recently modified .gz file whose name starts with "access." in a directory, saving it to a folder as file.gz.

Thanks for your help.
loppyrabit
Asked:
1 Solution
 
pjedmond Commented:
This will depend on the ftp server concerned, and how it presents the index of the files, but something along the lines of:

----------8X------------
#!/bin/bash

wget ftp://myserver.com/path/ -O index.html     # There may be a .listing (or similar index file)
FILENAME=$(grep "^access.*\.gz$" index.html | sed q)   # First matching line of the index
rm -f "$FILENAME"                               # Delete file if it already exists
wget "ftp://myserver.com/path/$FILENAME"        # Get the new one
----------8X------------

The difficult bit is deciding what selection criteria you should use for deciding which file to copy.

Of course, you could just copy the whole folder, using the -N timestamping option to ensure that only newer files are downloaded (or wget the .listing file, if it exists, and use that to decide which files to download).
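
As a rough sketch of the "copy the whole folder with -N" idea: the function below mirrors the matching files and then copies whichever local file has the newest timestamp to file.gz. The function name, the URL, and the destination directory are placeholders made up for illustration, and it assumes the server reports modification times so that wget can set them on the local copies.

```shell
#!/bin/bash
# Sketch only: mirror_and_pick_newest, the URL and the directory are hypothetical.
mirror_and_pick_newest() {
    local url=$1 dest=$2 newest="" f
    # -N: only fetch files newer than the local copy
    # -r -l1 -np -nd: recurse one level, stay in this directory, no local subdirectories
    # -A: only accept names matching the pattern; -P: directory to save into
    wget -N -r -l1 -np -nd -A 'access*.gz' -P "$dest" "$url" || return 1
    for f in "$dest"/access*.gz; do
        [ -e "$f" ] || continue                     # pattern matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f                               # keep the most recently modified file
        fi
    done
    [ -n "$newest" ] && cp -p "$newest" "$dest/file.gz"
}

# Example call (placeholder values):
# mirror_and_pick_newest "ftp://myserver.com/path/" /var/tmp/logs
```

This downloads more than strictly needed on the first run, but later runs from cron should only transfer files that have changed.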

A good look at man wget is suggested

http://www.eng.cam.ac.uk/help/tpl/unix/sed.html

is recommended reading, along with:

http://www.softpanorama.org/Tools/Awk/awk_one_liners.shtml

in case the 'analysis' of the index requires a fair amount of work to get the correct file for downloading.

(   (()
(`-' _\
 ''  ''


 
loppyrabit (Author) Commented:
Hi, thanks for your help.

This is the script, based on yours, that I am using:

wget ftp://sight.com/logs/ -O index.html
FILENAME=`cat index.html | grep "access.*.gz" | sed q`
echo "$FILENAME"
#rm -Rf $FILENAME
wget $FILENAME

When it echoes FILENAME, it gives me the complete HTML line, i.e.:
2006 May 08 01:45  File        <a href="ftp://sight.com:21/logs/access.log.18.gz">access.log.18.gz</a>  (2,675,536 bytes)

and then it attempts to wget 2006, not access.log.18.gz.

Also, access.log.18.gz is the oldest file in the directory, not the newest.

Please help.
 
pjedmond Commented:
I do not know what FTP server you have, or indeed how the data is presented in the .listing, hence my providing links to sed and awk for you to finish it off.

Anyway, to extract the filename out of that line change to:

FILENAME=`cat index.html | grep "access.*.gz" | sed q | cut -d '"' -f2`

You need to work out how the dates are presented in the list so that, if necessary, you can use sort, or rename the tar.gz file after uploading it, or take whatever approach you decide on.
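
For the "use sort" route, here is one possible sketch. It assumes every index line looks like the one quoted above (year, month name, day, HH:MM, then the link), which may not hold for your server; the function name newest_access_url is made up for illustration. It builds a sortable key from the date, sorts, and prints the href of the last (newest) entry.

```shell
#!/bin/bash
# Sketch only: assumes index lines of the form
#   2006 May 08 01:45  File  <a href="ftp://...">name</a>  (size)
# newest_access_url is a hypothetical helper; it reads the index on stdin.
newest_access_url() {
    grep 'access.*\.gz' |
    awk '
        BEGIN {
            # Map month names to numbers so the dates sort correctly
            split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", m, " ")
            for (i = 1; i <= 12; i++) mon[m[i]] = i
        }
        {
            # Fields 1-4 are assumed to be: year, month name, day, HH:MM
            key = sprintf("%04d%02d%02d%s", $1, mon[$2], $3, $4)
            split($0, q, "\"")      # q[2] is the text between the first pair of double quotes
            print key, q[2]
        }' |
    sort | tail -n 1 | cut -d ' ' -f2
}

# Example use with the index fetched earlier:
# wget "$(newest_access_url < index.html)" -O file.gz
```

If the listing omits the time for some entries, or uses a different date layout, the key built in the awk step would need adjusting.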

Oops - didn't add the 'sed' one liners:

http://www.eng.cam.ac.uk/help/tpl/unix/sed.html

 
pjedmond Commented:
Top tip: I recommend that you copy and paste that line. In the cut option, it is a single quote on each side of a double quote, with " being used as the separator.

See man cut for more info.
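
To see what the cut does, here it is applied to the index line quoted earlier in the thread. The variable names are just for illustration; the last step, stripping the URL down to the bare filename, is an extra that the original one-liner does not do.

```shell
#!/bin/bash
# The sample line is the one posted above; variable names are illustrative.
line='2006 May 08 01:45  File        <a href="ftp://sight.com:21/logs/access.log.18.gz">access.log.18.gz</a>  (2,675,536 bytes)'

# Split on the double quote; field 2 is whatever sits between the first pair of quotes
url=$(printf '%s\n' "$line" | cut -d '"' -f2)
echo "$url"      # prints ftp://sight.com:21/logs/access.log.18.gz

# Optional: drop everything up to the last slash to get just the filename
name=${url##*/}
echo "$name"     # prints access.log.18.gz
```

Since field 2 is the full URL, it can be passed straight to wget.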
