Solved

Using wget safely

Posted on 2011-02-21
967 Views
Last Modified: 2012-05-11
I have a directory; let's call it:

http://www.mysite.com/files

I would like to use wget to back up everything in this directory, including subdirectories.  But I don't want it to grab the files that sit in the directory above it, e.g. http://www.mysite.com/index.html

I don't want to tax the server while performing this backup, so I think the wait or rate-limit option should be used.

I would tell you everything I've tried so far but that would just confuse the issue.
Question by:hrolsons
7 Comments
 
LVL 40

Expert Comment

by:omarfarid
If you use wget with the recursive option, it will fetch all files and subdirectories as well. If you specify the URL as http://www.mysite.com/files, it will not download http://www.mysite.com/index.html

Looking at the wget man page (http://linux.die.net/man/1/wget), the options below are useful:

--limit-rate=amount
    Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix. For example, --limit-rate=20k will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.

    This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, --limit-rate=2.5k is a legal value.

    Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.
-w seconds
--wait=seconds
    Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.
 

Author Comment

by:hrolsons
Cool, that's looking good.  How would it treat files that it already got on a previous backup?
 

Author Comment

by:hrolsons
Darn it, it's still getting too many files. Let me change my original example to what I want:

http://www.mysite.com/files/set1

and under "files", I have set1, set2, set3 ...

It's not just grabbing the set1 files; it also grabs set2, set3, etc.

The command I issue is:

wget --limit-rate=20K -r http://www.mysite.com/files/set1


 
LVL 12

Expert Comment

by:mccracky
You need to look into the -np option too (no parent directories so you don't go up the tree, only down).

Do you want the links to be rewritten to work locally or not?  (The -k option)

Do you want to keep the domain and all the directories?  (The -nH and --cut-dirs options)

Personally, I'd probably use something like:

wget -c -k -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files
0
 
LVL 12

Accepted Solution

by:mccracky (earned 500 total points)
Oops, I forgot to add the -np option above (and since it's a backup, I might not convert the links):

wget -c -np -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files
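Filled in with concrete values (the 20k rate is illustrative; pick whatever suits your server), that would be:

```shell
# -c     resume partially downloaded files
# -np    never ascend to the parent directory
# -r     recurse into subdirectories
# -N     re-fetch only files newer than the local copy (timestamping)
# -l inf no limit on recursion depth
# -w 5   wait 5 seconds between requests
wget -c -np -r -N -l inf -w 5 --limit-rate=20k http://www.mysite.com/files/set1/
```

Note the trailing slash on set1/: without it, wget may treat /files/ as the starting directory for -np purposes, in which case the sibling set2/, set3/ directories can still be pulled in.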
 
LVL 40

Expert Comment

by:omarfarid
See if the option below will help:

-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
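Combined with the -np and throttling options from the earlier comments, a mirror-style run might look like this (wait and rate values are illustrative):

```shell
# -m expands to -r -N -l inf --no-remove-listing; -np keeps wget
# from ascending above set1/; -w and --limit-rate throttle the load.
wget -m -np -w 5 --limit-rate=20k http://www.mysite.com/files/set1/
```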
 
LVL 40

Expert Comment

by:omarfarid
Is rsync an option for you?

http://linux.die.net/man/1/rsync
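For what it's worth, rsync needs shell (typically SSH) access to the server rather than plain HTTP, so it only applies if you have a login there. A sketch, with a hypothetical user and remote path:

```shell
# Requires SSH access; user@ and the remote path are hypothetical.
# -a archive mode (recursive, preserves times/permissions), -v verbose,
# -z compress in transit, --bwlimit=20 throttles to ~20 KB/s.
rsync -avz --bwlimit=20 user@www.mysite.com:/path/to/files/set1/ ./set1-backup/
```

Unlike wget, repeated runs transfer only the files that changed, which fits the backup use case well.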
