Solved

Using wget safely

Posted on 2011-02-21
973 Views
Last Modified: 2012-05-11
I have a directory, let's call it:

http://www.mysite.com/files

I would like to use wget to back up everything in this directory, including subdirectories. But I don't want it to grab the files that sit in the directory above it, e.g. http://www.mysite.com/index.html

I don't want to tax the server while performing this backup, so I think the wait or rate-limit option should be used.

I would tell you everything I've tried so far but that would just confuse the issue.
Question by:hrolsons
7 Comments
 
LVL 40

Expert Comment

by:omarfarid
ID: 34948761
If you use wget with the recursive option, it will download all the files and subdirectories as well. If you specify the URL as http://www.mysite.com/files, it will not download http://www.mysite.com/index.html

Looking at the wget man page (http://linux.die.net/man/1/wget), the options below are useful; a combined example follows them.

--limit-rate=amount
    Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix. For example, --limit-rate=20k will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.

    This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, --limit-rate=2.5k is a legal value.

    Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.
-w seconds
--wait=seconds
    Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using the "h" suffix, or in days using the "d" suffix.
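
Putting those two together, something like the below might work (the 2 second wait and the 20k rate are just example values, not something specified in the question):

wget -r -w 2 --limit-rate=20k http://www.mysite.com/files/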
 

Author Comment

by:hrolsons
ID: 34948798
Cool, that is looking good. How would it treat files that it already got on a previous backup?
 

Author Comment

by:hrolsons
ID: 34948871
Darn it, it's still getting too many files. Let me change my original example: I want:

http://www.mysite.com/files/set1

and under "files", I have set1, set2, set3 ...

It's not just grabbing the set1 files; it also grabs set2, set3, etc.

The command I issue is:

wget --limit-rate=20K -r http://www.mysite.com/files/set1


 
LVL 12

Expert Comment

by:mccracky
ID: 34948885
You need to look into the -np option too (no parent directories, so you don't go up the tree, only down).

Do you want the links to be rewritten to work locally or not?  (The -k option)

Do you want to keep the domain and all the directories?  (The -nH and --cut-dirs options)

Personally, I'd probably use something like:

wget -c -k -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files
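
If you also don't want the www.mysite.com/files/ directory levels recreated locally, a variant using -nH and --cut-dirs (the cut count of 1 assumes only the leading files/ component should be stripped):

wget -np -r -N -nH --cut-dirs=1 -w 5 --limit-rate=<rate you want> http://www.mysite.com/files/

That saves set1, set2, etc. directly under the current directory.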
 
LVL 12

Accepted Solution

by:
mccracky earned 500 total points
ID: 34948892
Oops, I forgot to add the -np option above (and since it's a backup, I might not convert the links):

wget -c -np -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files
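
For reference, what those flags do: -c resumes any partially downloaded files, -np ("no parent") keeps the recursion from climbing above the starting directory, -r recurses, -N only re-fetches a file when the server copy is newer than the local one (which also answers the earlier question about files from a previous backup), -l inf removes the recursion depth limit, and -w 5 waits 5 seconds between requests. One detail worth checking: for -np to treat set1 as the boundary, the URL should end with a trailing slash; without it, wget takes files/ as the parent directory, which would explain set2 and set3 still being fetched. A filled-in sketch, reusing the 20K rate from the earlier attempt:

wget -c -np -r -N -l inf -w 5 --limit-rate=20k http://www.mysite.com/files/set1/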
 
LVL 40

Expert Comment

by:omarfarid
ID: 34948923
See if the below will help:

-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
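
Combined with -np and throttling, a mirror run might look something like the below (the wait and rate values are just examples carried over from earlier in the thread):

wget -m -np -w 5 --limit-rate=20k http://www.mysite.com/files/set1/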
 
LVL 40

Expert Comment

by:omarfarid
ID: 34948940
Is rsync an option for you?

http://linux.die.net/man/1/rsync
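
Note that rsync can't pull over plain HTTP, so this assumes you have SSH (or rsync daemon) access to the server. A sketch with a hypothetical user name and server-side path:

rsync -avz --bwlimit=20 user@mysite.com:/var/www/files/set1/ ./set1-backup/

Here -a preserves the directory tree, permissions, and timestamps, -z compresses data in transit, and --bwlimit=20 caps the transfer at 20 KB/s, roughly matching --limit-rate=20k in wget.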
