Solved

Using wget safely

Posted on 2011-02-21
Last Modified: 2012-05-11
I have a directory, let's call it:

http://www.mysite.com/files

I would like to use wget to back up everything in this directory, including subdirectories. But I don't want it to grab files that sit in the parent directory, e.g. http://www.mysite.com/index.html

I don't want to tax the server while performing this backup, so I think the wait or rate-limit option should be used.

I would tell you everything I've tried so far but that would just confuse the issue.
Question by:hrolsons
LVL 40

Expert Comment

by:omarfarid
ID: 34948761
If you use wget with the recursive option (-r), it will fetch all files and subdirectories as well. If you specify the URL as http://www.mysite.com/files, it will not download http://www.mysite.com/index.html

Looking at the wget man page (http://linux.die.net/man/1/wget), the options below are useful:

--limit-rate=amount
    Limit the download speed to amount bytes per second. Amount may be expressed in bytes, kilobytes with the k suffix, or megabytes with the m suffix. For example, --limit-rate=20k will limit the retrieval rate to 20KB/s. This is useful when, for whatever reason, you don't want Wget to consume the entire available bandwidth.

    This option allows the use of decimal numbers, usually in conjunction with power suffixes; for example, --limit-rate=2.5k is a legal value.

    Note that Wget implements the limiting by sleeping the appropriate amount of time after a network read that took less time than specified by the rate. Eventually this strategy causes the TCP transfer to slow down to approximately the specified rate. However, it may take some time for this balance to be achieved, so don't be surprised if limiting the rate doesn't work well with very small files.
-w seconds
--wait=seconds
    Wait the specified number of seconds between the retrievals. Use of this option is recommended, as it lightens the server load by making the requests less frequent. Instead of in seconds, the time can be specified in minutes using the "m" suffix, in hours using "h" suffix, or in days using "d" suffix.
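As a rough sketch of how the two options quoted above might be combined, the script below only prints the command so it can be checked before anything is downloaded. The rate of 20k and the 5-second wait are assumed values for illustration, not recommendations from the man page, and throttled_cmd is just a hypothetical helper name.

```shell
#!/bin/sh
# Assumed values for illustration; change them to suit your server.
URL="http://www.mysite.com/files/"
RATE="20k"   # cap bandwidth at roughly 20 KB/s
WAIT="5"     # pause 5 seconds between retrievals

# Print the throttled command so it can be reviewed before running it.
throttled_cmd() {
    printf 'wget -r --limit-rate=%s --wait=%s %s\n' "$RATE" "$WAIT" "$URL"
}

throttled_cmd
```

Once the printed command looks right, it can be pasted into the shell as is.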
 

Author Comment

by:hrolsons
ID: 34948798
Cool, that is looking good. How would it treat files that it already got on a previous backup?
 

Author Comment

by:hrolsons
ID: 34948871
Darn it, it's still getting too many files. Let me change my original example; what I want is:

http://www.mysite.com/files/set1

and under "files", I have set1, set2, set3 ...

It's not just grabbing the set1 files; it also grabs set2, set3, etc.

The command I issue is:

wget --limit-rate=20K -r http://www.mysite.com/files/set1


LVL 12

Expert Comment

by:mccracky
ID: 34948885
You need to look into the -np option too (no parent directories so you don't go up the tree, only down).

Do you want the links to be rewritten to work locally or not?  (The -k option)

Do you want to keep the domain and all the directories?  (The -nH and --cut-dirs options)

Personally, I'd probably use something like:

wget -c -k -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files
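To get a feel for what -nH and --cut-dirs do to the saved layout, here is a small sketch that only mimics the naming rule; it does not call wget. The function name local_path and the file a.txt are made up for the example.

```shell
#!/bin/sh
# local_path URL [CUT]
# Approximates where "wget -r -nH --cut-dirs=CUT" would save URL.
# This only imitates the naming rule; nothing is downloaded.
local_path() {
    url=$1
    cut=${2:-0}
    rest=${url#*://}     # drop the scheme (http://)
    rest=${rest#*/}      # drop the host name, as -nH does
    i=0
    while [ "$i" -lt "$cut" ]; do
        case $rest in
            */*) rest=${rest#*/} ;;   # drop one leading directory
            *)   break ;;             # only the file name is left
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$rest"
}

local_path http://www.mysite.com/files/set1/a.txt 0   # files/set1/a.txt
local_path http://www.mysite.com/files/set1/a.txt 1   # set1/a.txt
local_path http://www.mysite.com/files/set1/a.txt 2   # a.txt
```

Without -nH, wget would also create a top-level www.mysite.com directory in front of each of these paths.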
 
LVL 12

Accepted Solution

by:mccracky (earned 500 total points)
ID: 34948892
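Oops, I forgot to add in the -np option above (and since it's a backup, I might not convert the links). Keep the trailing slash on the URL, since -np relies on it to treat "files" as a directory rather than a file name:

wget -c -np -r -N -l inf -w 5 --limit-rate=<rate you want> http://www.mysite.com/files/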
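For anyone reading along, here is the same command taken apart flag by flag, as a dry-run sketch. The 20k rate stands in for the <rate you want> placeholder, and the set1 URL is taken from the asker's revised example; both are assumptions, as is the helper name backup_cmd. The -N flag is what covers files fetched on a previous backup: wget skips them unless the server copy is newer.

```shell
#!/bin/sh
RATE="20k"                                # assumed; stands in for <rate you want>
URL="http://www.mysite.com/files/set1/"   # trailing slash matters to -np

# Assemble the wget arguments one at a time and print the result.
backup_cmd() {
    set --                         # start with an empty argument list
    set -- "$@" -c                 # continue partially downloaded files
    set -- "$@" -np                # never ascend to the parent directory
    set -- "$@" -r -l inf          # recurse with no depth limit
    set -- "$@" -N                 # skip files that are not newer than the local copy
    set -- "$@" -w 5               # wait 5 seconds between retrievals
    set -- "$@" "--limit-rate=$RATE"
    set -- "$@" "$URL"
    printf 'wget %s\n' "$*"        # dry run: show the command only
}

backup_cmd
```

Replacing the printf line with wget "$@" would run the download for real.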
 
LVL 40

Expert Comment

by:omarfarid
ID: 34948923
See if the option below helps:

-m
--mirror
    Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to -r -N -l inf --no-remove-listing.
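To make the equivalence quoted above concrete, this little sketch rewrites -m or --mirror in an argument list into the long form given in the man page. It is only a text substitution for illustration (expand_mirror is a made-up name) and does not run wget.

```shell
#!/bin/sh
# Rewrite -m/--mirror in an argument list to the long form from the man page.
expand_mirror() {
    out=""
    for arg in "$@"; do
        case $arg in
            -m|--mirror) arg="-r -N -l inf --no-remove-listing" ;;
        esac
        out="${out:+$out }$arg"   # join with single spaces
    done
    printf '%s\n' "$out"
}

expand_mirror wget -m -np -w 5 http://www.mysite.com/files/set1/
```

Note that --mirror does not imply -np, so -np still has to be given separately to stay below the starting directory.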
 
LVL 40

Expert Comment

by:omarfarid
ID: 34948940
Is rsync an option for you?

http://linux.die.net/man/1/rsync