Solved

Need a reliable method to copy large amounts of files

Posted on 2013-11-12
Last Modified: 2013-11-12
Hello Experts!

I have to transfer hundreds of gigabytes of files from a Linux server to an external USB drive, and the transfer will literally take weeks to complete.

I am afraid that once the transfer begins, something will cause the copying to stop, and I will be stuck with part of the files transferred and the rest not.

Is there a method I can use whereby I can start the transfer, and if anything happens in mid-stream to stop it, I can restart where I left off?

Thanks!
Question by:OmniUnlimited
 
LVL 68

Accepted Solution

by:
woolmilkporc earned 500 total points
ID: 39643249
I think "rsync" is the tool you should use.

"rsync" will transfer only those files which either do not yet exist on the target or those which have changed in size or in last-modified time.

You use it just like "rcp", specifying source and target, one of which may be remote.

There are a lot of options; it is best to see the "rsync" man page for details:

http://rsync.samba.org/ftp/rsync/rsync.html

A common way of invoking rsync is:

- for local copy:

rsync -avz /source/data /target/directory

- for remote copy:

rsync -avz /local/data remotehost:/target/directory

"-a" implies recursion, copies symlinks as links, and preserves permissions, time stamps, group and owner.

"-v" means "verbose", and "-z" enables compression during the transfer.

Please note that rsync will create an additional directory level on the target if invoked the way I posted above: the copied files and directories will go to /target/directory/data.
To avoid creating this additional directory level at the destination, add a trailing slash to the source specification, e.g.

rsync -avz /local/data/ remotehost:/target/directory

The copied files and directories will now go directly to /target/directory.
 
LVL 17

Author Comment

by:OmniUnlimited
ID: 39643275
Thanks woolmilkporc (love the moniker, by the way) for your response.  Do I have to run it verbose?  I really don't need to see output; I plan on running this job in the background and checking from time to time how many files have been transferred and whether the process is still going.  And do I need compression?  (I'm transferring to a disk that is exclusively for these files.)

So are you saying that if, for whatever reason, the copying (or in this case resyncing) stops, I can restart using the same command, and it will restart from the point it left off?  Or does it have to go through and check everything again?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39643294
You don't have to run it in verbose mode, nor do you have to use compression (just omit "v" and "z").

You restart the transfer by running exactly the same command. "rsync" will not really resume where it left off, but the check it performs is really fast, so you won't lose too much time.

If you're going to transfer really big files, consider using the "--partial" option. This way partially transferred files will be kept on the target, and rsync will transfer just the remainder of the partial file at its next invocation.
 
LVL 17

Author Comment

by:OmniUnlimited
ID: 39643304
Ok, another question: the reason I asked this question is because I already had a failure on one other disk I have.  On that disk I had used the cp command (with obviously no way to start from where it ended.)

Could I use rsync on it and catch up to where it is quickly and finish up its transfer?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39643307
Yes, of course.

"rsync" performs its checks regardless of whether the previous copy has been done by "rsync" itself or by any other tool, that's to say regardless of how the files found their way to their current location.

Oviparous Woolmilkporc

(by full name)
 
LVL 17

Author Comment

by:OmniUnlimited
ID: 39643317
LOL!  I think we have a new king of the beasts in our midst.

Oviparous Woolmilkporc, you have been very helpful.  One last question:  will the --partial option slow down things a bit?
 
LVL 68

Expert Comment

by:woolmilkporc
ID: 39643341
Not really.

During the initial transfer this option just makes rsync keep a partially transferred file on the target instead of deleting the stub.

On subsequent transfers, when rsync finds that a file on the target is a fragment, it adds the missing data to it. A little effort is needed to find the right point to resume, but this should be minimal.
 
LVL 17

Author Closing Comment

by:OmniUnlimited
ID: 39643343
Oviparous Woolmilkporc, you certainly rule in this kingdom.  Thank you so much for your expert help.  It is much appreciated!