Solved

Efficiently copying lots of small files over a network

Posted on 2011-09-17
Last Modified: 2012-06-21
Hi everybody,

Every once in a while I have a customer's computer that I need to back up quickly. For this purpose I have a Linux fileserver running Samba, with a high-performance RAID storage array, connected via gigabit Ethernet.

Normally I just use Robocopy to copy all of the necessary files over the network, and this works pretty well.

The problem, though, is that if the client has hundreds of thousands of tiny files (e.g. if I am copying the Windows directory, cache folders, temp data etc.), the copy is inefficient and quite slow.

Is there a more efficient file copying utility that I can use to make these copies? It should:

1) Be FAST and easy to install on the client's machine. I don't want to go through a whole big Windows installer or complicated configuration (e.g. Cygwin installation of cwRsync). Ideally it would be just a single .EXE that I can run.

2) Recursively handle large volumes (e.g. 500,000+) of files and lots of deeply nested folders, i.e. be a quality, robust, efficient copying program

3) Gracefully handle errors like files being in use, and keep pushing through the copy, hopefully producing a logfile of the results afterwards

4) Efficiently copy to a Linux file system and not get bogged down by lots of tiny files

I can set up whatever server-side software is necessary on the Linux machine. I have Samba set up, but I could set up SSH, NFS, something proprietary etc., whatever is necessary.
Question by:Frosty555
8 Comments
 
LVL 48

Expert Comment

by:dbrunton
ID: 36554885
Have a look at RichCopy http://en.wikipedia.org/wiki/RichCopy
 
LVL 3

Expert Comment

by:hrr1963
ID: 36554997
I would set up an FTP server, then use a client such as FileZilla to transfer the files efficiently.

FileZilla Server and Client for Linux and Windows: http://filezilla-project.org/
 
LVL 4

Expert Comment

by:iwaxx
ID: 36555956
You might consider GoodSync, which tracks the differences between versions of your files.
The first run may take a long time, but after that it will only copy (synchronize) the files that have changed, and I'm pretty sure that out of 500,000 files, the vast majority don't change.

It works with a whole heap of service types, and on the server side you could write a script to back up the different versions if you need versioning.

www.goodsync.com
 
LVL 4

Accepted Solution

by:
duffme earned 334 total points
ID: 36556165
I had similar constraints when I needed to replicate 50GB a night between two 2008 servers. I wound up using a mix of Robocopy and RichCopy. RichCopy is not completely stable and occasionally crashes or acts a bit flaky. Multi-threading is key. Older versions of Robocopy are single-threaded. I think only Win7 and Server 2008 R2 include the /MT option for Robocopy, and that version can't be installed on an earlier version of Windows. If you have one of these, make sure to use the /MT option. There are some third-party apps, but I wasn't allowed to use them.
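
As a rough illustration of the /MT advice, here is a minimal Python sketch that assembles a multi-threaded Robocopy command line. The function names, the paths, the thread count of 16 and the log file name are all hypothetical; only the Robocopy switches themselves (/E, /MT, /R, /W, /NP, /LOG) are real options.

```python
import subprocess

def build_robocopy_cmd(src, dst, threads=16, log_path="backup.log"):
    """Return a Robocopy argument list for a multi-threaded recursive copy."""
    return [
        "robocopy", src, dst,
        "/E",                # recurse into subdirectories, including empty ones
        f"/MT:{threads}",    # multi-threaded copy (Win7 / Server 2008 R2 and later)
        "/R:1", "/W:1",      # retry a failed file once, waiting 1 second, then move on
        "/NP",               # no per-file progress percentage in the output
        f"/LOG:{log_path}",  # write the results to a log file
    ]

def run_backup(src, dst):
    """Run the copy (Windows only); Robocopy exit codes below 8 mean no failures."""
    result = subprocess.run(build_robocopy_cmd(src, dst))
    return result.returncode < 8

if __name__ == "__main__":
    # Hypothetical source folder and Samba share, shown for illustration only.
    print(" ".join(build_robocopy_cmd(r"C:\Users", r"\\fileserver\backups\client01")))
```

Keeping the retry count and wait time low matters here, since Robocopy's defaults retry a locked file many times before giving up.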

Use exclude filters.  If you are wasting bandwidth copying useless cache and such then just don't copy it.  If you are looking for a total backup then a hot imaging solution like Acronis can be great, but not cheap.
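
To show what an exclude filter amounts to, here is a small Python sketch that walks a tree and skips unwanted folders and files. The pattern lists are assumptions chosen for illustration, not a recommended set; in Robocopy itself the equivalent switches are /XD (exclude directories) and /XF (exclude files).

```python
import fnmatch
import os

# Hypothetical patterns; adjust to whatever counts as "useless" on a given machine.
EXCLUDE_DIRS = ["Temp", "Temporary Internet Files", "Cache*", "$Recycle.Bin"]
EXCLUDE_FILES = ["*.tmp", "pagefile.sys", "hiberfil.sys", "Thumbs.db"]

def _matches(name, patterns):
    """Case-insensitive wildcard match of a single name against a pattern list."""
    return any(fnmatch.fnmatch(name.lower(), p.lower()) for p in patterns)

def files_to_copy(root):
    """Yield the paths under root that survive the exclude filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if not _matches(d, EXCLUDE_DIRS)]
        for name in filenames:
            if not _matches(name, EXCLUDE_FILES):
                yield os.path.join(dirpath, name)
```

Pruning the directory list before descending is the important part: an excluded cache folder is never even enumerated, which saves time as well as bandwidth.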

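Only copy deltas. Robocopy already skips files whose size and timestamp are unchanged, and with the /MIR option and time parameters you can copy only what has changed. Be aware that /MIR also deletes files from the destination that no longer exist in the source. You could possibly do this with Archive bits too.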
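
The delta idea can be sketched in a few lines of Python: a file is copied only if the destination is missing or differs in size or modification time. The function name and the 2-second tolerance are assumptions (the tolerance is there because some file systems store timestamps at coarse granularity), and the check only works if the copy preserves modification times, as shutil.copy2 does.

```python
import os

def needs_copy(src, dst, mtime_tolerance=2.0):
    """Return True if dst is missing or differs from src in size or mtime."""
    try:
        dst_stat = os.stat(dst)
    except FileNotFoundError:
        return True  # nothing at the destination yet
    src_stat = os.stat(src)
    if src_stat.st_size != dst_stat.st_size:
        return True
    # Treat timestamps within the tolerance as equal.
    return abs(src_stat.st_mtime - dst_stat.st_mtime) > mtime_tolerance
```

This compares metadata only and never reads file contents, which is what makes a second pass over hundreds of thousands of unchanged files cheap.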

Can you pull the copy from the Linux side, using rsync or cpio through a Samba client that accesses the Windows root shares (C$, etc.)?
 
LVL 3

Assisted Solution

by:hrr1963
hrr1963 earned 166 total points
ID: 36556447
I still recommend using an FTP solution, then tweaking the client to transfer 10 files at the same time.
 
LVL 31

Author Comment

by:Frosty555
ID: 36568133
Hi everyone,

It looks like multithreading the copy is really the way to go (or at least finding a copy utility that takes advantage of that).

I've never heard of RichCopy before. I'll check it out.

I've tried the FTP approach... and I also tried WinSCP, but I found it to actually be slower than regular file copies due to the constant back and forth handshaking needed in the FTP protocol. It was fine for large files, but again like the others, choked on lots of small files. It might work if I told FileZilla to copy 10 at a time. I haven't actually tried that yet. Will FileZilla continue to be robust and fast when I have half a million files in the queue? I'm not sure if it was ever designed to handle that kind of file volume.

I can't pull a copy of the files using the administrative share. While that works on all of *MY* computers, it doesn't necessarily work on my clients' machines, which have a wide array of firewalls, weird Windows settings and broken services.

Also, I wasn't aware of the /MT option in Robocopy. Some of my clients are on Windows XP, but that's not going to be the case forever, so I'll look into /MT.
 
LVL 4

Expert Comment

by:duffme
ID: 36569221
As far as the admin shares go, you can create your own root shares with specific security and use a service account, but this may not be possible on client boxes. Remember that RichCopy is not the most stable utility, though many of us still find a use for it. Some third-party apps use multithreading and offer great features for not too much money, but /MT with Robocopy should work great if your Windows boxes are new enough.
 
LVL 4

Assisted Solution

by:duffme
duffme earned 334 total points
ID: 36590051
btw, I was reminded today: in RichCopy there are separate multithreading options for searching, directory, and file operations. Multithreading file operations will try to use multiple threads per single file copy, which can cause a lot of errors. The other options allow multiple copies (each in its own thread) and multiple threads for file comparison, dir search etc. The point is, you may need to tweak particular multithread options depending on what utility you wind up using.
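
To make the distinction concrete, here is a minimal Python sketch of the safer variant: several files are copied at once, but each file is handled by exactly one thread from start to finish. This is not how RichCopy or Robocopy are implemented; the function names and the worker count of 8 are assumptions for illustration.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_one(src, dst):
    """Copy one whole file in a single thread; return an error string or None."""
    try:
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copies data plus timestamps
        return None
    except OSError as exc:
        # e.g. file in use or access denied: record it and keep going
        return f"{src}: {exc}"

def copy_tree_parallel(src_root, dst_root, workers=8):
    """Copy every file under src_root to dst_root; return a list of errors."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        for name in filenames:
            dst = os.path.normpath(os.path.join(dst_root, rel, name))
            pairs.append((os.path.join(dirpath, name), dst))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda pair: copy_one(*pair), pairs)
        return [err for err in results if err is not None]
```

The returned list can be written to a logfile afterwards, so one locked file does not stop the rest of the copy.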

Question has a verified solution.