Solved

Efficiently copying lots of small files over a network

Posted on 2011-09-17
705 Views
Last Modified: 2012-06-21
Hi everybody,

Every once in a while I have a customer's computer that I need to quickly make a backup of. For this purpose I have a Linux file server running Samba, with a high-performance RAID storage array, connected via gigabit Ethernet.

Normally I just use Robocopy and copy all of the necessary files over the network, and this works pretty well.
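(The exact command varies from job to job, but it's roughly along these lines - the share name and paths here are only placeholders, not my real setup:)

    robocopy C:\Users \\fileserver\backup\client01\Users /E /R:1 /W:1 /LOG:C:\backup-log.txt

/E recurses into subfolders including empty ones, /R:1 /W:1 keeps it from hanging on locked files, and /LOG writes a report of the run.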

The problem, though, is that if the client has hundreds of thousands of tiny files - e.g. if I am copying Windows system folders, cache folders, temp data, etc. - it is not efficient; it is quite slow.

Is there a more efficient file copying utility that I can use to make these copies that will:

1) Fast and easy installation on the client's machine. I don't want to go through a whole big Windows installer or complicated configuration (e.g. the Cygwin-based cwRsync installation). Ideally just a single .EXE that I can run.

2) Recursively handle large volumes of files (e.g. 500,000+) and lots of deeply nested folders - i.e. a quality, robust, efficient copying program

3) Gracefully handle errors like files being in use, keep pushing through the copy, and ideally produce a log file of the results afterwards

4) Efficiently copy to a Linux file system without getting bogged down by lots of tiny files

I can set up whatever server-side software is necessary on the Linux machine. I have Samba set up, but I could set up SSH, NFS, something proprietary, etc. - whatever is necessary.
Question by:Frosty555
8 Comments
 
LVL 47

Expert Comment

by:dbrunton
ID: 36554885
Have a look at RichCopy http://en.wikipedia.org/wiki/RichCopy
 
LVL 3

Expert Comment

by:hrr1963
ID: 36554997
I would set up an FTP server, then use, for example, the FileZilla client to transfer the files efficiently.

FileZilla Server and client for Linux and Windows: http://filezilla-project.org/
 
LVL 4

Expert Comment

by:iwaxx
ID: 36555956
You might consider GoodSync, which handles differential copies of your files.
The first run may take a long time, but subsequent runs will only copy (synchronize) the files that have changed, and I'm pretty sure that out of 500,000 files the vast majority don't change between runs.

It works over many kinds of connections, and on the server side you could write a script to keep the different versions if you need versioning.

www.goodsync.com
 
LVL 4

Accepted Solution

by:
duffme earned 334 total points
ID: 36556165
I had similar constraints when I needed to replicate 50 GB a night between two Server 2008 machines. I wound up using a mix of Robocopy and RichCopy. RichCopy is not completely stable and occasionally crashes or acts a bit flaky. Multi-threading is key. Older versions of Robocopy are single-threaded; I think only Windows 7 and Server 2008 R2 include the /MT option for Robocopy, and that version can't be installed on an earlier OS. If you have one of these, make sure to use the /MT option. There are some third-party apps, but I wasn't allowed to use them.

Use exclude filters. If you are wasting bandwidth copying useless cache data and the like, just don't copy it. If you are looking for a total backup, then a hot imaging solution like Acronis can be great, but it isn't cheap.

Only copy deltas. If you are using Robocopy you can use the /MIR option and time parameters to copy only what has changed. You could possibly do this with archive bits too.
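To show how the exclusions, /MIR and /MT fit together, a rough, untested sketch (the source path, share name, excluded folders and thread count are placeholders you'd adjust for your environment):

    robocopy C:\Users \\fileserver\backup\client01\Users /MIR /MT:16 /R:1 /W:1 /XJ /XD Temp "Temporary Internet Files" $Recycle.Bin /LOG:C:\backup-log.txt /NP

/MIR mirrors the tree so re-runs only move the deltas, /MT:16 runs 16 copy threads, /R:1 /W:1 keeps it from stalling on locked files, /XJ skips junction points, /XD drops the cache folders, and /LOG with /NP gives you a clean log file afterwards.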

Can you pull the copy using rsync or cpio through a Samba/CIFS client accessing the Windows root shares (C$, etc.)?
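If those shares are reachable, a rough sketch of the pull from the Linux side might look like this (hostname, credentials, mount point and paths are all placeholders):

    sudo mount -t cifs //client-pc/C$ /mnt/client -o username=Administrator,ro
    rsync -av --log-file=/srv/backup/client-pc.log /mnt/client/Users/ /srv/backup/client-pc/Users/
    sudo umount /mnt/client

rsync records each file and any read errors in the log file and keeps going past files it can't open, which would cover your logging requirement.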

 
LVL 3

Assisted Solution

by:hrr1963
hrr1963 earned 166 total points
ID: 36556447
I still recommend using an FTP solution, and then tweaking the client to transfer 10 files at the same time.
 
LVL 31

Author Comment

by:Frosty555
ID: 36568133
Hi everyone,

It looks like multithreading the copy is really the way to go (or at least finding a copy utility that takes advantage of that).

I've never heard of RichCopy before. I'll check it out.

I've tried the FTP approach... and I also tried WinSCP, but I found it to actually be slower than regular file copies due to the constant back-and-forth handshaking in the FTP protocol. It was fine for large files but, like the others, it choked on lots of small files. It might work if I told FileZilla to copy 10 files at a time; I haven't actually tried that yet. Will FileZilla stay robust and fast when I have half a million files in the queue? I'm not sure it was ever designed to handle that kind of file volume.

I can't pull a copy of the files using the administrative share - while that works on all of *MY* computers, it doesn't necessarily work for my clients, who have a wide array of firewalls, weird Windows settings and broken services on their computers.

Also, I wasn't aware of the /MT option in Robocopy. Some of my clients are on Windows XP, but that's not going to be the case forever, so I'll look into /MT.
 
LVL 4

Expert Comment

by:duffme
ID: 36569221
As for the admin shares, you can create your own root shares with specific security using a service account, though this may not be possible on client boxes. Remember that RichCopy is not the most stable utility, though many of us still find a use for it. Some third-party apps use multithreading and offer great features for not too much money, but /MT with Robocopy should work great if your Windows boxes are new enough.
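If you do go the custom-share route, it's roughly a one-liner from an elevated prompt (the share name and service account below are placeholders, and the /GRANT switch needs Vista/2008 or newer):

    net share BackupC=C:\ /GRANT:CLIENTPC\backupsvc,READ

That publishes the root of C: read-only for the backup account; "net share BackupC /DELETE" removes it again afterwards.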
 
LVL 4

Assisted Solution

by:duffme
duffme earned 334 total points
ID: 36590051
By the way, I was reminded today: in RichCopy there are separate multithreading options for searching, directory, and file operations. Multithreading file operations will try to use multiple threads for a single file copy, which can cause a lot of errors. The other options allow multiple simultaneous copies (each in its own thread) and multiple threads for file comparison, directory search, etc. The point is, you may need to tweak the particular multithreading options depending on which utility you wind up using.
