Efficiently copying lots of small files over a network

Posted on 2011-09-17
Medium Priority
Last Modified: 2012-06-21
Hi everybody,

Every once in a while I need to quickly back up a customer's computer. For this purpose I have a Linux file server running Samba, with a high-performance RAID storage array, connected via Gigabit Ethernet.

Normally I just use Robocopy to copy all of the necessary files over the network, and this works pretty well.

The problem is that when the client has hundreds of thousands of tiny files (e.g. when I am copying Windows itself, cache folders, temp data, etc.), the copy is inefficient and quite slow.

Is there a more efficient file-copying utility I can use for these copies that meets the following requirements?

1) Fast, easy installation on the client's machine. I don't want to go through a big Windows installer or a complicated configuration (e.g. the Cygwin-based setup of cwRsync). Ideally, just a single .EXE that I can run.

2) Recursively handle large volumes of files (e.g. 500,000+) and deeply nested folders; in short, a high-quality, robust, efficient copy program.

3) Gracefully handle errors such as files being in use, keep pushing through the copy, and ideally produce a log file of the results afterwards.

4) Copy efficiently to a Linux file system without getting bogged down by lots of tiny files.
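As a rough illustration of requirements 2 and 3, here is a minimal Python sketch (not one of the tools discussed in this thread) of a recursive copy that logs failures, such as files locked by another process, and keeps going instead of stopping. It assumes Python 3.9+ and the standard logging module; the function name copy_tree_logged is hypothetical.

```python
import logging
import os
import shutil
from pathlib import Path


def copy_tree_logged(src: Path, dst: Path, log: logging.Logger) -> tuple[int, int]:
    """Recursively copy src into dst, logging failures instead of stopping.

    Returns a (copied, failed) pair of file counts.
    """
    copied = failed = 0

    def on_walk_error(exc: OSError) -> None:
        # Called by os.walk when a directory cannot be listed (e.g. access denied).
        log.error("cannot list %s: %s", exc.filename, exc)

    for root, _dirs, files in os.walk(src, onerror=on_walk_error):
        target_dir = dst / Path(root).relative_to(src)
        try:
            target_dir.mkdir(parents=True, exist_ok=True)
        except OSError as exc:
            log.error("cannot create %s: %s", target_dir, exc)
            failed += len(files)
            continue
        for name in files:
            source = Path(root) / name
            try:
                shutil.copy2(source, target_dir / name)  # copies data and timestamps
                copied += 1
                log.info("copied %s", source)
            except OSError as exc:  # e.g. a file locked by another process
                log.error("failed %s: %s", source, exc)
                failed += 1
    log.info("done: %d copied, %d failed", copied, failed)
    return copied, failed
```

To get a log file of the results, configure logging before calling it, e.g. logging.basicConfig(filename="copy_log.txt", level=logging.INFO); the file name is only a placeholder.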

I can set up whatever server-side software is necessary on the Linux machine. I have Samba set up, but I could also set up SSH, NFS, something proprietary, etc., whatever is needed.
Question by: Frosty555

Expert Comment

ID: 36554885
Have a look at RichCopy http://en.wikipedia.org/wiki/RichCopy

Expert Comment

ID: 36554997
I would set up an FTP server, then use a client such as FileZilla Client to transfer the files efficiently.

FileZilla Server and Client for Linux and Windows: http://filezilla-project.org/

Expert Comment

ID: 36555956
You might consider GoodSync, which handles differential copies of your files.
The first run may take a long time, but after that it will only copy (synchronize) the files that have changed, and I'm pretty sure that out of 500,000 files, the vast majority don't change.

It works with many types of services, and on the server side you could write a script to back up the different versions if you need versioning.

Accepted Solution

duffme earned 1336 total points
ID: 36556165
I had similar constraints when I needed to replicate 50 GB a night between two Windows Server 2008 machines. I ended up using a mix of Robocopy and RichCopy. RichCopy is not completely stable; it occasionally crashes or acts a bit flaky. Multithreading is key. Older versions of Robocopy are single-threaded. I think only Windows 7 and Server 2008 R2 include the /MT option for Robocopy, and that version can't be installed on earlier Windows releases. If you have one of these, make sure to use the /MT option. There are some third-party apps, but I wasn't allowed to use them.
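To show what a multithreaded copy means in practice, here is a minimal Python sketch of one-file-per-thread copying, roughly the idea behind Robocopy's /MT option (which defaults to 8 threads). It is not how Robocopy is implemented; parallel_copy and _copy_one are hypothetical names, and the default of 8 workers is an assumption chosen to mirror /MT.

```python
import os
import shutil
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from pathlib import Path


def _copy_one(src_file: Path, dst_file: Path) -> None:
    # Each worker copies one whole file; a single file is never split across threads.
    dst_file.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src_file, dst_file)


def parallel_copy(src: Path, dst: Path, workers: int = 8) -> list[tuple[Path, OSError]]:
    """Copy every file under src to dst using a pool of worker threads.

    Returns a list of (source_path, error) pairs for files that failed.
    """
    errors: list[tuple[Path, OSError]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures: dict[Future, Path] = {}
        # The main thread walks the tree and queues one copy job per file.
        for root, _dirs, files in os.walk(src):
            for name in files:
                source = Path(root) / name
                target = dst / source.relative_to(src)
                futures[pool.submit(_copy_one, source, target)] = source
        for fut in as_completed(futures):
            try:
                fut.result()
            except OSError as exc:  # e.g. file in use or access denied
                errors.append((futures[fut], exc))
    return errors
```

With tiny files, per-file round trips rather than raw bandwidth tend to dominate, so overlapping several copies usually helps; too many workers can thrash a single disk, so the worker count is worth tuning. For 500,000+ files, feeding a bounded queue instead of submitting every file up front would keep memory use down.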

Use exclude filters. If you are wasting bandwidth copying useless cache files and the like, just don't copy them. If you are looking for a total backup, a hot imaging solution like Acronis can be great, but it isn't cheap.
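Exclude filters can be sketched the same way. Robocopy's own switches for this are /XD (exclude directories) and /XF (exclude files); the Python below only illustrates the idea, and the pattern lists and the files_to_copy name are hypothetical examples, not a recommended set. Matching is made case-insensitive to behave like Windows paths.

```python
import fnmatch
import os
from pathlib import Path

# Hypothetical example patterns; adjust for the machine being backed up.
EXCLUDE_DIRS = ["Temp", "Temporary Internet Files", "Cache*", "$Recycle.Bin"]
EXCLUDE_FILES = ["*.tmp", "pagefile.sys", "hiberfil.sys"]


def _matches(name: str, patterns: list[str]) -> bool:
    # Lower-case both sides so matching ignores case, as Windows does.
    return any(fnmatch.fnmatchcase(name.lower(), p.lower()) for p in patterns)


def files_to_copy(src: Path, exclude_dirs=EXCLUDE_DIRS, exclude_files=EXCLUDE_FILES):
    """Yield every file under src that is not excluded by the patterns."""
    for root, dirs, files in os.walk(src):
        # Prune excluded directories in place so os.walk never descends into them.
        dirs[:] = [d for d in dirs if not _matches(d, exclude_dirs)]
        for name in files:
            if not _matches(name, exclude_files):
                yield Path(root) / name
```

Pruning directories before descending matters with cache folders, since the skipped subtrees are never even listed.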

Only copy deltas. With Robocopy, you can use the /MIR option together with its time-based parameters so that only what has changed is copied. You could possibly do this with archive bits too.
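A minimal sketch of delta copying, assuming "changed" means a different size or modification time, which is similar in spirit to how Robocopy treats a file as unchanged by default. The 2-second slack is an assumption to tolerate coarse timestamps on some file systems (Robocopy's /FFT switch addresses the same issue for FAT-style times); needs_copy and sync_changed are hypothetical names.

```python
import os
import shutil
from pathlib import Path


def needs_copy(src_file: Path, dst_file: Path, mtime_slack_s: float = 2.0) -> bool:
    """True if dst_file is missing or differs from src_file in size or mtime."""
    try:
        dst_stat = dst_file.stat()
    except FileNotFoundError:
        return True
    src_stat = src_file.stat()
    if src_stat.st_size != dst_stat.st_size:
        return True
    # Allow some slack: not every file system stores timestamps at full precision.
    return abs(src_stat.st_mtime - dst_stat.st_mtime) > mtime_slack_s


def sync_changed(src: Path, dst: Path) -> int:
    """Copy only new or changed files from src to dst; return how many were copied."""
    copied = 0
    for root, _dirs, files in os.walk(src):
        for name in files:
            source = Path(root) / name
            target = dst / source.relative_to(src)
            if needs_copy(source, target):
                target.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(source, target)  # copy2 keeps the mtime for the next comparison
                copied += 1
    return copied
```

Unlike /MIR, this sketch never deletes destination files that have disappeared from the source, and a same-size edit made within the slack window would be missed.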

Could you pull the copy from the Linux side instead, using rsync or cpio through a Samba client that accesses the Windows root shares (C$, etc.)?

Assisted Solution

hrr1963 earned 664 total points
ID: 36556447
I still recommend an FTP solution; then configure the client to transfer 10 files at the same time.
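To see why transferring 10 files at once can matter, here is a back-of-the-envelope model in Python. Every figure in it is an assumption, not a measurement from this thread: a fixed per-file overhead for protocol round trips, with that overhead overlapping perfectly across parallel transfers while the data itself shares one link. Real gains will be smaller than this idealized model suggests.

```python
def estimate_copy_seconds(
    n_files: int,
    per_file_overhead_s: float,
    total_bytes: float,
    bandwidth_bytes_per_s: float,
    concurrency: int = 1,
) -> float:
    """Idealized model: per-file round trips overlap across `concurrency`
    transfers, while the raw data still shares a single link's bandwidth."""
    overhead = n_files * per_file_overhead_s / concurrency
    transfer = total_bytes / bandwidth_bytes_per_s
    return overhead + transfer


# Assumed figures: 500,000 files, 20 ms of per-file overhead, 10 GB in total,
# and roughly 100 MB/s of usable Gigabit throughput.
serial = estimate_copy_seconds(500_000, 0.020, 10e9, 100e6)
parallel = estimate_copy_seconds(500_000, 0.020, 10e9, 100e6, concurrency=10)
print(f"serial ~{serial:,.0f} s, 10 parallel ~{parallel:,.0f} s")
```

With these assumed numbers the model gives roughly 10,100 seconds (about 2.8 hours) serially versus roughly 1,100 seconds (about 18 minutes) with 10 concurrent transfers; almost all of the serial time is per-file overhead rather than data.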

Author Comment

ID: 36568133
Hi everyone,

It looks like multithreading the copy is really the way to go (or at least finding a copy utility that takes advantage of that).

I've never heard of RichCopy before. I'll check it out.

I've tried the FTP approach, and I also tried WinSCP, but I found it to be even slower than regular file copies because of the constant back-and-forth handshaking the FTP protocol needs for each file. It was fine for large files but, like the others, choked on lots of small files. It might work if I told FileZilla to copy 10 files at a time; I haven't actually tried that yet. Will FileZilla stay robust and fast with half a million files in the queue? I'm not sure it was ever designed to handle that kind of file volume.

I can't pull a copy of the files through the administrative shares. While that works on all of *MY* computers, it doesn't necessarily work on my clients' machines, which have a wide array of firewalls, odd Windows settings, and broken services.

Also, I wasn't aware of Robocopy's /MT option. Some of my clients are still on Windows XP, but that won't be the case forever, so I'll look into /MT.

Expert Comment

ID: 36569221
As for the admin shares, you can create your own root shares with specific security settings and a service account, but that may not be possible on client machines. Remember that RichCopy is not the most stable utility, though many of us still find a use for it. Some third-party apps use multithreading and offer great features for not much money, but Robocopy's /MT should work well if your Windows machines are new enough.

Assisted Solution

duffme earned 1336 total points
ID: 36590051
By the way, I was reminded of this today: RichCopy has separate multithreading options for searching, directory operations, and file operations. Multithreading file operations tries to use multiple threads for a single file copy, which can cause a lot of errors. The other options allow multiple copies at once (each in its own thread) and multiple threads for file comparison, directory search, etc. The point is that you may need to tweak specific multithreading options depending on which utility you end up using.
