Copying millions of files in one directory

Posted on 2004-09-01
Last Modified: 2010-05-18
Currently, we have a lot of raw-data flat files that are generated with every request at our business. They are all dumped into one folder, and there are millions of them. Whenever we try to back up these files, it takes forever! Most are under 1 KB, and the hard drive has a 4K block size. On top of that, the entire hard drive is extremely fragmented. The problem lies in the time it takes the hard drive to do millions of seeks (about 5 ms each on a SCSI hard drive), so a copy ends up taking days. Is there any way to speed this up, maybe with a raw sector copy generated from the locations of the files in the MFT, so that the copy does not start and stop on every file transfer and seek so often? Changing the organization of these files is also not really an option, so we'll have to figure something out. Thanks!
Question by:cdesimone

Author Comment

Also, defragmentation is not really an option either...

Expert Comment

If rearranging them into more folders is not an option, there is not much you can do, unless you append each new file to the previous one so that you end up with only one file.
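The "only one file" idea can be sketched as follows. This is a minimal illustration in Python, not something proposed in the thread: the function names `pack_directory` and `read_member` and the CSV index layout are assumptions made for the example, and the container and index files are assumed to live outside the source folder.

```python
import csv
import os


def pack_directory(src_dir, container_path, index_path):
    """Append every regular file in src_dir to one container file.

    Writes a CSV index row (name, offset, length) per file so each record
    can be located again later. Returns the number of files packed.
    """
    count = 0
    with open(container_path, "ab") as container, \
            open(index_path, "a", newline="") as index_file:
        index = csv.writer(index_file)
        container.seek(0, os.SEEK_END)  # make tell() report the true end
        # os.scandir streams directory entries instead of building one huge list.
        with os.scandir(src_dir) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                with open(entry.path, "rb") as f:
                    data = f.read()
                offset = container.tell()
                container.write(data)
                index.writerow([entry.name, offset, len(data)])
                count += 1
    return count


def read_member(container_path, offset, length):
    """Read one packed record back out of the container."""
    with open(container_path, "rb") as container:
        container.seek(offset)
        return container.read(length)
```

A backup then has to move one large sequential file plus a small index rather than millions of tiny files.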

Expert Comment

Boot into DOS / the command line and copy them through that.


Expert Comment

If you have millions of them, it will take some time no matter what you do.
But why not try the xcopy command from the DOS prompt:
xcopy /E /V /I d:\temp\scrm d:\test\xcopy\files
Here you see the switches /E, /V and /I:
/E           Copies directories and subdirectories, including empty ones.
/V           Verifies each new file.
/I           If destination does not exist and copying more than one file,
               assumes that destination must be a directory.
The source is d:\temp\scrm.
The destination is d:\test\xcopy\files.

As you can tell, the destination directories will be created and the files from d:\temp\scrm will be copied to d:\test\xcopy\files.
You can use any destination you choose; just make sure that the drive or partition actually exists, and the directories will be created for you.
If you are overwriting files already present in the destination directory, you may be prompted to confirm the overwrite.
The /Y switch suppresses the overwrite prompt.
Still, with millions of files it is going to take some time.
It may be faster to write the files to a CD-R or CD-RW disc and then use the disc to copy the files over, though I'm not really sure that it would be. Xcopy works fairly well all by itself, and if the files are just a few KB each it may not take too long. I've never tried this with millions of files, but I have tried it with tens of thousands and it works really well.

Expert Comment

How are the files used? Daily access or just safekeeping? Would it make sense to archive them weekly/monthly/etc to reduce the number of them on your filesystem at any one time? Either xcopy to another volume or zip to an archive file. You could schedule a nightly process to grab anything older than your chosen limit. If that sounds feasible we can talk about details and throw together a batch and the appropriate scheduling method for you.
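As a rough sketch of the "grab anything older than your chosen limit" step, here is one way it could look in Python rather than in a batch file. The function name `archive_older_than` and its parameters are hypothetical, the age test uses each file's modification time, and the zip file is assumed to sit outside the source folder.

```python
import os
import time
import zipfile


def archive_older_than(src_dir, zip_path, max_age_days, now=None):
    """Add files in src_dir older than max_age_days to the zip at zip_path.

    Returns the list of archived file names. Source files are left in place;
    delete them only after the archive has been verified.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    archived = []
    # Mode "a" creates the archive if it does not exist yet.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        with os.scandir(src_dir) as entries:
            for entry in entries:
                if entry.is_file() and entry.stat().st_mtime < cutoff:
                    zf.write(entry.path, arcname=entry.name)
                    archived.append(entry.name)
    return archived
```

A scheduled nightly job could call this with, say, `max_age_days=30`. Running it twice without removing the archived originals would add the same names to the archive again, so the delete step matters.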


Author Comment

cyrnel, this is the idea we had thought of falling back on if nothing else works. It is the best idea we have come up with so far. Here are some details.

    The files are auto-numbered by a program and stamped with the date created. This date never changes, because once a file is created it is never overwritten. We could first archive all files up to the current date in groups of 10,000 and put those archives on a file server. Then we could use xcopy with the date switch /D:m-d-y to grab the files created after a certain date and archive them into a current zip file. The problems with this are:

First, how could we generate the syntax for the date switch dynamically?
Next, how would we generate consecutive dates that cover all files, and verify that all files are there and the archives are complete?

This would be D2D2T once we copy it in Backup Exec. We could run the .bat file as a pre-script, with a general layout like this:

First 10,000.......
XCOPY C:\path Z:\archive /D:<GET last archive date>
zip and archive files (delete after copy) from the intermediary folder once copied (name of file will be like
verify archive

Repeat with next 10,000 until finished..

This will be tough because of the organization of the files into groups of 10,000. It means that we would have to open an incomplete archive, add the day's files, close it again and back it up.
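The layout above could be sketched in Python along these lines. This is only an illustration under stated assumptions: the names `archive_full_batches` and `verify_archive` are made up for the example, the auto-numbered file names are assumed to be zero-padded so that sorted-name order matches creation order, and the incomplete-archive problem is sidestepped by archiving only complete groups and leaving the remainder in the source folder.

```python
import os
import zipfile


def verify_archive(zip_path, expected_names):
    """Raise ValueError unless the archive passes its CRC check and holds exactly expected_names."""
    with zipfile.ZipFile(zip_path) as zf:
        bad = zf.testzip()  # first corrupt member name, or None
        if bad is not None:
            raise ValueError(f"corrupt member {bad} in {zip_path}")
        if sorted(zf.namelist()) != sorted(expected_names):
            raise ValueError(f"{zip_path} does not contain the expected files")


def archive_full_batches(src_dir, archive_dir, batch_size=10000, delete_after=False):
    """Zip files from src_dir in sorted-name order, batch_size files per archive.

    Only complete batches are archived; a trailing partial batch stays in
    src_dir until enough new files arrive. Returns the archive paths written.
    """
    # Holding millions of names in memory is assumed to be acceptable here.
    with os.scandir(src_dir) as entries:
        names = sorted(e.name for e in entries if e.is_file())
    written = []
    full = len(names) - len(names) % batch_size
    for start in range(0, full, batch_size):
        batch = names[start:start + batch_size]
        zip_path = os.path.join(archive_dir, f"{batch[0]}_to_{batch[-1]}.zip")
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for name in batch:
                zf.write(os.path.join(src_dir, name), arcname=name)
        verify_archive(zip_path, batch)
        if delete_after:
            # Remove originals only after the archive has been verified.
            for name in batch:
                os.remove(os.path.join(src_dir, name))
        written.append(zip_path)
    return written
```

With `delete_after=True` the source folder never holds more than one partial group, which the regular backup can pick up as loose files.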

Accepted Solution

cyrnel earned 500 total points
Date components can be represented by numbers. We can loop through numbers. No need to fight with date math for this kind of task. We just loop a week or month at a time.

Are the files somewhat evenly distributed by date? At this point it appears simpler to take chunks of so many days at a time than a fixed number of files. This would simplify the loops, and likely later organization. Or is there another reason you'd prefer a fixed number of files?
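The "loop through numbers" idea might look like the following sketch, written in Python rather than batch. The names `months_between` and `bucket_by_month` are hypothetical, and the modification time is assumed to equal the creation date because the files are never overwritten.

```python
import collections
import datetime
import os


def months_between(start, end):
    """Yield (year, month) for every month from start to end, inclusive.

    Months are counted as plain integers, so no calendar arithmetic is needed.
    """
    first = start[0] * 12 + (start[1] - 1)
    last = end[0] * 12 + (end[1] - 1)
    for n in range(first, last + 1):
        yield n // 12, n % 12 + 1


def bucket_by_month(src_dir):
    """Group file names in src_dir by the (year, month) of their modification time."""
    buckets = collections.defaultdict(list)
    with os.scandir(src_dir) as entries:  # single pass over the directory
        for entry in entries:
            if entry.is_file():
                day = datetime.date.fromtimestamp(entry.stat().st_mtime)
                buckets[(day.year, day.month)].append(entry.name)
    return buckets
```

Iterating `months_between(min(buckets), max(buckets))` visits every month exactly once, so each file lands in exactly one chunk, and comparing the sum of the chunk sizes against the directory's file count is a simple completeness check.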
