Copying millions of files in one directory

Currently, we have a lot of raw-data flat files that are generated with every request at our business. These are all dumped into one folder, and there are millions of them. Whenever we try to back up these files, it takes forever! Most are under 1 KB and the hard drive has a 4K block size. On top of that, the entire drive is extremely fragmented. The problem lies in the time it takes the hard drive to do millions of seeks (each about 5 ms on a SCSI drive), so a copy ends up taking days! Is there any way to speed this up, perhaps with a raw-sector copy generated from the file locations in the MFT, so the transfer doesn't start, stop, and seek on every file? Changing the organization of these files is also not really an option, so we'll have to figure something out. Thanks!
cyrnel commented:
Date components can be represented by numbers. We can loop through numbers. No need to fight with date math for this kind of task. We just loop a week or month at a time.

Are the files somewhat evenly distributed by date? At this point it appears simpler to take chunks of so many days at a time than a fixed number of files. This would simplify the loops, and likely the later organization. Or is there another reason you'd prefer a fixed number of files?
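The numbers-not-date-math idea above can be sketched in Python (a portable stand-in for the batch script discussed in this thread; the function name, chunk size, and example dates are all illustrative assumptions):

```python
from datetime import date, timedelta

def date_chunks(start, end, days_per_chunk=7):
    """Yield (chunk_start, chunk_end) ranges covering [start, end],
    stepping a fixed number of days at a time instead of doing
    fiddly date math per file."""
    cur = start
    step = timedelta(days=days_per_chunk)
    while cur <= end:
        chunk_end = min(cur + step - timedelta(days=1), end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)

# Each (lo, hi) pair would drive one copy/archive pass.
for lo, hi in date_chunks(date(2006, 1, 1), date(2006, 1, 31), 7):
    print(lo, hi)
```

Each yielded range is one week (the last one shorter if the period doesn't divide evenly), so every calendar day falls in exactly one pass.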
cdesimoneAuthor Commented:
Also, defragmentation is not really an option either...
If rearranging them into more folders is not an option, there is not much you can do; alternatively, you could append each new file onto the previous ones, so that you end up with only one file.

Boot into DOS / the command line and copy them through that.
If you have millions of them, no matter what you do it will take some time.
But why not try the xcopy command from the DOS prompt:
xcopy /E /V /I d:\temp\scrm d:\test\xcopy\files
Here you see the switches /E /V /I:
/E           Copies directories and subdirectories, including empty ones.
/V           Verifies each new file.
/I           If destination does not exist and copying more than one file,
               assumes that destination must be a directory.
Here, d:\temp\scrm is the source and d:\test\xcopy\files is the destination.

As you can tell, the destination directories will be created and the files from d:\temp\scrm will be copied to d:\test\xcopy\files.
You can use any destination you choose; all you have to do is make sure that the drive or partition is actually present, and the directories will be created.
If you are overwriting files already present in the destination directory, you may be prompted to confirm each overwrite.
The /Y switch will suppress any overwrite prompting.
Still, with millions of files it's going to take some time.
It may be faster to write the files to a CD-R or CD-RW disc and then use the disc to copy the files over, though I'm not really sure that would be faster. Xcopy works fairly well all by itself, and if the files are only a few KB each it may not take too long. I've never tried this with millions of files, but I have tried it with tens of thousands and it works really well.
How are the files used? Daily access or just safekeeping? Would it make sense to archive them weekly/monthly/etc to reduce the number of them on your filesystem at any one time? Either xcopy to another volume or zip to an archive file. You could schedule a nightly process to grab anything older than your chosen limit. If that sounds feasible we can talk about details and throw together a batch and the appropriate scheduling method for you.

cdesimoneAuthor Commented:
cyrnel, this is the idea we have thought of if nothing else works. It is the best idea we have come up with so far. Here are some details.

    The files are autonumbered by a program and stamped with the date created. This date will never change, as once a file is created it is never overwritten. We could first archive all existing files into groups of 10,000 up to the current date and put that information on a file server. Then we could do an xcopy to grab files created after a certain date with the date switch /D:m-d-y and archive those into a current zip file. The problems with this are...

First, how could we generate the syntax for the date switch dynamically?
Next, how would we generate consecutive dates that cover all files, and verify that they are all there and the archives are complete?
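Generating the /D switch text and a gap-free run of dates is mostly string formatting, which could be sketched like this in Python (illustrative names and dates; note that xcopy's /D:m-d-y copies files changed *on or after* that date, so in practice one command per archive pass with the last-archive date is enough):

```python
from datetime import date, timedelta

def d_switch(d):
    """Format xcopy's /D switch for a given date (m-d-y, no zero padding)."""
    return f"/D:{d.month}-{d.day}-{d.year}"

def daily_dates(start, end):
    """Consecutive dates from start to end inclusive, so coverage can be
    checked against the archives with no days missing."""
    out, cur = [], start
    while cur <= end:
        out.append(cur)
        cur += timedelta(days=1)
    return out

# Example command line built from a stored "last archive" date:
cmd = f"XCOPY C:\\path Z:\\archive {d_switch(date(2006, 3, 5))}"
```

Because the date list is generated rather than typed, verifying completeness reduces to checking that every generated date's files ended up in some archive.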

This would be D2D2T (disk-to-disk-to-tape) when we copy this in Backup Exec. We could run a pre-script with the .bat file for this, with the general layout like this...

First 10,000.......
XCOPY C:\path Z:\archive /D:<GET last archive date>
zip and archive files (delete after copy) from the intermediary folder once copied (name of file will be like
verify archive

Repeat with next 10,000 until finished..

This will be tough because of the organization of the files into groups of 10,000. It means we would have to open an incomplete archive, add the files from the day, reclose it, and back it up.
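The 10,000-file grouping and the "is the last archive complete?" check could be sketched like this in Python (a hedged illustration of the bookkeeping; the function name is made up, and the group size comes from the post above):

```python
# Split a sorted list of filenames into groups of 10,000 for archiving.
# The last group may be short; an incomplete final archive is the one
# that would be reopened and topped up on the next run.
def group_files(filenames, group_size=10_000):
    groups = [filenames[i:i + group_size]
              for i in range(0, len(filenames), group_size)]
    last_complete = len(groups[-1]) == group_size if groups else True
    return groups, last_complete
```

Tracking only whether the final group is full tells the nightly job whether to append to the last zip or start a new one.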
Question has a verified solution.