Copying millions of files in one directory

Posted on 2004-09-01
Medium Priority
Last Modified: 2010-05-18
Currently, we have a lot of raw data flat files that are generated with every request at our business.  These are all dumped into one folder and there are millions of them.  Whenever we try to back up these files, it takes forever!  Most are under 1 KB and the hard drive has a 4K block size.  On top of that, the entire hard drive is extremely fragmented.  The problem lies in the time it takes the hard drive to do millions of seeks (which take about 5 ms each on a SCSI hard drive), so it ends up taking days to copy! Is there any way to speed this up, maybe by doing a raw sector copy generated from the locations of the files in the MFT, so that it does not start and stop on every file transfer and seek so often?  Changing the organization of these files is also not really an option, so we'll have to figure something out.  Thanks!
Question by:cdesimone
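
To put rough numbers on the seek problem described in the question, here is a small Python sketch. The 5 ms seek time comes from the question; the file count and the number of seeks per file are illustrative assumptions only, since each small file usually costs more than one seek (directory or MFT lookup, the data itself, and the write on the destination).

```python
def estimate_copy_hours(file_count, seeks_per_file, seek_ms=5.0):
    """Rough lower bound on copy time when disk seeks dominate.

    file_count and seeks_per_file are assumptions for illustration;
    seek_ms defaults to the 5 ms figure quoted in the question.
    """
    total_seconds = file_count * seeks_per_file * seek_ms / 1000.0
    return total_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical 5 million files, with a few guesses at seeks per file.
    for seeks in (1, 4, 10):
        hours = estimate_copy_hours(5_000_000, seeks)
        print(f"{seeks} seeks per file: about {hours:.1f} hours")
```

Even with a single seek per file the copy runs for hours, and a handful of seeks per file pushes it into days, which matches what the question reports.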

Author Comment

ID: 11960200
Also, defragmentation is not really an option either...
LVL 93

Expert Comment

ID: 11960904
If rearranging them into more folders is not an option, there is not much you can do, other than appending each new file to the previous one so that you end up with only one file.

Expert Comment

ID: 11961772
Boot into DOS or a command line and copy them through that.


Expert Comment

ID: 11962687
If you have millions of them, it will take some time no matter what you do.
But why not try the xcopy command from the DOS prompt:
xcopy /E /V /I d:\temp\scrm d:\test\xcopy\files
Here you see the switches /E /V /I:
/E           Copies directories and subdirectories, including empty ones.
/V           Verifies each new file.
/I           If destination does not exist and copying more than one file,
               assumes that destination must be a directory.
The source here is d:\temp\scrm
The destination is d:\test\xcopy\files

As you can tell, the destination directories will be created and the files from d:\temp\scrm will be copied to d:\test\xcopy\files.
You can use any destination you choose; just make sure that the drive or partition is actually present, and the directories will be created.
If you are overwriting files already present in the destination directory, you may be prompted to confirm the overwrite.
The /Y switch will suppress the overwrite prompt.
Still, with millions of files it's going to take some time.
It may be faster to write the files to a CD-R or CD-RW disc and then use the disc to copy the files over, though I'm not really sure that would be faster. Xcopy works fairly well all by itself, and if the files are just a few KB each it may not take too long. I've never tried this with millions of files, but I have tried it with tens of thousands and it works really well.

Expert Comment

ID: 11966597
How are the files used? Daily access or just safekeeping? Would it make sense to archive them weekly/monthly/etc to reduce the number of them on your filesystem at any one time? Either xcopy to another volume or zip to an archive file. You could schedule a nightly process to grab anything older than your chosen limit. If that sounds feasible we can talk about details and throw together a batch and the appropriate scheduling method for you.
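
As a rough illustration of the "grab anything older than your chosen limit" idea, here is a minimal Python sketch rather than the batch file mentioned above. The function name, the 30-day limit in the usage note, and the use of the last-modified time as the file's age are all assumptions for illustration. It assumes the archive is written outside the source folder, and it deliberately does not delete anything, so the archive can be checked first.

```python
import os
import time
import zipfile


def archive_older_than(source_dir, archive_path, max_age_days, now=None):
    """Zip every regular file in source_dir older than max_age_days.

    Age is judged by last-modified time (an assumption; the files in
    question are never rewritten, so it should match the creation date).
    Returns the list of file names that were added to the archive.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    archived = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # os.scandir yields entries one at a time instead of building a
        # full list, which matters with millions of files in one folder.
        with os.scandir(source_dir) as entries:
            for entry in entries:
                if entry.is_file() and entry.stat().st_mtime < cutoff:
                    zf.write(entry.path, arcname=entry.name)
                    archived.append(entry.name)
    return archived
```

A nightly scheduled task could call something like `archive_older_than(r"C:\path", r"Z:\archive\old.zip", 30)`, where both paths are placeholders. The backup then has to read one large archive instead of millions of tiny files.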


Author Comment

ID: 11978555
cyrnel, this is the idea that we had thought of in case nothing else works.   It is the best idea that we have come up with so far.  Here are some details.

    The files are autonumbered by a program and stamped with the date created.  This date will never change, because once a file is created, it is never overwritten.  We could first archive all files in groups of 10,000 up to the current date and put those archives on a file server.  Then we could do an xcopy with the date switch /D:m-d-y to grab the files created after a certain date and archive those into a current zip file.  The problems with this are:

First, how could we generate the syntax for the date switch dynamically?
Next, how would we generate consecutive dates that cover all files, and verify that the files are all there and the archives are complete?

This would be D2D2T when we copy this in Backup Exec.  We could run the bat file as a prescript, with a general layout like this...

First 10,000.......
XCOPY C:\path Z:\archive /D:<GET last archive date>
zip and archive files (delete after copy) from the intermediary folder once copied (name of file will be like 230xxxxx_CREATEDDATE.zip)
verify archive

Repeat with next 10,000 until finished..

This will be tough because of the organization of the files into groups of 10,000.  It means that we would have to open an incomplete archive, add the files from the day, close it again, and back it up.
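
The two open questions above (building the date switch dynamically and verifying that an archive is complete) could be approached along these lines. This is a Python sketch, not a batch file, and the function names are hypothetical. It writes the date zero-padded as MM-DD-YYYY. Note that xcopy's /D switch selects files changed on or after the given date, so it works from the modified date; for files that are never rewritten that should be the same as the creation date.

```python
import datetime
import zipfile


def xcopy_date_switch(day):
    """Format a date as an xcopy /D switch in MM-DD-YYYY form."""
    return f"/D:{day:%m-%d-%Y}"


def verify_archive(archive_path, expected_names):
    """Check a zip archive against the list of files it should hold.

    Returns (missing, corrupt): the expected names not found in the
    archive, and the name of the first member that fails its CRC check
    (None if every member reads back cleanly).
    """
    with zipfile.ZipFile(archive_path) as zf:
        missing = sorted(set(expected_names) - set(zf.namelist()))
        corrupt = zf.testzip()
    return missing, corrupt
```

For example, `xcopy_date_switch(datetime.date(2004, 9, 1))` gives `/D:09-01-2004`. Keeping the date of the last archived file and passing it in here would answer the "GET last archive date" step, and files should only be deleted once `verify_archive` reports nothing missing and nothing corrupt.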

Accepted Solution

cyrnel earned 1500 total points
ID: 11979084
Date components can be represented by numbers. We can loop through numbers. No need to fight with date math for this kind of task. We just loop a week or month at a time.

Are the files somewhat evenly distributed by date? At this point it appears simpler to take chunks of so many days at a time than a fixed number of files. This would simplify the loops, and likely the later organization. Or is there another reason you'd prefer a fixed number of files?
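
A minimal sketch of chunking by date instead of by file count, written in Python rather than batch. It groups files by the calendar month of their last-modified time and writes one zip per month. The function names, the `YYYY-MM.zip` naming, and the choice of a month as the chunk size are assumptions for illustration, and the archive folder is assumed to sit outside the source folder.

```python
import collections
import datetime
import os
import zipfile


def group_by_month(source_dir):
    """Map (year, month) to the names of files last modified in that month."""
    groups = collections.defaultdict(list)
    with os.scandir(source_dir) as entries:
        for entry in entries:
            if entry.is_file():
                stamp = datetime.date.fromtimestamp(entry.stat().st_mtime)
                groups[(stamp.year, stamp.month)].append(entry.name)
    return groups


def archive_by_month(source_dir, archive_dir):
    """Write one zip per month; return {archive path: number of files}."""
    os.makedirs(archive_dir, exist_ok=True)
    written = {}
    for (year, month), names in sorted(group_by_month(source_dir).items()):
        archive_path = os.path.join(archive_dir, f"{year:04d}-{month:02d}.zip")
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(names):
                zf.write(os.path.join(source_dir, name), arcname=name)
        written[archive_path] = len(names)
    return written
```

Because a finished month never receives new files, its archive never has to be reopened; only the current month's archive changes between backups.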


Question has a verified solution.
