Copying millions of files in one directory

Posted on 2004-09-01
Medium Priority
Last Modified: 2010-05-18
Currently, we have a lot of raw-data flat files that are generated with every request at our business. They are all dumped into one folder, and there are millions of them. Whenever we try to back up these files, it takes forever! Most are under 1 KB, and the hard drive has a 4K block size. On top of that, the entire hard drive is extremely fragmented. The problem lies in the time it takes the hard drive to do millions of seeks (which take about 5 ms each on a SCSI drive), so a copy ends up taking days. Is there any way to speed this up, perhaps with a raw sector copy driven by the file locations in the MFT, so that the copy does not start and stop on every file and seek so often? Changing the organization of these files is also not really an option, so we'll have to figure something out. Thanks!
Question by:cdesimone

Author Comment

ID: 11960200
Also, defragmentation is not really an option either...

Expert Comment

ID: 11960904
If rearranging them into more folders is not an option, there is not much you can do, other than appending the data of each new file to the previous one so that you end up with only one file.

Expert Comment

ID: 11961772
Boot into DOS / a command line and copy them through that.

Expert Comment

ID: 11962687
If you have millions of them, it will take some time no matter what you do.
But why not try the xcopy command from the DOS prompt?
xcopy /E /V /I d:\temp\scrm d:\test\xcopy\files
Here you see the switches /E /V /I:
/E           Copies directories and subdirectories, including empty ones.
/V           Verifies each new file.
/I           If destination does not exist and copying more than one file,
               assumes that destination must be a directory.
In the command above, d:\temp\scrm is the source and d:\test\xcopy\files is the destination.

As you can tell, the destination directories will be created and the files from d:\temp\scrm will be copied to d:\test\xcopy\files.
You can use any destination you choose. All you have to do is make sure that the drive or partition is actually present; the directories will then be created.
If you are overwriting files already present in the destination directory, you may be prompted to confirm the overwrite.
The /Y switch will suppress any overwrite prompts.
Still, with millions of files it's going to take some time.
It may be faster to write the files to a CD-R or CD-RW disc and then use the disc to copy the files over, though I'm not really sure that would be faster. Xcopy works fairly well all by itself, and if the files are just a few KB each it may not take too long. I've never tried this with millions of files, but I have tried it with tens of thousands and it works really well.

Expert Comment

ID: 11966597
How are the files used? Daily access or just safekeeping? Would it make sense to archive them weekly/monthly/etc to reduce the number of them on your filesystem at any one time? Either xcopy to another volume or zip to an archive file. You could schedule a nightly process to grab anything older than your chosen limit. If that sounds feasible we can talk about details and throw together a batch and the appropriate scheduling method for you.
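The nightly "grab anything older than your chosen limit" step could look roughly like the sketch below. This is only an illustration, not the batch file offered above: the function name `archive_older_than`, the use of Python's `zipfile` module, and the choice of modification time as the age test are all assumptions. It also assumes the zip is written outside the source folder and that each run writes a new archive.

```python
import os
import time
import zipfile


def archive_older_than(src_dir, zip_path, max_age_days, now=None):
    """Zip every regular file in src_dir that is older than max_age_days.

    Returns the names that were archived. Source files are left in place;
    remove them only after the archive has been checked.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    archived = []
    # Mode "w" creates a fresh archive (hypothetical: one archive per run).
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # os.scandir streams the directory entries, so a folder holding
        # millions of files is never loaded into one huge list.
        with os.scandir(src_dir) as entries:
            for entry in entries:
                if entry.is_file() and entry.stat().st_mtime < cutoff:
                    zf.write(entry.path, arcname=entry.name)
                    archived.append(entry.name)
    return archived
```

Backing up one large zip avoids the per-file overhead on the backup side, although building the archive still has to read every small file once.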


Author Comment

ID: 11978555
cyrnel, this is the idea we had thought of in case nothing else works. It is the best idea we have come up with so far. Here are some details.

    The files are autonumbered by a program and stamped with the date created. This date will never change, because once a file is created it is never overwritten. We could first archive all files up to the current date in groups of 10,000 and put those archives on a file server. Then we could use xcopy with the date switch /D:m-d-y to grab the files created after a certain date and archive them into a current zip file. The problems with this are:

First, how could we generate the syntax for the date switch dynamically?
Next, how would we generate consecutive dates that cover all files, and verify that the files are all there and the archives are complete?

This would be D2D2T (disk-to-disk-to-tape) once we copy the archives in Backup Exec. We could run the bat file as a pre-script, with a general layout like this:

First 10,000.......
XCOPY C:\path Z:\archive /D:<GET last archive date>
zip and archive files (delete after copy) from the intermediary folder once copied (name of file will be like 230xxxxx_CREATEDDATE.zip)
verify archive

Repeat with next 10,000 until finished..

This will be tough because of the organization of the files into groups of 10,000. It means that we would have to open an incomplete archive, add the day's files to it, close it again, and then back it up.
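The archive, verify, repeat layout above can be sketched as follows. It is a rough illustration under stated assumptions, not a Backup Exec pre-script: all function names and the `batch_00000.zip` naming are hypothetical, and sorting by file name only matches the autonumber order if the numbers are zero-padded to a fixed width. Verification here means the zip passes its CRC check and contains exactly the expected names.

```python
import os
import zipfile


def batches(names, size):
    """Yield consecutive slices of names, each at most size long."""
    for start in range(0, len(names), size):
        yield names[start:start + size]


def archive_batch(src_dir, names, zip_path):
    """Write the named files from src_dir into a new zip at zip_path."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in names:
            zf.write(os.path.join(src_dir, name), arcname=name)


def verify_batch(names, zip_path):
    """Return True if the zip passes its CRC check and holds exactly names."""
    with zipfile.ZipFile(zip_path) as zf:
        if zf.testzip() is not None:  # testzip returns the first bad member
            return False
        return sorted(zf.namelist()) == sorted(names)


def archive_in_batches(src_dir, out_dir, size=10000, delete_verified=False):
    """Archive src_dir in groups of size files; return the zip paths."""
    names = sorted(e.name for e in os.scandir(src_dir) if e.is_file())
    done = []
    for index, batch in enumerate(batches(names, size)):
        zip_path = os.path.join(out_dir, "batch_%05d.zip" % index)
        archive_batch(src_dir, batch, zip_path)
        if not verify_batch(batch, zip_path):
            raise RuntimeError("verification failed for " + zip_path)
        if delete_verified:  # "delete after copy", only once verified
            for name in batch:
                os.remove(os.path.join(src_dir, name))
        done.append(zip_path)
    return done
```

The last batch will usually be short. This sketch archives it anyway, which is exactly the "reopen an incomplete archive" problem described above; grouping by date instead of by count would sidestep that.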

Accepted Solution

cyrnel earned 1500 total points
ID: 11979084
Date components can be represented by numbers. We can loop through numbers. No need to fight with date math for this kind of task. We just loop a week or month at a time.

Are the files somewhat evenly distributed by date? At this point it appears simpler to take chunks of so many days at a time than a fixed number of files. This would simplify the loops, and likely the later organization. Or is there another reason you'd prefer a fixed number of files?
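To illustrate looping over date components as plain numbers, here is a minimal sketch that steps one month at a time and builds the /D:m-d-y switch mentioned earlier in the thread. The function names are hypothetical, and it is written in Python rather than batch purely for illustration. Note that xcopy's /D switch only sets a lower bound (files changed on or after the date), so the upper end of each window would still have to be enforced separately.

```python
import datetime


def month_starts(first, last):
    """Yield the first day of every month from first through last."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        yield datetime.date(year, month, 1)
        # Stepping a month is plain integer arithmetic, no date math.
        month += 1
        if month > 12:
            year, month = year + 1, 1


def month_windows(first, last):
    """Return (start, end) pairs, one per month; end is exclusive."""
    starts = list(month_starts(first, last))
    # One extra month start marks the end of the final window.
    after_last = datetime.date(last.year + last.month // 12,
                               last.month % 12 + 1, 1)
    return list(zip(starts, starts[1:] + [after_last]))


def xcopy_date_switch(day):
    """Format a date as the /D:m-d-y switch used by xcopy."""
    return "/D:%d-%d-%d" % (day.month, day.day, day.year)
```

Because consecutive windows share their boundary dates, every file date falls into exactly one window, which helps with the question of whether the generated dates cover all files.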
