Backup Exec and millions of small files

We have Symantec Backup Exec 10d, an LTO tape library, a server and disk array, and a Gigabit network.  
We have two servers each with over two million small files, across fifty thousand directories on a single NTFS volume.  The files total about 250GB on each server.  
Files are read-only with approx. 10,000 new files added every weekday.

Backup of this volume achieves a reported throughput of 200MB/Min, as opposed to the 1GB/Min-plus we achieve on Exchange or SQL database files.  
More importantly, a full backup takes over eighteen hours.  

So how do we achieve a smaller backup window and a faster backup?
The files need to be held off-site so currently they are written to tape, and not just another disk array.  

Will any Backup Exec options improve the performance?
Do we now need to consider flash snapshot and mirroring options to reduce the window?
Or should we use replication to copy them to a remote office?
And would either of these help get the files onto tape more quickly?  

Answers based on actual experience using Symantec / Veritas solutions are preferred, as we have a lot of installed product from this software house.  

cvsadminConnect With a Mentor Commented:
Here are my thoughts for David.

In most cases you will be running on a small array. I suspect that the disks are running flat out to provide the maximum throughput to your tape device, so you are most likely going to have to add more spindles to your RAID array in order to achieve additional throughput. You could also try using WinRAR to pack the files into a single archive.

The reason the Exchange backup is so fast is that it reads the data in large chunks, most likely the maximum for the unit, due to the single-file nature of Exchange.
Your small files are being read in ones and twos, so more disk speed (more disks) is required in that case. Hence my suggestion to zip or RAR them into a single file and then push that to tape; that would most likely give you the speed you need.
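The zip-before-backup idea above can be sketched in a few lines. This is a minimal illustration, not a production archiver: the directory layout and file counts are invented, and real jobs would batch by date or folder rather than packing everything into one archive.

```python
import os
import tempfile
import zipfile

def pack_directory(src_dir: str, archive_path: str) -> int:
    """Pack every file under src_dir into a single zip archive.

    Backing up one large archive streams to tape far faster than
    reading millions of tiny files one by one.
    Returns the number of files packed.
    """
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # store paths relative to src_dir so a restore recreates the tree
                zf.write(full, os.path.relpath(full, src_dir))
                count += 1
    return count

# demo with a throwaway directory of small files
src = tempfile.mkdtemp()
for i in range(100):
    with open(os.path.join(src, f"doc_{i}.txt"), "w") as f:
        f.write("x" * 512)
archive = os.path.join(tempfile.mkdtemp(), "batch.zip")
packed = pack_directory(src, archive)
print(packed)  # 100
```

The win comes from turning millions of per-file open/read/close operations into one sequential read, which is exactly what tape drives stream best.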

Here are my thoughts for sims.
You are already running a good RAID array, so I don't think additional disks will help you. How many disks are you running, 4-6? Your small-file issue may be the tape drive having to write the small chunks. Sorry I can't give you more information.

Here are my thoughts for both of you.
First, you guys are backing up 500GB to 2TB+ and adding files daily; you will eventually run into growth issues with a tape backup solution, and sims, you are almost there...
In the past I have built a 7TB RAID array with the Promise VTrak 15100 and fifteen 400GB drives, then set up SureSync and/or RepliStor to copy the data to an alternate location, but this is somewhat tricky.
You need both arrays in the same room at first. It is best to robocopy the data to the backup array a couple of times to make sure you have it all, transport the backup array to its new location, and then let the software do a bit-level check against all the files; it will queue any new files to send over. This is the only way to reduce your bandwidth usage, and it works over ADSL and cable modem connections.
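The seed-then-delta approach described above can be sketched as follows. This is an assumption-laden toy, not what SureSync or RepliStor actually do (they perform block-level checks): here "changed" just means missing, a different size, or a clearly newer modification time.

```python
import os
import shutil
import tempfile

def sync_new_files(src: str, dst: str) -> list:
    """Copy files that are missing, a different size, or newer in dst.
    Mimics the 'queue any new files' delta pass that replication
    software performs after the initial seeded copy.
    Returns the relative paths that were copied."""
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            s = os.path.join(root, name)
            rel = os.path.relpath(s, src)
            d = os.path.join(dst, rel)
            if (not os.path.exists(d)
                    or os.path.getsize(s) != os.path.getsize(d)
                    or os.path.getmtime(s) - os.path.getmtime(d) > 1):
                os.makedirs(os.path.dirname(d), exist_ok=True)
                shutil.copy2(s, d)  # copy2 preserves the mtime
                copied.append(rel)
    return copied

# demo: seed copy, then a delta pass that only moves the new file
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(src, "a.txt"), "w") as f:
    f.write("seed data")
first = sync_new_files(src, dst)   # initial seed: copies a.txt
with open(os.path.join(src, "b.txt"), "w") as f:
    f.write("new file")
second = sync_new_files(src, dst)  # delta pass: copies only b.txt
```

The point is the same as carrying the array to the remote site first: after the seed, only the daily 10,000 new files cross the wire, which is why this works over ADSL-class links.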

I have a similar problem... about 2TB of 100KB - 1MB TIFF files. The weekly backup takes about 48 hours and requires 2 drives, with a maximum throughput of about 600MB/Min and an average of about 375MB/Min. This is using 2 SDLT 600 drives and BE 10d. The problem we have in common is that so many small files in such a large folder structure is the bottleneck; it is not a hardware or software issue in this case. Using the same hardware I achieve about 3GB/Min on SQL or Exchange.

I would say if your main goal is to archive, then you may have a problem. But if you are looking to replicate, or to keep an up-to-date copy at an off-site location, then you might consider the following.

It depends on the physical location of this off-site copy, the size of the daily 10,000-file update, your availability requirements, and probably a few other factors I just can't think of at the moment...

Run weekly full backups (Friday, Saturday or Sunday) to allow enough time for the backup to complete successfully, with daily differential or incremental backups that you can bring to the remote location and restore to keep an up-to-date copy elsewhere. If restore time is an issue, differentials may be the better idea: you only need the latest full backup and the latest differential to get up to date, as opposed to the full backup plus every incremental job since it.

This will allow you to keep backup times mid week to a minimum.
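The restore-chain trade-off above can be made concrete with a small sketch. The schedule and day names here are invented for illustration; the rule itself (differentials are cumulative, incrementals chain) is standard backup behaviour.

```python
def restore_set(jobs):
    """Return the jobs needed to restore to the latest point in time.
    jobs is a list of (day, kind) tuples, where kind is
    'full', 'inc' (incremental) or 'diff' (differential)."""
    last_full = max(i for i, (_day, kind) in enumerate(jobs) if kind == "full")
    newer = jobs[last_full + 1:]
    if newer and newer[-1][1] == "diff":
        # differentials are cumulative: full + latest differential suffices
        return [jobs[last_full], newer[-1]]
    # incrementals chain: full + every incremental after the full
    return jobs[last_full:]

inc_week = [("Sun", "full"), ("Mon", "inc"), ("Tue", "inc"),
            ("Wed", "inc"), ("Thu", "inc")]
diff_week = [("Sun", "full"), ("Mon", "diff"), ("Tue", "diff"),
             ("Wed", "diff"), ("Thu", "diff")]

print(len(restore_set(inc_week)))   # 5 jobs to mount and restore
print(len(restore_set(diff_week)))  # 2 jobs to mount and restore
```

The cost of the shorter restore chain is that each differential grows through the week, since it re-captures everything changed since the full.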

I will check back in a bit to look for some comments from you as well as additional input.
cvsadmin: Thanks for the input; however, in my case all this data already resides on our SAN, and the backups are really only needed in the event of multiple failures and/or natural disasters.

To lose the data on the SAN, a combination of the following must happen:
Complete DAE failure
2 failed disks in that array, plus failure of the DAE hot spare as well as 4 additional global hot spares
Fire, flood etc...

This data is also available at our DR colo and is never more than 12 hours behind in the event of TOTAL failure.

Basically anything is possible if you have the available funding.

Thanks again


davidt67Author Commented:
Thanks for the feedback thus far.

Question, is there any reason to think NetBackup would perform any better than Backup Exec in this scenario?

Additional Info.
RPO      close of previous business day is acceptable.
RTO      Less than 12 hours is the target. Thus the problem with the backup and corresponding restore times.

Disk arrays are 2x RAID5+HS across seven disks, 15K RPM SCSI.
They are not obviously a bottleneck; there is certainly less thrashing than when we do an array rebuild or expansion.
The source server processor runs at less than 20% during backup, and the BEX agent only uses one processor of the four available. The network, target server and tape drive are certainly not taxed; they are mostly waiting on the source-side processing.

For one application, consolidating files into .cabs is an option; for the other it's not.  

Running a compression routine prior to backup would just seem to exacerbate the problem of the limited backup window, but it certainly would improve tape streaming and restore time. Fortunately, should a restore be required, it would be the whole array, not individual files.  

What are the upper limits on creating a .zip file?  Presumably this is no faster than Veritas creating the volume snapshot.  

I am thinking we should use Veritas technologies to snapshot-mirror the volumes to a separate array, then replicate the volumes offsite, and then stream them to tape.  
NO, NO, and NO.  NO tape backup software will handle 10,000 directories and 1,000,000 files in anything less than about a day.  You need to stop creating these zillions of tiny files.  The app that is making them is SERIOUSLY in error, it is badly conceived to make this many small files, the developers should have realized this would create a backup nightmare.  They probably should be out of a job by now....

At this point, your best bet is to ZIP the old files into an archive.  You can stuff a thousand tiny files into a ZIP archive, and it will copy in 1-2 seconds, whereas the 1,000 tiny files will take 5-10 minutes on ANY file copy or backup utility.  It is time to get control of these ludicrous numbers of files.  If your hard disk has any more than 100,000 files on it, you have a SERIOUS backup problem.
davidt67Author Commented:
I think we all appreciate that backing up small files to tape is not ideal.
That's why I am asking about flash snapshot and mirroring options.
Your answer singularly failed to address these points.  

In the real world asking a major software company to redesign their application isn't really a runner...  Nor is performing manual or scripted zip routines a solid commercial solution.  

Informed views on utilising Symantec products such as Storage Foundation for flash snapshots and mirroring are what I asked for, and I still seem to be waiting for an answer which addresses the question. cvsadmin has got closest so far...
day_landerConnect With a Mentor Commented:
BE doesn't have image backup capability, so even if you snapshot or clone the data on your array, you'll still have to back up individual files. You could use the Advanced Disk-based Backup Option to get a synthetic full backup tape from a previous disk-based full backup plus incrementals, but you would still have to take the occasional real full backup.

If you've got the money, you could use Enterprise Vault to archive them. It still allows users to access them through the regular filesystem (though you would still have millions of small pointer files linking filesystem to archive), or you could allow access to the vault only through a web browser and not put the pointer files on the filesystem at all. Backing up the archive wouldn't take anything like as long, since Enterprise Vault puts the items in .cab files. That's assuming your files are read-only like simsjrg's.
davidt67Author Commented:
Does Netbackup do volume images to tape?  Ultimately at some point the files need to be archived to tape and shipped off site, even from an alternate site.  

We are already using synthetic backups where we can. The problem we find is that you have to mount the last full backup tape to make the next full backup tape from the incrementals; not ideal and not that fast.

The small files are in fact Enterprise Vault archive files; we move them to .cab files based on age, as we don't want to sacrifice the indexing and retrieval response times. Basically we have an HSM system:

        ONLINE               NEARLINE             TERTIARY           TAPE
        Exchange Stores -->  EVault .DVS -->      EVault .CAB -->    BackupExec
        0-90 days            90 days - 3 years    3-7 years          7 years - infinite

Ultimately I think we will replace Backup Exec with NetBackup on the EV storage system. That will allow auto-retrieval from tape of the really old stuff. I also hope to move the TERTIARY & TAPE tiers to a back-office site, leaving just ONLINE & NEARLINE at the primary site.  

Then I guess we will use replication to the back-office site for the NEARLINE stuff and let NetBackup at that location back it up periodically.  

Nearline is currently monthly fulls with daily/weekly synthetic incrementals.  
Tertiary is monthly as well, but I guess it could drop to quarterly or less.  

andyalderConnect With a Mentor Commented:
NetBackup does indeed do volume image backups; it's called FlashBackup, and for peace of mind you can even restore individual files from it, but watch out for performance problems with non-raw image restores.

As for backing up EV, I will check with the designers, but as far as the trainers were concerned, you backed it up when you got around to it, not every day. You haven't got three years of nearline archived email in a single open archive, have you? I was under the impression that you closed one archive and opened another every few months, and then stopped repeatedly backing up anything but the open one. The closed ones will still be nearline, but nothing can be added to them. If anyone edits something in the archive, a new copy is created on Exchange and the archived version is marked as stale in AltaVista, so as long as the index is backed up, closed archives may only need backing up every 6 months.
Backing up the open vault store isn't required daily either: if you have the safety copy set properly, it doesn't delete the mail from Exchange until the store has been backed up.