davidt67 (United Kingdom)

asked on

Backup Exec and millions of small files

Right,
We have Symantec Backup Exec 10d, an LTO tape library, a server and disk array, and a Gigabit network.
We have two servers, each with over two million small files spread across fifty thousand directories on a single NTFS volume. The files total about 250GB on each server.
The files are read-only, with approximately 10,000 new files added every weekday.

Backing up this volume achieves a reported throughput of 200MB/min, as opposed to the 1GB/min-plus we achieve on Exchange or SQL database files.
More importantly, a full backup takes over eighteen hours.

So how do we achieve a smaller backup window and a faster backup?
The files need to be held off-site, so currently they are written to tape, not just to another disk array.

Which Backup Exec options will improve performance?
Do we now need to consider flash snapshot and mirroring options to reduce the window?
Or should we use replication to copy them to a remote office?
And would either of these help get the files onto tape more quickly?

Answers based on actual experience using Symantec / Veritas solutions are preferred, as we have a lot of installed product from this software house.



simsjrg (United States)

I have a similar problem... about 2TB of 100KB - 1MB TIFF files. A weekly backup takes about 48 hours and requires two drives, with a maximum throughput of about 600MB/min and an average of about 375MB/min, using two SDLT 600 drives and BE 10d. The problem we have in common is that so many small files in such a large folder structure is the bottleneck; it is not a hardware or software issue. Using the same hardware I achieve about 3GB/min on SQL or Exchange.

I would say that if your main goal is archiving, then you may have a problem. But if you are looking to replicate or keep an up-to-date copy off-site, then you might consider the following.

It depends on the physical location of the off-site facility, the size of the daily 10,000-file update, your availability requirements, and probably a few other factors I just can't think of at the moment...

Run weekly full backups (Friday, Saturday or Sunday) to allow enough time for the backup to complete successfully, with daily differential or incremental backups that you can take to the remote location and restore, to keep an up-to-date copy elsewhere. If restore time is an issue, a differential may be the better idea: you only need the latest full backup and the latest differential backup to get up to date, as opposed to the full backup plus every incremental job. This will also keep mid-week backup times to a minimum.
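The differential-versus-incremental trade-off above can be sketched as a toy model (the scheme names and job labels here are illustrative, not Backup Exec terminology):

```python
# Toy model of which media a restore needs under each scheme.
# "jobs" is a chronological list of job types, e.g. ["full", "inc", "inc"].
def restore_set(jobs, scheme):
    # Index of the most recent full backup.
    last_full = max(i for i, j in enumerate(jobs) if j == "full")
    later = list(range(last_full + 1, len(jobs)))
    if scheme == "differential":
        # Latest full plus only the most recent differential after it.
        return [last_full] + later[-1:]
    # Incremental: latest full plus every incremental after it.
    return [last_full] + later
```

With a Friday full and four weekday jobs, a Thursday-night restore reads two media sets under differentials but five under incrementals.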

I will check back in a bit to look for some comments from you as well as additional input.
ASKER CERTIFIED SOLUTION
cvsadmin
[solution text available to members only]
cvsadmin: Thanks for the input; however, in my case all this data already resides on our SAN, and the backups are really only needed in the event of multiple failures and/or natural disasters.

To lose the data on the SAN, a combination of the following must happen:
Complete DAE failure
2 failed disks in that array, with failure of the DAE hot spare as well as 4 additional global hot spares
Fire, flood, etc.

This data is also available at our DR colo and is never more than 12 hours behind in the event of TOTAL failure.

Basically anything is possible if you have the available funding.

Thanks again


davidt67 (ASKER)
Thanks for the feedback thus far.

Question: is there any reason to think NetBackup would perform any better than Backup Exec in this scenario?

Additional info:
RPO: close of previous business day is acceptable.
RTO: less than 12 hours is the target; hence the problem with backup and corresponding restore times.

Disk arrays are 2x RAID5 plus hot spare across seven 15K RPM SCSI disks.
Not obviously a bottleneck; certainly less thrashing than when we do an array rebuild or expansion.
 
The source server processor runs at less than 20% during backup, and the BEX agent only uses one of the four available processors. The network, target server and tape are certainly not taxed; they are mostly waiting on the source-side processing.

For one application, consolidating files into .cabs is an option; for the other it's not.

Running a compression routine prior to backup would just seem to exacerbate the problem of the limited backup window, but it would certainly improve tape streaming and restore time. Fortunately, should a restore be required, it would be the whole array, not individual files.

What are the upper limits on creating a .zip file? Presumably this is no faster than Veritas creating the volume snapshot.

I am thinking we should utilise Veritas technologies to snapshot-mirror the volumes to a separate array, then replicate the volumes off-site, and then stream them to tape.
NO, NO, and NO.  NO tape backup software will handle 10,000 directories and 1,000,000 files in anything less than about a day.  You need to stop creating these zillions of tiny files.  The app that is making them is SERIOUSLY in error, it is badly conceived to make this many small files, the developers should have realized this would create a backup nightmare.  They probably should be out of a job by now....

At this point, your best bet is to ZIP the old files into an archive.  You can stuff a thousand tiny files into a ZIP archive, and it will copy in 1-2 seconds, whereas the 1,000 tiny files will take 5-10 minutes on ANY file copy or backup utility.  It is time to get control of these ludicrous numbers of files.  If your hard disk has any more than 100,000 files on it, you have a SERIOUS backup problem.
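For what it's worth, the bundling idea can be scripted. A minimal sketch in Python, assuming read-only source files; the paths and the age cutoff are hypothetical placeholders, and one archive per directory keeps restores reasonably granular:

```python
# Sketch: bundle aged read-only files into per-directory ZIP archives so the
# backup job streams a few large files instead of millions of tiny ones.
import os
import time
import zipfile

SOURCE = r"D:\archive"        # hypothetical volume holding the small files
TARGET = r"D:\archive-zips"   # hypothetical destination, outside SOURCE
AGE_DAYS = 90                 # only bundle files older than this

def bundle(source, target, age_days):
    cutoff = time.time() - age_days * 86400
    os.makedirs(target, exist_ok=True)
    for root, _dirs, files in os.walk(source):
        old = [f for f in files
               if os.path.getmtime(os.path.join(root, f)) < cutoff]
        if not old:
            continue
        # One archive per source directory, named after its path.
        name = root.replace(os.sep, "_").strip("_") + ".zip"
        with zipfile.ZipFile(os.path.join(target, name), "a",
                             zipfile.ZIP_DEFLATED) as zf:
            for f in old:
                path = os.path.join(root, f)
                zf.write(path, arcname=os.path.relpath(path, source))

# e.g. bundle(SOURCE, TARGET, AGE_DAYS)
```

Keeping the target outside the source tree avoids re-walking the archives, and deleting the originals after a verified zip is a separate (and riskier) step.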
Boy,
I think we all appreciate that small files to tape is not the ideal.
That's why I am asking about flash snapshot and mirroring options.
Your answer singularly failed to address these points.

In the real world, asking a major software company to redesign its application isn't really a runner... Nor are manual or scripted zip routines a solid commercial solution.

Informed views on utilising Symantec products such as Storage Foundation for flash snapshots and mirroring are what I asked for, and I still seem to be waiting for an answer that addresses the question. cvsadmin has got closest so far...
SOLUTION
[solution text available to members only]
Hmm,
Does Netbackup do volume images to tape?  Ultimately at some point the files need to be archived to tape and shipped off site, even from an alternate site.  

We are already using synthetic backups where we can. The problem we find is that you have to mount the last full backup tape to make the next full backup tape from the incrementals, which is not ideal and not that fast.
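That mount-the-last-full step is inherent to how a synthetic full is built; conceptually (a toy model, not NetBackup or Backup Exec internals):

```python
# Toy model: a synthetic full is the previous full overlaid with each later
# incremental, newest last - which is why the previous full media must be read.
def synthetic_full(previous_full, incrementals):
    merged = dict(previous_full)       # {path: file_version}
    for inc in incrementals:           # applied oldest to newest
        merged.update(inc)             # newer versions replace older ones
    return merged
```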

The small files are in fact Enterprise Vault archive files; we move them to .cab files based on age, as we don't want to sacrifice the indexing and retrieval response times. Basically we have an HSM system:

        ONLINE:    Exchange Stores  (0 - 90 days)
        NEARLINE:  EVault .DVS      (90 days - 3 years)
        TERTIARY:  EVault .CAB      (3 - 7 years)
        TAPE:      Backup Exec      (7 years - infinite)

Ultimately I think we will replace Backup Exec with NetBackup on the EV storage system. That will allow auto-retrieval from tape of the really old material. I also hope to move the TERTIARY and TAPE tiers to a back-office site, leaving just ONLINE and NEARLINE at the primary site.

Then I guess we will use replication to the back-office site for the NEARLINE data, and let NetBackup at that location back it up periodically.

BACKUP CYCLE
Nearline is currently on monthly fulls with daily and weekly synthetic incrementals.
Tertiary is also monthly, but I guess it could drop to quarterly or less.

SOLUTION
[solution text available to members only]
Member_2_231077

Backing up the open vault store isn't required daily: if you have the safety copy option set properly, Enterprise Vault doesn't delete the mail from Exchange until the store has been backed up.