Solved

Backup Exec and millions of small files

Posted on 2006-06-29
4,545 Views
Last Modified: 2013-12-01
Right,
We have Symantec Backup Exec 10d, an LTO tape library, a server & disk array, and a Gigabit network.
We have two servers each with over two million small files, across fifty thousand directories on a single NTFS volume.  The files total about 250GB on each server.  
Files are read-only with approx. 10,000 new files added every weekday.

Backup of this volume achieves a reported throughput of 200MB/min, as opposed to the more than 1GB/min we achieve on Exchange or SQL database files.
More importantly, it takes over eighteen hours to perform a full backup.
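For scale, a quick back-of-the-envelope check (a Python sketch using only the figures above):

    volume_mb = 250 * 1024              # ~256,000 MB on each server
    minutes = volume_mb / 200           # at 200MB/min -> ~1,280 minutes
    print(minutes / 60)                 # ~21 hours, consistent with "over eighteen"
    print(2_000_000 / (minutes * 60))   # ~26 files/sec effective rate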

So how do we achieve a smaller backup window and a faster backup?
The files need to be held off-site, so currently they are written to tape and not just to another disk array.

Which Backup Exec options, if any, will improve performance?
Do we now need to consider flash snapshot and mirroring options to reduce the window?
Or should we use replication to copy them to a remote office?
And would either of these help get the files onto tape more quickly?

Answers based on actual experience using Symantec / Veritas solutions are preferred, as we have a lot of installed product from this software house.



Question by:davidt67
10 Comments
 
LVL 18

Expert Comment

by:simsjrg
ID: 17009197
I have a similar problem... about 2TB of 100KB - 1MB TIFF files. The weekly backup takes about 48 hours and requires 2 drives, with a maximum throughput of about 600MB/min and an average of about 375MB/min. This is using 2 SDLT 600 drives and BE 10d. The problem we have in common is that there are so many small files in such a large folder structure; that is the bottleneck, not a hardware or software issue. Using the same hardware I achieve about 3GB/min on SQL or Exchange.

I would say if your main goal is to archive then you may have a problem. But if you are looking to replicate or keep an up-to-date copy at an off-site location, then you may consider the following.

Depending on the physical location of this off-site, the size of this daily 10,000-file update, your availability and probably a few other factors I just can't think of at the moment...

Weekly full backups (Friday, Saturday or Sunday), to allow enough time for the backup to complete successfully, with daily differential or incremental backups that you can bring to the remote location and restore to keep an up-to-date copy elsewhere. If restore time is an issue, a differential may be the better idea: you only need the latest full backup and the latest differential to get up to date, as opposed to the full backup plus every incremental job since it.
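A minimal sketch of that restore-chain difference (Python, with hypothetical job names):

    # Hypothetical week: Friday full, then daily jobs Mon-Thu
    week = ["full_fri", "job_mon", "job_tue", "job_wed", "job_thu"]

    # Incremental scheme: restore needs the full PLUS every incremental since it
    incremental_chain = week                  # 5 tape mounts
    # Differential scheme: restore needs the full PLUS only the latest differential
    differential_chain = [week[0], week[-1]]  # 2 tape mounts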

This will allow you to keep midweek backup times to a minimum.

I will check back in a bit to look for some comments from you as well as additional input.
 
LVL 2

Accepted Solution

by:
cvsadmin earned 300 total points
ID: 17010440
Here are my thoughts for David.

In most cases you will be running on a small array. I suspect that the disks are running flat out to provide the maximum throughput to your tape device, so you are most likely going to have to add more spindles to your RAID array in order to achieve additional throughput. You could also try using WinRAR to pack the files into an archive.

The reason the Exchange backup is so fast is that it's reading the data in large chunks, most likely the maximum for the unit, due to the single-file nature of Exchange.
Your small files are being read in ones and twos, so more disk speed/more disks would be required in that case. Hence my suggestion to zip or RAR them into a single file and then push that to tape; that would most likely give you the speed you need.
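Something along these lines, for example (a rough, untested Python sketch; the paths and the one-archive-per-top-level-directory layout are just assumptions to illustrate the idea):

    import os
    import zipfile

    SOURCE = r"D:\SmallFiles"   # hypothetical source volume
    STAGING = r"E:\Staging"     # hypothetical staging area the backup job reads

    for entry in os.scandir(SOURCE):
        if not entry.is_dir():
            continue
        archive = os.path.join(STAGING, entry.name + ".zip")
        # ZIP_STORED (no compression) keeps CPU cost low; the win is turning
        # thousands of tiny files into one large file the tape drive can stream.
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    full = os.path.join(root, name)
                    zf.write(full, os.path.relpath(full, SOURCE))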

Here are my thoughts for sims.
You are already running a good RAID array; I don't think that additional disks will help you. How many disks are you running, 4-6? Your small-file issue may be the tape drive having to write the small chunks. Sorry I can't give you more information.

Here are my thoughts for both of you.
First, you guys are backing up 500GB to 2TB+ and your file counts grow daily, so you will eventually run into issues with the growth of the tape backup solution; sims, you are almost there...
In the past I have built a 7TB RAID array with the Promise VTrak 15100 and 15 400GB drives, and set up SureSync and/or RepliStor to copy the data to an alternate location, but this is somewhat tricky.
You need both arrays in the same room to start: it is best to robocopy the data to the backup array a couple of times to make sure you have it all, transport the backup array to its new location, then let the software do a bit-level check against all the files; it will queue any new files to send over. This is the only way to reduce your bandwidth usage, and it works over ADSL and cable modem connections.
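The seeding pass itself can be as simple as this (a rough Python sketch; paths are hypothetical, and /MIR, /R, /W, /NP and /LOG+ are standard robocopy switches):

    import subprocess

    SRC = r"D:\Data"                # hypothetical source volume
    DST = r"\\backup-array\Data"    # hypothetical target array, while still in the same room

    cmd = ["robocopy", SRC, DST, "/MIR", "/R:1", "/W:1", "/NP", r"/LOG+:C:\seed.log"]

    for _pass in range(3):          # "a couple times", as above
        rc = subprocess.run(cmd).returncode
        if rc == 0:                 # robocopy exit code 0 = nothing left to copy
            break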

Regards,
 
LVL 18

Expert Comment

by:simsjrg
ID: 17010547
cvsadmin: Thanks for the input; however, in my case all this data already resides on our SAN and the backups are really only needed in the event of multiple failures and/or natural disasters.

To lose the data on the SAN, a combination of the following must happen:
Complete DAE failure
2 failed disks in that array, plus failure of the DAE hot spare as well as the 4 additional global hot spares.
Fire, Flood etc...

This data is also available at our DR colo and is never more than 12 hours behind in the event of TOTAL failure.

Basically anything is possible if you have the available funding.

Thanks again


 
LVL 3

Author Comment

by:davidt67
ID: 17012806
Thanks for the feedback thus far.

Question: is there any reason to think NetBackup would perform any better than Backup Exec in this scenario?

Additional Info.
RPO: close of previous business day is acceptable.
RTO: less than 12 hours is the target, hence the problem with the backup and corresponding restore times.

Disk arrays are 2x RAID 5 + hot spare across seven disks, 15K RPM SCSI.
Not obviously a bottleneck; certainly less thrashing than when we do an array rebuild or expansion.
 
The source server's processor runs at less than 20% during backup, and the Backup Exec agent only uses one processor of the four available. Network, target server and tape are certainly not taxed; they are mostly waiting on the source-side processing.

For one application, consolidating files into .cabs is an option; for the other it's not.

Running a compression routine prior to backup would just seem to exacerbate the problem of the limited backup window, but it certainly would improve tape streaming and restore time. Fortunately, should a restore be required it would be the whole array, not individual files.

What are the upper limits on creating a .zip file?  Presumably this is no faster than Veritas creating the volume snapshot.  

I am thinking we should utilise Veritas technologies to snapshot-mirror the volumes to a separate array, then replicate the volumes offsite, and then stream them to tape.
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 17022361
NO, NO, and NO. NO tape backup software will handle tens of thousands of directories and millions of files in anything less than about a day. You need to stop creating these zillions of tiny files. The app that is making them is SERIOUSLY in error; it is badly conceived to make this many small files, and the developers should have realized this would create a backup nightmare. They probably should be out of a job by now....

At this point, your best bet is to ZIP the old files into an archive.  You can stuff a thousand tiny files into a ZIP archive, and it will copy in 1-2 seconds, whereas the 1,000 tiny files will take 5-10 minutes on ANY file copy or backup utility.  It is time to get control of these ludicrous numbers of files.  If your hard disk has any more than 100,000 files on it, you have a SERIOUS backup problem.

 
LVL 3

Author Comment

by:davidt67
ID: 17022990
Boy,
I think we all appreciate that backing up small files to tape is not ideal.
That's why I am asking about flash snapshot and mirroring options.
Your answer singularly failed to address these points.

In the real world, asking a major software company to redesign their application isn't really a runner... Nor is performing manual or scripted zip routines a solid commercial solution.

Informed views on utilising Symantec products such as Storage Foundation for flash snapshots and mirroring are what I asked for, and I still seem to be waiting for an answer which addresses the question. cvsadmin has got closest so far...
 

Assisted Solution

by:day_lander
day_lander earned 100 total points
ID: 17023526
BE doesn't have the capability of image backup, so even if you snapshot or clone the data on your array you'll still have to back up individual files. You could use the Advanced Disk-based Backup Option to get a synthetic full backup tape from a previous disk-based full backup plus incrementals, but you would still have to take the occasional real full backup.

If you've got the money you could use Enterprise Vault to archive them. It still allows users to access them through the regular filesystem (but you would still have the millions of small files as the pointers from the filesystem to the archive), or you could allow them to be accessed only through the vault via a web browser and not put the pointer files on the filesystem. Backup of the archive wouldn't take anything near so long, since Enterprise Vault puts the items in .cab files. That's assuming your files are read-only, like simsjrg's.
 
LVL 3

Author Comment

by:davidt67
ID: 17023735
Hmm,
Does NetBackup do volume images to tape? Ultimately, at some point the files need to be archived to tape and shipped off-site, even from an alternate site.

We are already using synthetic backups where we can. The problem we find is that you have to mount the last full backup tape to make the next full backup tape from the incrementals; not ideal, and not that fast.

The small files are in fact Enterprise Vault archive files; we move them to .cab files based on age, as we don't want to sacrifice the indexing and retrieval response times. Basically we have an HSM system:

    ONLINE               NEARLINE            TERTIARY         TAPE
    Exchange Stores -->  EVault .DVS -->     EVault .CAB -->  Backup Exec
    0-90 days            90 days - 3 years   3-7 years        7 years - infinite
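(Enterprise Vault's collection process does the .DVS-to-.CAB move itself; purely to illustrate the age-based selection rule, a hypothetical Python sketch, with a made-up path and the 3-year threshold from the table above:)

    import os
    import time

    NEARLINE = r"E:\EVStore"                 # hypothetical vault store path
    CUTOFF = time.time() - 3 * 365 * 86400   # ~3 years, per the table above

    candidates = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(NEARLINE)
        for name in files
        if name.lower().endswith(".dvs")
        and os.path.getmtime(os.path.join(root, name)) < CUTOFF
    ]
    print(len(candidates), "items eligible for the tertiary (.cab) tier")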

Ultimately I think we will replace Backup Exec with NetBackup on the EV storage system. That will allow auto-retrieval from tape of the really old stuff. I also hope to move the TERTIARY & TAPE elements to a back-office site, just leaving the ONLINE & NEARLINE tiers at the primary site.

Then I guess we will use replication to the back-office site for the NEARLINE stuff and let NetBackup at that location back it up periodically.

BACKUP CYCLE
Nearline is currently monthly fulls with daily/weekly synthetic incrementals.
Tertiary is monthly also, but could, I guess, drop to quarterly or less.

 
LVL 55

Assisted Solution

by:andyalder
andyalder earned 100 total points
ID: 17027218
NetBackup does indeed do volume image backups; the feature is called FlashBackup, and for peace of mind you can even restore individual files from it, but see http://seer.support.veritas.com/docs/279212.htm for performance problems with non-raw image restores.

As to backing up EV, I will check with the designers, but as far as the trainers were concerned you backed it up when you got around to it, not every day. You haven't got three years of nearline archived email in a single open archive, have you? I was under the impression that you closed one archive and opened another every few months, and then stopped repeatedly backing up anything but the open one. The closed ones will still be nearline, but nothing can be added to them. If anyone edits something in the archive then a new copy is created on Exchange and the archived version is marked as stale in the AltaVista index, so as long as the index is backed up, closed archives may only need backing up every 6 months.
 
LVL 55

Expert Comment

by:andyalder
ID: 17043804
Backing up the open vault store isn't required daily either: if you have the safety copy option set properly, EV doesn't delete the mail from Exchange until the store has been backed up.
