Backup Solution for 4-6 TB of data

I have run into many situations where creative/marketing companies have 3TB, 4TB, or more of incompressible graphic files that need backing up and archiving. There are plenty of solutions such as NAS devices for network storage, but the issue is the off-site/DR portion of the backup plan.

Does anyone have any recommendations on how to back up this amount of data off-site? I realize bandwidth isn't fast enough for nightly backups to a datacenter or an online backup service, and those are very expensive as well. We have entertained having a couple of NAS devices and physically rotating them, but that can easily break the scheduled jobs and relies on user intervention, which isn't reliable.

Any ideas?

SelfGovern Commented:
Yep... everything to the cloud... and then how long does it take to restore your 3TB of data when your server crashes? The point being that the goal is not "back up my data"; the goal is "be able to restore my data should my server crash or data be lost."

With that in mind --

To me, "Archive" means you've got files that are going to be accessed rarely or not at all in the course of normal business, but need to be available for historical purposes, legal discovery, etc.

A "Backup", on the other hand, is a process where you take working or current data and make safe copies of it, with the ability to use those copies to do restores, possibly promoting the "backup" to part of your "archive" down the road.

Here's what I think: 4TB isn't a lot of data in the big scheme of things; I work with companies that have orders of magnitude more than that. But it's a pretty big deal without the infrastructure and plan to deal with it. It's almost too bad you don't have 10x that data, because that's where Hierarchical Storage Management (HSM) systems start to be cost-effective (or maybe you can find one targeted at smaller environments). The idea is that you keep some data online and the rarely-accessed data on tape, but it's all still accessed through the file system, by keeping a stub and a pointer to the rest of the data.

Questions we really need to have answered:
How much of that data doesn't change?
How much is the data growing on a monthly or yearly basis?
How much data can you afford to lose (i.e., could you afford to lose a day's worth, or only an hour's worth, or ...)?
How is the data stored now? Direct-attached, SAN, or ...? How many servers are involved, and how much data does each server "own"?
What is your backup window?
How are you doing your backups today?
And... what's your budget?  How much is protecting the data worth to you?

The solution that I often recommend for situations similar to yours is a D2D2T (disk-to-disk-to-tape) solution using incremental-forever and synthetic full backups. It works like this:

- Your first backup is a full backup to disk
- After that, you only run incremental backups to disk, which means your daily backups are much smaller than otherwise
- Periodically you'll run a process to create a synthetic full backup and put it on physical tape. This means the backup application uses the information in its catalog to build a full backup tape from that collection of backup data, and the end result is just as if you'd done a full backup to tape in the first place.
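The catalog logic behind a synthetic full can be illustrated with a toy sketch: one real full, then a chain of incrementals, merged so the newest version of each file wins. (File names and versions here are made up for the example; real products like those mentioned below work at the backup-object level, not with Python dicts.)

```python
# Toy sketch of synthesizing a full backup from one real full
# plus a chain of incrementals. Illustrative only.

def synthesize_full(full, incrementals):
    """Merge a full backup with incrementals; newest version of each file wins."""
    catalog = dict(full)          # start from the last real full
    for inc in incrementals:      # apply incrementals in chronological order
        catalog.update(inc)       # changed/new files overwrite older entries
    return catalog

full = {"logo.psd": "v1", "brief.doc": "v1", "shoot.raw": "v1"}
incrementals = [
    {"logo.psd": "v2"},                      # Monday: logo changed
    {"brief.doc": "v2", "promo.mov": "v1"},  # Tuesday: an edit and a new file
]

print(synthesize_full(full, incrementals))
# {'logo.psd': 'v2', 'brief.doc': 'v2', 'shoot.raw': 'v1', 'promo.mov': 'v1'}
```

The point is that no client ever re-reads the 4TB after the first full; the "full" tape is assembled server-side from data already backed up.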

Because you're only backing up changed files (depending on how, and how much, things change!), your backup window and the amount of disk space required can be pretty small, even with 4TB of total data. Yet you still get a weekly full backup set for off-site preparedness and long-term archive. You can use a tape library with one or two LTO-5 tape drives to create the synthetic full tapes. LTO-5 has a native capacity of 1.5TB per tape, so you should be looking at about three tapes for your full set. If you rotate those in a traditional grandfather-father-son (GFS) rotation, where you keep a month's worth of weekly tapes, a year's worth of monthly tapes, and however long of yearly tapes, you'll have a great solution. LTO-5 also supports hardware encryption in the tape drive to ensure no one can read the data unless they have the proper encryption key. Several vendors offer libraries in this class.
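Sanity-checking the tape math (the GFS set counts below are one common choice, not the only one):

```python
import math

DATA_TB = 4.0          # total data to protect
LTO5_NATIVE_TB = 1.5   # LTO-5 native (uncompressed) capacity per tape

tapes_per_full = math.ceil(DATA_TB / LTO5_NATIVE_TB)
print(tapes_per_full)              # 3 tapes per full set

# Example GFS rotation: 4 weekly sets + 12 monthly sets + 1 yearly set
sets_in_rotation = 4 + 12 + 1
print(sets_in_rotation * tapes_per_full)  # 51 tapes in the rotation
```

Incompressible graphics won't benefit from the drive's compression, which is why the native (not "compressed") capacity is the right figure to plan with.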

Two products that have this functionality are HP's Data Protector and IBM's Tivoli Storage Manager. You should be able to download either and use it free for 30 or 60 days for evaluation purposes.

If you go with any kind of backup to disk (even this D2D2T solution), don't scrimp on the disks. You'll want a fast disk tier to sustain the streaming speeds needed to create the tape copies.
The most important thing is defining the need correctly: if you need archiving, your solution is different from the one you'd need for backup.
One important parameter to check is the change ratio of the data. If most of the data is not changing, you should look at deduplication solutions such as Data Domain (which has a built-in replication feature) or PureDisk. Deduplication would dramatically shorten your backup time and decrease your bandwidth usage.
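The core idea behind deduplication is chunking data and storing each unique chunk only once; a minimal sketch, assuming fixed-size chunks (real products like Data Domain use variable-size chunking and much more machinery):

```python
import hashlib

CHUNK = 4096  # fixed-size chunks for the sketch; real products vary chunk size

def dedup(blobs):
    """Store each unique chunk once; return (chunk store, per-file recipes)."""
    store, recipes = {}, {}
    for name, data in blobs.items():
        refs = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)   # only previously unseen chunks cost space
            refs.append(digest)               # the file is just a list of chunk IDs
        recipes[name] = refs
    return store, recipes

# two 8 KB files that share their first 4 KB
a = b"x" * 4096 + b"y" * 4096
b_ = b"x" * 4096 + b"z" * 4096
store, recipes = dedup({"a.psd": a, "b.psd": b_})
print(len(store))  # 3 unique chunks stored instead of 4
```

Replication then only has to ship chunk IDs the remote side hasn't seen, which is why it cuts bandwidth so dramatically when the change ratio is low.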

Or I would do a full backup over a weekend, and then just nightly incrementals.

Eventually, consider a monthly full with daily incrementals, keeping the monthly for xx months depending on retention policies (if needed).

You may need to do a one time archive of everything, and then store it permanently offsite, and then do incrementals. Doing an archive once or twice may be a good solution to keep the incrementals to a reasonable amount.

I hope this helps!

Assuming you're using NAS for active data, you can use Jungle Disk as a front end to S3 for cloud storage. That's probably as cheap as you can get. A cloud backup provider would also work.

You can also look at Nasuni, which is a NAS-like appliance that acts as a front end to cloud storage. Take a look at their website; they even have a simple price calculator.
My understanding of Nasuni and services like it is that you get a NAS device you can access immediately, without having to wait to restore several TB of data. The most active files are cached locally, which improves performance. You can treat it like a lower-performing NAS tier, but it's faster than storing the data directly in the cloud.
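The "most active files cached locally" behavior is essentially an LRU cache in front of slow cloud storage. A toy sketch (the class name and tiny capacity are invented for illustration; an appliance like Nasuni caches at the block level with far more sophistication):

```python
from collections import OrderedDict

class CacheTier:
    """Toy LRU cache sitting in front of a slow cloud object store."""
    def __init__(self, cloud, capacity=2):
        self.cloud = cloud           # the full copy lives in the cloud
        self.cache = OrderedDict()   # hot files live on the local appliance
        self.capacity = capacity

    def read(self, name):
        if name in self.cache:               # cache hit: fast local read
            self.cache.move_to_end(name)
            return self.cache[name]
        data = self.cloud[name]              # miss: slow cloud fetch
        self.cache[name] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least-recently-used file
        return data

cloud = {"a": b"A", "b": b"B", "c": b"C"}
tier = CacheTier(cloud)
tier.read("a")
tier.read("b")
tier.read("a")   # refreshes 'a'
tier.read("c")   # evicts 'b', the least recently used
print(list(tier.cache))  # ['a', 'c']
```

The working set stays fast and local while the authoritative copy (and the archive) sits off-site, which is exactly the DR property the question is after.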
Gerald Connolly Commented:
What is the loss of this data worth?

There are lots of ways to back up/archive this data, but it's hard to recommend one without more details: the current configuration, current backup hardware and regime, recovery requirements, recovery time requirements, data growth, budget, etc.

NB: It's starting to look like paid consultancy! (Not me, as we're six time zones apart.)

4TB a day isn't too bad: it's only about 50 MB/sec averaged over a 24-hour period, so real-time replication is a possibility (50 MB/sec is around 500 Mbit/sec on the wire, or roughly 50% of a Gigabit pipe — definitely achievable). But it's about the total amount of data, how long it will take to sync, and more importantly how long it will take to restore.
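Sanity-checking that arithmetic (using the common rule of thumb of ~10 bits on the wire per payload byte to allow for protocol overhead):

```python
data_bytes = 4 * 10**12        # worst case: all 4 TB changing per day
seconds_per_day = 24 * 3600

mb_per_sec = data_bytes / seconds_per_day / 10**6
print(round(mb_per_sec))       # ~46 MB/sec sustained

# rule of thumb: ~10 bits on the wire per payload byte (protocol overhead)
mbit_per_sec = mb_per_sec * 10
print(round(mbit_per_sec))     # ~463 Mbit/sec, roughly half a gigabit pipe
```

In practice the daily change rate will be far below 4TB, so the sustained requirement drops accordingly; the initial sync and a full restore are the operations that actually move the whole data set.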
Dopher (Author) Commented:
Thanks everyone for the great insight! We have decided to check out a NAS capable of Amazon S3 synchronization. This should be a cost-effective solution that offers online access as well as maintaining an archived copy in the cloud.