Backup Solution for 4-6 TB of data

Posted on 2010-11-18
Last Modified: 2012-06-21
I have run into many situations where creative/marketing companies have 3TB or 4TB or more worth of uncompressable graphic files that need backing up and archiving. There are plenty of solutions such as NAS devices for network storage but the issue is the off-site/DR portion of the backup plan.

Does anyone have any recommendations on how to backup this amount of data off-site? I realize bandwidth isn't fast enough for nightly to a datacenter or on-line backup solution, and those are very expensive as well. We have entertained having a couple NAS devices and physically rotating them, but that can easily mess up the scheduled jobs and counts on user intervention, which isn't reliable.

Any ideas?

Question by:Dopher

Expert Comment

ID: 34164503
The most important thing is defining the need correctly;
if you need archiving yoru solution is different and if you nered backing up solution would be different;
One of the important parameter that you should check is change ratio of the data. If most of the data is not changing you should look at the deduplication solutions such as datadomain (it has built-in replication feature), puredisk etc. deduplication would dramaticaliy shorten your backup time and decrease your bandwith usage.

LVL 63

Expert Comment

ID: 34164838
Or I would do a full backup over a weekend, and then just nightly incrementals.

Eventually, think of  a monthly full, and then daily incrementals, and keep the monthly for xx months depending on retention policies ( if needed )

You may need to do a one time archive of everything, and then store it permanently offsite, and then do incrementals. Doing an archive once or twice may be a good solution to keep the incrementals to a reasonable amount.

I hope this helps !
LVL 42

Expert Comment

ID: 34169798
Assuming that you're using NAS for active data, you can use Jungle Disk as a front end to S3 for cloud storage. That's probably as cheap as you can get. A cloud backup provider would work.

You can also look at Nasuni which is a NAS like appliance as a front end to cloud storage. Take a look at their website, . They even have a simple price calculator.
What is SQL Server and how does it work?

The purpose of this paper is to provide you background on SQL Server. It’s your self-study guide for learning fundamentals. It includes both the history of SQL and its technical basics. Concepts and definitions will form the solid foundation of your future DBA expertise.

LVL 20

Accepted Solution

SelfGovern earned 250 total points
ID: 34172424
Yep... everything to the cloud... and then how long does it take to restore your 3TB of data when your server crashes?    The point being that the goal is not "Back up my data", the goal is, "Being able to restore my data should my server crash or data be lost."

With that in mind --

To me, "Archive" means you've got files that are going to be accessed rarely or not at all in the course of normal business, but need to be available for historical purposes, legal discovery, etc.

A "Backup", on the other hand, is a process where you take working or current data and make safe copies of it, with the ability to use those copies to do restores, possibly promoting the "backup" to part of your "archive" down the road.

Here's what I think -- 4TB isn't a lot of data in the big scheme of things, I work with companies that have orders of magnitude more than that... but it's a pretty big deal without the infrastructure and plan to deal with it.  It's almost too bad you don't have 10x that data, because that's where these systems called Hierarchical Storage Management systems start to be cost-effective (or maybe you can find one targeted at smaller environments?  The idea is, you have some data online, and the rarely-accessed data on tape, but it's still accessed through the file system, buy keeping a stub and pointer to the rest of the data.)

Questions we really need to have answered:
How much of that data doesn't change?
How much is the data growing on a monthly or yearly basis?
How much data can you afford to lose (i.e., could you afford to lose a day's worth, or only an hour's worth, or ...?
How is the data stored now?   direct attach, SAN, or... ?   How many servers are involved, and how much data does each server "own"?
What is your backup window?
How are you doing your backups today?
And... what's your budget?  How much is protecting the data worth to you?

The solution that I often recommend for situations similar to yours is a D2D2T, or disk to disk to tape, solution using incremental forever and synthetic full backups.   It works like this --

- Your first backup is a full backup to disk
- After that, you only run incremental backups to disk, which means your daily backups are much smaller than otherwise
- Periodically you'll run a process to create a synthetic full backup and put it on physical tape -- this means that the backup application uses the information in its catalog to create a full backup physical tape from that collection of backup data, and the end result just as if you'd done a full backup to tape in the first place.

Because you're only backing up changed files -- depending on how and how much things change! -- your backup window and the amount of disk space required can be pretty small, even with 4TB total data.  Yet, you have the benefit of a weekly full backup set for off-site preparedness and long-time archive.  You can use a tape library with one or two LTO-5 tape drives to create the synthetic full tapes.   LTO-5 has a native capacity of 1.5TB/tape, so you should be looking at about three tapes for your full set.   If you rotate those in a traditional GFS rotation, where you keep a month's worth of weekly tapes, and a year's worth of monthly tapes, and however long of yearly tapes, you'll have a great solution.   LTO-5 also allows you to use hardware encryption in the tape drive to ensure no one can read the data unless they have the proper encryption key.   See or other vendors for libraries in this class.

Two products that have this functionality are HP's Data Protector ( ) and IBM's Tivoli Storage Manager.  You should be able to download the products and use them free for 30- or 60-days for evaluation purposes.

If you go with any kind of backup to disk (even this D2D2T solution), don't scrimp on the disks.   You'll want a fast solution to be able to support the streaming speeds you'll need to create the tape copies.
LVL 42

Expert Comment

ID: 34174275
My understanding of Nasuni and other services like it is that it is a NAS device that you can access now, without having to wait for restoring several TB of data. The most active files are going to be cached locally, which improves performance. You can treat it like a lower performing NAS tier, but it's faster than directly storing the data in the cloud.
LVL 16

Expert Comment

by:Gerald Connolly
ID: 34177273
What is the loss of this data worth?

There are lots of ways to backup/archive this data but without more details of the current configuration, current backup hardware and regime, recovery requirements, recovery time requirements, data growth, budget, etc etc

NB Its starting to look like paid consultancy! (Not me, as we 6 hours of timezone apart).

4TB a day its not too bad, its only a 50 MB/sec over a 24 hour period, so realtime replication is a possibility (50MB/sec is around 500Mbits/sec or 50% of a Gigabit pipe - definitely achievable). but its about the total amount of data and how long it will take to sync and more importantly how long it will take to restore.

Author Closing Comment

ID: 34183803
Thanks everyone for the great insight! We have decided to check out a NAS capable of Amazon S3 synchronization. This should offer a cost-effective solution that will offer on-line access as well as maintain an archived copy in the cloud.

Featured Post

Optimizing Cloud Backup for Low Bandwidth

With cloud storage prices going down a growing number of SMBs start to use it for backup storage. Unfortunately, business data volume rarely fits the average Internet speed. This article provides an overview of main Internet speed challenges and reveals backup best practices.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

If you ever consider purchasing any Daossoft Software Products, DON'T expect any meaningful support - This article should convince you why!
Facing problems with you memory card? Cannot access your memory card? All stored data, images, videos are lost? If these are your questions...than this small article might help you out in retrieving your lost or inaccessible data.
In this Micro Tutorial viewers will learn how to restore single file or folder from Bare Metal backup image of their system. Tutorial shows how to restore files and folders from system backup. Often it is not needed to restore entire system when onl…
This tutorial will walk an individual through the steps necessary to configure their installation of BackupExec 2012 to use network shared disk space. Verify that the path to the shared storage is valid and that data can be written to that location:…

809 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question