Backup Solution for 4-6 TB of data

Posted on 2010-11-18
Last Modified: 2012-06-21
I have run into many situations where creative/marketing companies have 3TB, 4TB, or more of incompressible graphic files that need backing up and archiving. There are plenty of solutions such as NAS devices for network storage, but the issue is the off-site/DR portion of the backup plan.

Does anyone have any recommendations on how to back up this amount of data off-site? I realize bandwidth isn't fast enough for nightly backups to a datacenter or online backup solution, and those are very expensive as well. We have entertained having a couple of NAS devices and physically rotating them, but that can easily mess up the scheduled jobs and depends on user intervention, which isn't reliable.

Any ideas?

Question by:Dopher

Expert Comment

ID: 34164503
The most important thing is defining the need correctly:
if you need archiving, your solution will be different than if you need backup.
One of the important parameters you should check is the change ratio of the data. If most of the data is not changing, you should look at deduplication solutions such as Data Domain (it has a built-in replication feature), PureDisk, etc. Deduplication would dramatically shorten your backup time and decrease your bandwidth usage.
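
To illustrate why the change ratio matters, here is a toy sketch of deduplication in Python. It is not how any particular product works: the fixed 4096-byte chunk size, the function name `dedup_store`, and the in-memory dictionary are all assumptions made for the example, and commercial products use more sophisticated chunking and storage.

```python
import hashlib

def dedup_store(blobs, chunk_size=4096):
    """Store each unique chunk once; return (logical_bytes, physical_bytes)."""
    store = {}      # chunk hash -> chunk bytes (hypothetical in-memory store)
    logical = 0     # bytes the backup jobs asked us to keep
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            logical += len(chunk)
            # Only chunks with a hash we have not seen yet are stored.
            store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
    physical = sum(len(c) for c in store.values())
    return logical, physical

# Two nightly backups of the same unchanged data.
night = b"A" * 8192 + b"B" * 4096
logical, physical = dedup_store([night, night])
print(f"logical={logical} bytes, physical={physical} bytes")
```

If little changes between nights, the second backup adds almost nothing to what has to be stored or replicated off-site.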

LVL 63

Expert Comment

ID: 34164838
Or I would do a full backup over a weekend, and then just nightly incrementals.

Eventually, think of a monthly full and then daily incrementals, and keep the monthly for xx months depending on retention policies (if needed).

You may need to do a one-time archive of everything, store it permanently offsite, and then do incrementals. Doing an archive once or twice may be a good solution to keep the incrementals to a reasonable amount.

I hope this helps!
LVL 42

Expert Comment

ID: 34169798
Assuming that you're using NAS for active data, you can use Jungle Disk as a front end to S3 for cloud storage. That's probably as cheap as you can get. A cloud backup provider would work.

You can also look at Nasuni, which is a NAS-like appliance that acts as a front end to cloud storage. Take a look at their website. They even have a simple price calculator.

LVL 21

Accepted Solution

SelfGovern earned 250 total points
ID: 34172424
Yep... everything to the cloud... and then how long does it take to restore your 3TB of data when your server crashes? The point is that the goal is not "Back up my data"; the goal is "Being able to restore my data should my server crash or data be lost."
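
To put rough numbers on the restore question, here is a back-of-the-envelope sketch in Python. The link speeds and the 80% efficiency factor are assumptions for illustration, not measurements; substitute your own.

```python
def transfer_hours(data_tb: float, link_mbit_per_s: float, efficiency: float = 0.8) -> float:
    """Hours needed to move data_tb (decimal terabytes) over a link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually usable for the restore.
    """
    data_bits = data_tb * 1e12 * 8
    usable_bits_per_s = link_mbit_per_s * 1e6 * efficiency
    return data_bits / usable_bits_per_s / 3600

# Restore of 3TB over a few hypothetical link speeds.
for mbit in (10, 100, 1000):
    print(f"{mbit:>5} Mbit/s: {transfer_hours(3, mbit):8.1f} hours")
```

Run it with your real download speed before committing to a cloud-only plan; the restore time, not the nightly upload, is usually the number that hurts.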

With that in mind --

To me, "Archive" means you've got files that are going to be accessed rarely or not at all in the course of normal business, but need to be available for historical purposes, legal discovery, etc.

A "Backup", on the other hand, is a process where you take working or current data and make safe copies of it, with the ability to use those copies to do restores, possibly promoting the "backup" to part of your "archive" down the road.

Here's what I think -- 4TB isn't a lot of data in the big scheme of things; I work with companies that have orders of magnitude more than that... but it's a pretty big deal without the infrastructure and plan to deal with it.  It's almost too bad you don't have 10x that data, because that's where Hierarchical Storage Management (HSM) systems start to be cost-effective (or maybe you can find one targeted at smaller environments?).  The idea is that you have some data online and the rarely accessed data on tape, but it's all still accessed through the file system, by keeping a stub and pointer to the rest of the data.

Questions we really need to have answered:
How much of that data doesn't change?
How much is the data growing on a monthly or yearly basis?
How much data can you afford to lose (i.e., could you afford to lose a day's worth, or only an hour's worth, or ...)?
How is the data stored now? Direct attach, SAN, or...? How many servers are involved, and how much data does each server "own"?
What is your backup window?
How are you doing your backups today?
And... what's your budget?  How much is protecting the data worth to you?

The solution that I often recommend for situations similar to yours is a D2D2T, or disk to disk to tape, solution using incremental forever and synthetic full backups.   It works like this --

- Your first backup is a full backup to disk
- After that, you only run incremental backups to disk, which means your daily backups are much smaller than otherwise
- Periodically you'll run a process to create a synthetic full backup and put it on physical tape -- this means that the backup application uses the information in its catalog to create a full backup physical tape from that collection of backup data, and the end result is just as if you'd done a full backup to tape in the first place.
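
The steps above can be sketched in Python as a catalog merge. This is only a conceptual illustration, not how Data Protector, Tivoli Storage Manager, or any other product implements it; the catalog layout (a path mapped to a stored copy ID, with `None` marking a deletion) and the name `synthetic_full` are assumptions for the example.

```python
from typing import Dict, List, Optional

# Hypothetical catalog entry: file path -> ID of the stored copy,
# or None when an incremental recorded that the file was deleted.
Backup = Dict[str, Optional[str]]

def synthetic_full(full: Backup, incrementals: List[Backup]) -> Dict[str, str]:
    """Merge one full backup and its incrementals (oldest first)
    into the file set a fresh full backup would contain."""
    merged = {path: cid for path, cid in full.items() if cid is not None}
    for inc in incrementals:
        for path, copy_id in inc.items():
            if copy_id is None:
                merged.pop(path, None)    # file was deleted since the last run
            else:
                merged[path] = copy_id    # newest copy wins
    return merged

full = {"logo.psd": "full:1", "ad.tif": "full:2", "old.ai": "full:3"}
monday = {"ad.tif": "mon:1"}                    # ad.tif was edited
tuesday = {"old.ai": None, "new.psd": "tue:1"}  # one file deleted, one added
print(synthetic_full(full, [monday, tuesday]))
```

The merge only reads the catalog and the disk copies, so the production servers are not touched while the tape set is being written.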

Because you're only backing up changed files -- depending on how and how much things change! -- your backup window and the amount of disk space required can be pretty small, even with 4TB total data.  Yet, you have the benefit of a weekly full backup set for off-site preparedness and long-term archive.  You can use a tape library with one or two LTO-5 tape drives to create the synthetic full tapes.   LTO-5 has a native capacity of 1.5TB/tape, so you should be looking at about three tapes for your full set.   If you rotate those in a traditional GFS rotation, where you keep a month's worth of weekly tapes, a year's worth of monthly tapes, and however many years of yearly tapes, you'll have a great solution.   LTO-5 also allows you to use hardware encryption in the tape drive to ensure no one can read the data unless they have the proper encryption key.   See or other vendors for libraries in this class.
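
As a quick sanity check on the tape count, here is a small Python sketch. It uses the 1.5TB native LTO-5 capacity mentioned above (native, since the graphic files will not compress further). The retention counts of 4 weekly, 12 monthly, and 1 yearly set are assumptions for illustration; in practice a month-end weekly often doubles as the monthly, so treat the total as an upper bound.

```python
import math

def tapes_per_full(data_tb: float, tape_native_tb: float = 1.5) -> int:
    """Tapes needed for one full set at native (uncompressed) capacity."""
    return math.ceil(data_tb / tape_native_tb)

def gfs_tape_total(data_tb: float, weeklies: int = 4, monthlies: int = 12,
                   yearlies: int = 1, tape_native_tb: float = 1.5) -> int:
    """Tapes held at once if every retained GFS set is a separate full set."""
    sets = weeklies + monthlies + yearlies
    return sets * tapes_per_full(data_tb, tape_native_tb)

print("tapes per full set:", tapes_per_full(4))
print("tapes in rotation: ", gfs_tape_total(4))
```

Re-run it with your expected growth (say 6TB) to see when the full set spills onto a fourth tape.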

Two products that have this functionality are HP's Data Protector and IBM's Tivoli Storage Manager.  You should be able to download the products and use them free for 30 or 60 days for evaluation purposes.

If you go with any kind of backup to disk (even this D2D2T solution), don't scrimp on the disks.   You'll want a fast solution to be able to support the streaming speeds you'll need to create the tape copies.
LVL 42

Expert Comment

ID: 34174275
My understanding of Nasuni and other services like it is that it is a NAS device that you can access now, without having to wait for restoring several TB of data. The most active files are going to be cached locally, which improves performance. You can treat it like a lower performing NAS tier, but it's faster than directly storing the data in the cloud.
LVL 17

Expert Comment

by:Gerald Connolly
ID: 34177273
What is the loss of this data worth?

There are lots of ways to backup/archive this data, but without more details of the current configuration, current backup hardware and regime, recovery requirements, recovery time requirements, data growth, budget, etc., it is hard to recommend one.

NB It's starting to look like paid consultancy! (Not me, as we are 6 hours of time zones apart.)

4TB a day is not too bad; it's only about 50 MB/sec over a 24-hour period, so real-time replication is a possibility (50 MB/sec is around 500 Mbits/sec allowing for overhead, or 50% of a Gigabit pipe: definitely achievable). But it's about the total amount of data, how long it will take to sync, and, more importantly, how long it will take to restore.
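
For anyone who wants to check that arithmetic, here is a short Python sketch. It assumes decimal units (1TB = 10^12 bytes) and counts payload only, with no protocol overhead, so the result lands a little under the rounded figures quoted above.

```python
def sustained_rate(data_tb: float, hours: float = 24.0):
    """Return (MB/s, Mbit/s) needed to move data_tb in the given hours."""
    bytes_per_s = data_tb * 1e12 / (hours * 3600)
    mb_per_s = bytes_per_s / 1e6
    return mb_per_s, mb_per_s * 8

mb, mbit = sustained_rate(4)
print(f"{mb:.1f} MB/s, {mbit:.1f} Mbit/s, {mbit / 1000:.0%} of a Gigabit link")
```

The same function shows how fast the requirement climbs if the window shrinks, for example `sustained_rate(4, hours=8)` for an overnight sync.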

Author Closing Comment

ID: 34183803
Thanks everyone for the great insight! We have decided to check out a NAS capable of Amazon S3 synchronization. This should offer a cost-effective solution that will offer on-line access as well as maintain an archived copy in the cloud.

Question has a verified solution.
