Backup Solution for 4-6 TB of data

Posted on 2010-11-18
Medium Priority
Last Modified: 2012-06-21
I have run into many situations where creative/marketing companies have 3TB, 4TB, or more of incompressible graphic files that need backing up and archiving. There are plenty of solutions, such as NAS devices, for network storage, but the issue is the off-site/DR portion of the backup plan.

Does anyone have any recommendations on how to back up this amount of data off-site? I realize bandwidth isn't fast enough for nightly backups to a datacenter or online backup solution, and those are very expensive as well. We have entertained having a couple of NAS devices and physically rotating them, but that can easily mess up the scheduled jobs and depends on user intervention, which isn't reliable.

Any ideas?

Question by:Dopher

Expert Comment

ID: 34164503
The most important thing is defining the need correctly:
if you need archiving, your solution will be different than if you need backup.
One important parameter to check is the change ratio of the data. If most of the data is not changing, you should look at deduplication solutions such as Data Domain (it has a built-in replication feature), PureDisk, etc. Deduplication would dramatically shorten your backup time and decrease your bandwidth usage.

LVL 63

Expert Comment

ID: 34164838
Or I would do a full backup over a weekend, and then just nightly incrementals.

Eventually, think about a monthly full and then daily incrementals, keeping the monthly backups for xx months depending on retention policies (if needed).

You may need to do a one-time archive of everything, store it permanently off-site, and then do incrementals. Doing an archive once or twice may be a good way to keep the incrementals to a reasonable size.

I hope this helps!
LVL 42

Expert Comment

ID: 34169798
Assuming that you're using NAS for active data, you can use Jungle Disk as a front end to S3 for cloud storage. That's probably as cheap as you can get. A cloud backup provider would work.

You can also look at Nasuni, which is a NAS-like appliance that acts as a front end to cloud storage. Take a look at their website, www.nasuni.com. They even have a simple price calculator.

LVL 21

Accepted Solution

SelfGovern earned 1000 total points
ID: 34172424
Yep... everything to the cloud... and then how long does it take to restore your 3TB of data when your server crashes?    The point being that the goal is not "Back up my data", the goal is, "Being able to restore my data should my server crash or data be lost."

With that in mind --

To me, "Archive" means you've got files that are going to be accessed rarely or not at all in the course of normal business, but need to be available for historical purposes, legal discovery, etc.

A "Backup", on the other hand, is a process where you take working or current data and make safe copies of it, with the ability to use those copies to do restores, possibly promoting the "backup" to part of your "archive" down the road.

Here's what I think: 4TB isn't a lot of data in the big scheme of things. I work with companies that have orders of magnitude more than that. But it's a pretty big deal without the infrastructure and plan to deal with it.  It's almost too bad you don't have 10x that data, because that's where Hierarchical Storage Management (HSM) systems start to be cost-effective (or maybe you can find one targeted at smaller environments?).  The idea is that you keep some data online and the rarely-accessed data on tape, but it's all still accessed through the file system, by keeping a stub and pointer to the rest of the data.

Questions we really need to have answered:
How much of that data doesn't change?
How much is the data growing on a monthly or yearly basis?
How much data can you afford to lose (i.e., could you afford to lose a day's worth, or only an hour's worth, or ...)?
How is the data stored now: direct attach, SAN, or ...?  How many servers are involved, and how much data does each server "own"?
What is your backup window?
How are you doing your backups today?
And... what's your budget?  How much is protecting the data worth to you?

The solution that I often recommend for situations similar to yours is a D2D2T, or disk to disk to tape, solution using incremental forever and synthetic full backups.   It works like this --

- Your first backup is a full backup to disk
- After that, you only run incremental backups to disk, which means your daily backups are much smaller than otherwise
- Periodically you'll run a process to create a synthetic full backup and put it on physical tape. This means that the backup application uses the information in its catalog to build a full-backup tape from that collection of backup data, and the end result is just as if you'd done a full backup to tape in the first place.
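To make the incremental-forever idea concrete, here is a toy sketch in Python. It is only an illustration and not how Data Protector or Tivoli Storage Manager actually implement it: the catalog is modelled as a plain dictionary of path to version label, `None` marks a deleted file, and the function and file names are made up for the example.

```python
def synthetic_full(full, incrementals):
    """Merge a full-backup catalog with later incrementals (oldest first).

    Each catalog maps a file path to a version label.
    A value of None in an incremental means the file was deleted.
    This is a toy model, not any vendor's real catalog format.
    """
    merged = dict(full)  # start from the original full backup
    for incremental in incrementals:
        for path, version in incremental.items():
            if version is None:
                merged.pop(path, None)  # file was deleted since the last run
            else:
                merged[path] = version  # new or changed file replaces old copy
    return merged


# Hypothetical example data
full = {"a.psd": "v1", "b.tif": "v1", "c.ai": "v1"}
monday = {"a.psd": "v2"}                    # one file changed
tuesday = {"b.tif": None, "d.indd": "v1"}   # one deleted, one added

print(synthetic_full(full, [monday, tuesday]))
```

The merged result is what would be written to tape as the "full" set, without reading all 4TB from the production servers again.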

Because you're only backing up changed files -- depending on how and how much things change! -- your backup window and the amount of disk space required can be pretty small, even with 4TB total data.  Yet, you have the benefit of a weekly full backup set for off-site preparedness and long-term archive.  You can use a tape library with one or two LTO-5 tape drives to create the synthetic full tapes.   LTO-5 has a native capacity of 1.5TB/tape, so you should be looking at about three tapes for your full set.   If you rotate those in a traditional GFS rotation, where you keep a month's worth of weekly tapes, a year's worth of monthly tapes, and however many years of yearly tapes, you'll have a great solution.   LTO-5 also allows you to use hardware encryption in the tape drive to ensure no one can read the data unless they have the proper encryption key.   See http://www.hp.com/go/msl or other vendors for libraries in this class.
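As a rough check on the tape math, the sketch below works out tapes per full set and a total for a GFS rotation. The retention counts (4 weekly, 12 monthly, 1 yearly set) are assumptions for illustration, and the function names are made up. Native capacity is used because the graphic files are described as incompressible.

```python
import math


def tapes_per_full(data_tb, tape_native_tb=1.5):
    """Tapes needed for one full set at native (uncompressed) capacity."""
    return math.ceil(data_tb / tape_native_tb)


def gfs_tapes(data_tb, weeklies=4, monthlies=12, yearlies=1, tape_native_tb=1.5):
    """Total tapes if every retained set is a complete full set.

    The default retention counts are assumptions, not a recommendation.
    """
    sets = weeklies + monthlies + yearlies
    return sets * tapes_per_full(data_tb, tape_native_tb)


print(tapes_per_full(4))  # 3 tapes for 4TB on LTO-5
print(tapes_per_full(6))  # 4 tapes for 6TB
print(gfs_tapes(4))       # 51 tapes for 17 retained sets of 3 tapes
```

Real rotations usually recycle the weekly tapes and sometimes promote a weekly set to monthly, so treat the total as an upper estimate for budgeting.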

Two products that have this functionality are HP's Data Protector ( http://www.hp.com/go/dataprotector ) and IBM's Tivoli Storage Manager.  You should be able to download the products and use them free for 30 or 60 days for evaluation purposes.

If you go with any kind of backup to disk (even this D2D2T solution), don't scrimp on the disks.   You'll want a fast solution to be able to support the streaming speeds you'll need to create the tape copies.
LVL 42

Expert Comment

ID: 34174275
My understanding of Nasuni and other services like it is that it is a NAS device that you can access now, without having to wait for restoring several TB of data. The most active files are going to be cached locally, which improves performance. You can treat it like a lower performing NAS tier, but it's faster than directly storing the data in the cloud.
LVL 17

Expert Comment

by:Gerald Connolly
ID: 34177273
What is the loss of this data worth?

There are lots of ways to back up/archive this data, but without more details of the current configuration, current backup hardware and regime, recovery requirements, recovery time requirements, data growth, budget, etc., it's hard to recommend one.

NB: It's starting to look like paid consultancy! (Not me, as we are 6 hours of timezone apart.)

4TB a day is not too bad: it's only about 50 MB/sec over a 24-hour period, so real-time replication is a possibility (50 MB/sec is around 400-500 Mbit/sec depending on overhead, or roughly half of a Gigabit pipe, which is definitely achievable). But it's about the total amount of data, how long it will take to sync and, more importantly, how long it will take to restore.
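For anyone wanting to redo this arithmetic with their own numbers, here is a small Python sketch. It assumes decimal units (1 TB = 10^12 bytes), 8 bits per byte, and no protocol overhead, so real links will do somewhat worse; the function names and the 100 Mbit/sec example link are made up for illustration.

```python
def required_rate(data_tb, hours):
    """Sustained rate needed to move data_tb in the given number of hours.

    Returns (MB/sec, Mbit/sec), using decimal units and ignoring overhead.
    """
    bytes_total = data_tb * 10**12
    seconds = hours * 3600
    mb_per_s = bytes_total / seconds / 10**6
    return mb_per_s, mb_per_s * 8


def transfer_hours(data_tb, link_mbit_per_s):
    """Hours to move data_tb over a link of the given raw speed."""
    bytes_per_s = link_mbit_per_s * 10**6 / 8
    return data_tb * 10**12 / bytes_per_s / 3600


mb, mbit = required_rate(4, 24)
print(round(mb, 1), round(mbit, 1))       # 46.3 370.4
print(round(transfer_hours(4, 100), 1))   # 88.9 hours to restore 4TB at 100 Mbit/sec
```

The second figure is the one that matters for restores: even if the nightly changes fit the pipe, pulling the whole 4TB back over a slower link takes days.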

Author Closing Comment

ID: 34183803
Thanks everyone for the great insight! We have decided to check out a NAS capable of Amazon S3 synchronization. This should offer a cost-effective solution that will offer on-line access as well as maintain an archived copy in the cloud.
