Solved

Archive all Changes to Files for 100 years

Posted on 2010-08-12
Last Modified: 2013-11-14
I have 310 GB of TIFF and JPEG images and a 2 GB SQL database that I need to preserve. I need to make a full backup, then record all changes (overwrite, delete, modify, add, etc.) and keep all revisions of each file forever (well, only 100 years, but it might as well be forever).

So if a user deletes, overwrites, or changes a file and I don't notice for 20 years, I need to be able to view and recover the deleted or modified file. I would prefer a solution that does not use tape.

Dell / CommVault suggested a server with CommVault software and 4 TB of disk space for $22,000. I am hoping for suggestions of software that will help me manage and accomplish my goal at a more reasonable cost.

The best solution would be user friendly and would allow me to select a file and then see every action that has happened to the file over the life span of the file.

Has anyone dealt with this type of solution before and what software did you use?
Question by:Paul S
27 Comments
 
LVL 13

Expert Comment

by:SagiEDoc
ID: 33426916
CommVault will be about the best solution at the moment. With dedup you would save a ton of disk space while still keeping a huge amount of data on disk. You could do this yourself with file-level logging and by taking incremental backups daily or whenever a file changes. But honestly, there are so many points of failure in that scenario that I would just use a proven CommVault solution.
 
LVL 20

Expert Comment

by:SelfGovern
ID: 33429757
What's the granularity on the changes you need to track? Is it weekly, daily, hourly, or by the minute? And by how much is the data expected to change? That is, if you have 300GB today, and tomorrow you looked at just the files that had been changed, how big would the changed files be? To suggest a solution, I need to understand the magnitude of the changes. Also, how much does the base data grow in a year -- if it is 300GB now, what will the size of the data be in 12 months (just the files, without change tracking)?

You might not want to use tape, but seriously, if you try to keep data on disk for archival purposes, you will be eaten out of house and home by the cost of electricity -- especially as the data grows.

I'm inclined to suggest something like read-only files (that is, an individual file can't be modified, but it could be changed and saved to a new file), and a hierarchical storage system such as HP's StoreNext that will move less-used data to a cheaper tier of storage (slower disk and then -- yes -- tape) as the access pattern indicates -- but data is still available transparently as if it were on disk.

You might also want to do a search for "change management system" or "code management system" and see if these might work for what you do -- these systems are used to keep track of, for instance, large programming projects, where people check modules in and out, but all changes must be preserved.  I just don't know how easily they could be adapted to images.

But very important for you, before you start an implementation, is to model the change rate and the growth rate, and make sure you really have a long-term solution.
 
LVL 11

Author Comment

by:Paul S
ID: 33430808
I reviewed some of the data and I think about 4.6 GB per week is the rate of change.
 
LVL 20

Expert Comment

by:SelfGovern
ID: 33432442
That's about 1.5% change per week, which is pretty minimal. Still, 4.6GB of changed files every week for 52 weeks means that at the end of 12 months you'll have about an additional 240GB to store, bringing you up to 550GB. At the end of five years, you'll have over 1.5TB, which is manageable.
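The arithmetic above can be sketched in a few lines of Python. This is only a rough model: the 310GB and 4.6GB/week figures come from the question, while the function name `archive_size_gb` and the optional `growth_rate` parameter are assumptions added for illustration, as is the idea that the weekly change volume scales with the size of the live data.

```python
INITIAL_GB = 310          # size of the image set, from the question
WEEKLY_CHANGE_GB = 4.6    # asker's estimate of changed data per week
WEEKS_PER_YEAR = 52


def archive_size_gb(years, growth_rate=0.0):
    """Estimate total storage after `years`: live data plus every weekly copy of changed files."""
    live = float(INITIAL_GB)
    revisions = 0.0
    for _ in range(years):
        # Assumption: the weekly change volume scales with the size of the live data.
        revisions += WEEKLY_CHANGE_GB * WEEKS_PER_YEAR * (live / INITIAL_GB)
        # Assumption: live data grows by a fixed fraction each year (0.0 means no growth).
        live *= 1 + growth_rate
    return live + revisions


if __name__ == "__main__":
    for years in (1, 5, 100):
        print(f"{years:>3} years: {archive_size_gb(years):,.0f} GB")
```

With no growth, this gives roughly 549GB after one year and about 1,506GB after five, in line with the estimate above. Passing a non-zero `growth_rate` shows how quickly the total climbs once new content is added as well.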

Is this -- weekly -- the level of granularity you need to capture?  Or, do you need to track changes throughout the week (which almost certainly means that the weekly changed data would be much larger, because a file edited on Monday might also be changed on Tuesday and Thursday)?

But is the data also growing from the addition of new content? Many businesses' data is growing at 50%/year or more. This new data may also be the most likely to change (the chance of a file loaded today being edited today or tomorrow is usually quite high; the chance of a file that was loaded years ago and has not been touched for two years being edited is quite low).

CommVault may well work for you -- but be sure to check out the Records Management functionality.    You will need something that is designed for content, code, or record management, or else you'll have a ton of data with no way to get to it.
 
LVL 16

Expert Comment

by:Gerald Connolly
ID: 33463262
For software to track file variants, have you considered a change management system like Subversion?
On the storage front, the biggest issue with storing content for 100+ years is the constant change in technology: what seems to be today's hot storage technology might not be around in 30 years, let alone 100. Allied to that is the practical media retention time (i.e., how long you can reliably expect to leave a piece of media on the shelf and still get your data back).
In today's world, storage buses barely last 20 years (try finding a SCSI-1 HBA), and even DLT/LTO tapes are only good for 30 or so years, and then only if stored in ideal conditions. So provision needs to be made for a technology refresh every 10 years or so, and of course you need to make sure you don't keep all your eggs in one basket, so diversify as well!
 
LVL 11

Author Comment

by:Paul S
ID: 33469594
I know the technology will change a lot and my storage media will have to evolve as the industry changes.

I have not looked into subversion yet. I will do that.

The numbers I gave you are only changes not growth. Growth rate is unknown, but I would say 10% a year maybe?
 
LVL 11

Author Comment

by:Paul S
ID: 33470759
So Subversion looks promising, but I do not think it is built for the use I need.

If it could monitor direct file system changes, then I would be in good shape. It appears it will only track changes made to files via the svn command (i.e. import, checkout, etc.).

Any ideas?
 
LVL 16

Expert Comment

by:Gerald Connolly
ID: 33472310
Well, it's a case of either using a change management system and checking files out and in to record changes or, better, using a journaled file system that gives you access to its logging mechanisms.
Assuming you are a Windows user, try searching Google for "tracking file changes in windows". Beware: the first hit I got was for dirmonitor, but the download link on that page took me to a site that my browser reported as unsafe, plus a comment that it was a trojan.
It appears that NTFS has the capability to do what you want, but it involves either writing your own app (not very easy) or finding one that someone else has developed and that is easy to set up and use.
 
LVL 11

Author Comment

by:Paul S
ID: 33480242
I thought about writing my own, but I won't be here 50 years from now and I wanted something that someone besides me could support easily.
 
LVL 11

Author Comment

by:Paul S
ID: 33480455
I have found a few pieces of software, but none of them save the file before allowing the change. They just record all the actions that have happened to a file.
 
LVL 16

Expert Comment

by:Gerald Connolly
ID: 33488133
That's what a CMS does.
With a CMS you should be able to reconstruct any version of a file, whatever the content!
 
LVL 3

Expert Comment

by:dkikalis
ID: 33490885
We use IBM's Tivoli Storage Manager and its incremental-forever technology. The storage pools can be on disk, tape, or just about anything else.
 
LVL 20

Expert Comment

by:SelfGovern
ID: 33492104
TSM isn't designed to be a content management system, though.   Periodically, you must create a synthetic full backup, and over time I suspect you'd lose some granularity, even if you originally were performing hourly incremental backups.

100 years is a *long* time.
 
LVL 11

Author Comment

by:Paul S
ID: 33502402
I have been searching Google for software that will do what I want, but I am not finding anything.
 
LVL 20

Expert Comment

by:SelfGovern
ID: 33502807
If you want to be able to restore to any point, you need one of the content management or change management software packages.   Yes, it will mean that there is a change in your workflow, in that users will have to check documents out before they can change them.     But I'm not aware of a traditional "backup" application, or any other besides a content management system, that is designed to track changes at that level of granularity over a period of a year, let alone a decade.

Traditional backup applications will let you restore to certain points, but they're designed to have less granularity over time, as a rule -- you might be able to restore to the day for a month, to the month for a year, etc., which I gather is not what you need. And even if you set up hourly backups of changed files, there won't be any easy indication of who changed the file, or why it was changed, or what changes were made... and I'm not sure a backup application's database would be able to handle the number of transactions that would be required to keep track at the level you want over the time period you want.
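To illustrate the thinning described above, here is a minimal sketch in Python of a tiered retention policy. The function name `thin_backups` and the 30-day and 365-day windows are assumptions chosen for illustration; they are not the behavior of any particular backup product.

```python
from datetime import date, timedelta


def thin_backups(backup_days, today, daily_window=30, monthly_window=365):
    """Return the backup dates that a simple tiered retention policy would keep."""
    kept = []
    seen_months = set()
    seen_years = set()
    for day in sorted(backup_days):
        age = (today - day).days
        if age <= daily_window:
            # Recent backups: every daily restore point is kept.
            kept.append(day)
        elif age <= monthly_window:
            # Older backups: only the earliest one in each calendar month is kept.
            key = (day.year, day.month)
            if key not in seen_months:
                seen_months.add(key)
                kept.append(day)
        elif day.year not in seen_years:
            # Oldest backups: only the earliest one in each calendar year is kept.
            seen_years.add(day.year)
            kept.append(day)
    return kept


if __name__ == "__main__":
    today = date(2010, 8, 12)
    daily = [today - timedelta(days=n) for n in range(3 * 365)]
    kept = thin_backups(daily, today)
    print(f"{len(daily)} daily backups taken, {len(kept)} restore points kept")
```

Under a policy like this, a file changed and then changed back within the same month a year ago leaves no trace, which is why it does not fit a requirement to keep every revision.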
 
LVL 11

Author Comment

by:Paul S
ID: 33506824
The problem is that I am already using a docuware management program. I have no control over the code. That is why I wanted something that works with the raw file system. Too bad the document system doesn't support change management now.
 
LVL 11

Author Comment

by:Paul S
ID: 33506830
I meant to say Document management program, but I said DocuWare which is the name of the document program we are using.

http://docuware.com/
 
LVL 11

Author Comment

by:Paul S
ID: 33678140
I have yet to find a good solution that meets my needs. Any more suggestions?
 
LVL 3

Expert Comment

by:dkikalis
ID: 33680893
This is really low tech, but buy a huge disk and take an initial backup. Write a script, run on a schedule, that looks at each file in the source, finds all changes, and then copies the changed files to the destination disk. At the destination, each file that has changed since the last scheduled run could either be renamed with the date in the filename or be put into a new directory. Depending on the file system, this could be done with the archive bit. The advantage of this is that as technology changes and capacity grows, you can shift the destination to new media. I have a system, albeit not for 100 years of changes, that does something similar. If you have a good systems programmer, they could latch onto the file system's journal to trigger the copy. This is how TSM can do multi-terabyte file system backups in a few hours, by only backing up the stuff that changes.

The database is a slightly different beast, and I think you need to treat the two differently. Your database needs to have auditing tables and the ability to recover records through a front end. There are no file system tools that capture just the changes, and rolling back databases to get one record from 100 years ago does not seem feasible.
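The scheduled-copy script for the files might look something like the Python sketch below. It is only an illustration under stated assumptions: the function names, the JSON manifest, and the use of SHA-256 hashes to detect changes are all choices made here (the archive bit or the file system journal mentioned above would be alternatives), and the archive and manifest are assumed to live outside the source folder.

```python
import hashlib
import json
import shutil
import time
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def archive_changes(source, archive, manifest_path):
    """Copy every new or modified file under `source` into a dated folder in `archive`.

    Returns the relative paths that were copied on this run.
    """
    source, archive, manifest_path = Path(source), Path(archive), Path(manifest_path)
    # The manifest (hypothetical format) maps relative path -> digest seen on the last run.
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    # One folder per run, named after the time the run started.
    run_dir = archive / time.strftime("%Y%m%d-%H%M%S")
    copied = []
    for path in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = path.relative_to(source).as_posix()
        digest = file_digest(path)
        if manifest.get(rel) != digest:
            dest = run_dir / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)  # copy2 also preserves timestamps
            manifest[rel] = digest
            copied.append(rel)
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return copied
```

Nothing is ever deleted from the archive, so a file removed from the source remains recoverable from the last run that copied it. Hashing 300GB on every run is slow; comparing size and modification time first would be quicker, at the cost of missing changes that preserve both.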
 
LVL 20

Expert Comment

by:SelfGovern
ID: 33682347
Shoot... you could do it without programming, just by using a tape drive and running incremental backups every fifteen minutes.  LTO-5 tapes hold 1.5TB native; your data may compress to give you 2TB, 3TB, or even more effective capacity.

When you fill a tape up, start another tape with a full backup, and then re-run your incremental-every-fifteen script.   The full tape goes on a shelf labeled with when it was first used and when it was last used, so that you know quickly which tape you need to recover a file modified on January 24, 2017.

Depending on how much data actually changes, you might be able to set up even more frequent backups -- maybe every five minutes or less?

This has the advantage of being simple (almost any backup application can do it without any coding), high-capacity, and that tapes are designed to sit on a shelf for decades without power (unlike disks, which are designed to run under power and will eventually lose data if unpowered).

Periodically (every five or ten years?) you'll buy the latest, greatest tape drive and copy all the old tapes to whatever the newer tape technology is at that point, probably decreasing the number of physical tapes by a factor of four or eight.
 
LVL 11

Author Comment

by:Paul S
ID: 33693955
Both of those solutions sound plausible. I will probably end up using something along the lines of one of them. A change management system seems to require too much modification to my current systems.
 
LVL 11

Author Comment

by:Paul S
ID: 33694347
I have also contacted this company: http://www.tgrmn.com/  to see if they have a solution. I will get back to everyone later.
 
LVL 11

Author Comment

by:Paul S
ID: 33729519
It looks like if I am willing to compromise on my needs then the ViceVersa  software will work. It is not in real time and depending on the speed of my server, I will only be able to check for changes a few times a day, but that may be sufficient.

Here is an email and the responses from their support people:

 > - Will all my data be duplicated once, before archiving starts working?

Yes

 > - Will the archive location and the replica location be at the same location? Can they be if I want?

Yes (but in different folders, e.g. c:\folder\mirror\ and c:\folder\archive\)

 > - Can I keep archive copies forever until I run out of disk space?

Yes

 > - How does the archive feature handle a folder or file that is renamed, but not actually modified?

The file is moved to the archive anyway

 > - Is 2.5 million files (about 300 GB) too much data to process multiple times a day?

In general, probably yes. But a lot depends on your disk/network performance.

 > - How does viewing of archived files work? Do I have to locate the files myself to view past versions or is there a front end?

There is an archive viewer tool in the ViceVersa tools menu

 > - When one archive location is full, can I start a new one while leaving the original intact?

Yes, change the archive folder, and ViceVersa will start again, leaving the previous archive folder intact.

 > - Can two profiles run at the same time if I want to keep archives for two separate locations?

Yes
 
LVL 11

Accepted Solution

by:
Paul S earned 0 total points
ID: 34293230
I have decided to use ViceVersa and will close this question. Thank you all for your input and feedback.
 
LVL 16

Expert Comment

by:Gerald Connolly
ID: 34295584
Too much input from too many people to close without awarding points (split).
 

Expert Comment

by:South Mod
ID: 34332926
All,
 
Following an 'Objection' by connollyg (at http://www.experts-exchange.com/Q_26663890.html) to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
 
At this point I am going to re-start the auto-close procedure.
 
Thank you,
 
SouthMod
Community Support Moderator