Archive all Changes to Files for 100 years

I have 310 GB of TIFF and JPEG images and a 2 GB SQL database that I need to preserve. I need to make a full backup and then record all changes (overwrite, delete, modify, add, etc.) and keep all revisions of a file forever (well, only 100 years, but might as well be forever).

So, if a user deletes, overwrites, or changes a file and I don't notice for 20 years, I need to be able to view/recover the deleted/modified file. I would prefer a solution that does not use tape.

Dell / CommVault suggested a server with CommVault software and 4 TB of disk space for $22,000. I am hoping for some suggested software that will help manage and accomplish my goal at a more reasonable cost.

The best solution would be user friendly and would allow me to select a file and then see every action that has happened to the file over the life span of the file.

Has anyone dealt with this type of solution before and what software did you use?
Paul S (Desktop Support Manager / Network Administrator) asked:
Brett Danney (IT Architect) commented:
CommVault will be about the best solution at the moment. With dedup you would save a ton of disk space while still keeping a huge amount of data on disk. You could do this yourself with file-level logging and by taking incremental backups daily or whenever a file changes. But honestly there are so many points of failure in this scenario that I would just use a proven CommVault solution.
Thomas Rush commented:
What's the granularity on the changes you need to track?   Is it weekly, daily, hourly, or by minute?   And by how much is the data expected to change?  That is, if you have 300GB today, and tomorrow looked at just files that had been changed, how big would the changed files be?  To know of a solution, I need to understand the magnitude of changes.    Also, how much does the base data change in a year -- if 300GB now, what will the size of the data be in 12 months (just the files, without change tracking)?

You might not want to use tape.... but seriously, if you try to keep data on disk for archival purposes, you will be eaten out of house and home by the cost of electricity -- especially as the data grows.  

I'm inclined to suggest something like read-only files (that is, an individual file can't be modified, but it could be changed and saved to a new file), and a hierarchical storage system such as HP's StoreNext that will move less-used data to a cheaper tier of storage (slower disk and then -- yes -- tape) as the access pattern indicates -- but data is still available transparently as if it were on disk.

You might also want to do a search for "change management system" or "code management system" and see if these might work for what you do -- these systems are used to keep track of, for instance, large programming projects, where people check modules in and out, but all changes must be preserved.  I just don't know how easily they could be adapted to images.

But very important for you, before you start an implementation, is to model the change rate and the growth rate, and make sure you really have a long-term solution.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I reviewed some of the data and I think about 4.6 GB per week is the rate of change.

Thomas Rush commented:
That's about 1.5% change, which is pretty minimal.   Still, over a year, that 4.6GB will change 52 times, meaning at the end of 12 months, you'll have an additional 240GB to store, bringing you up to 550GB.  At the end of five years, you'll have over 1.5TB, which is manageable.    
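The arithmetic above can be sketched out as a quick projection. This is only a sanity check, and it assumes the change rate stays flat at 4.6 GB/week with every weekly revision kept (no dedup, no growth in the base data):

```python
# Projected archive size, assuming the figures discussed above:
# 310 GB initial data plus ~4.6 GB of changed files per week,
# with every weekly revision retained forever.
INITIAL_GB = 310
WEEKLY_CHANGE_GB = 4.6
WEEKS_PER_YEAR = 52

def archive_size_gb(years):
    """Total stored data after `years`, keeping all weekly revisions."""
    return INITIAL_GB + WEEKLY_CHANGE_GB * WEEKS_PER_YEAR * years

for y in (1, 5, 10):
    print(f"After {y:2d} year(s): {archive_size_gb(y) / 1000:.2f} TB")
```

After one year this gives roughly 550 GB, and about 1.5 TB after five, matching the estimate above.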

Is this -- weekly -- the level of granularity you need to capture?  Or, do you need to track changes throughout the week (which almost certainly means that the weekly changed data would be much larger, because a file edited on Monday might also be changed on Tuesday and Thursday)?

But is the data also growing from addition of new content?   Many  businesses' data is growing at 50%/year or more.  This new data may also be the most likely to change (chance of a file loaded today being edited today or tomorrow is usually quite high; chance of a file loaded years ago and not touched for two years is quite low).

CommVault may well work for you -- but be sure to check out the Records Management functionality.    You will need something that is designed for content, code, or record management, or else you'll have a ton of data with no way to get to it.
Gerald Connolly commented:
For software to track file variants, have you considered a change management system like Subversion?
On the storage front, the biggest issue with storing content for 100+ years is the constant change in technology: what seems to be today's hot technology might not be around in 30 years, let alone 100. Allied to that is the practical media retention time (i.e. how long you can reliably put a piece of media on the shelf and still get your data back).
In today's world, storage buses barely last 20 years -- try finding a SCSI-1 HBA. Even DLT/LTO tapes are only good for 30 or so years, and then only if stored in ideal conditions, so provision needs to be made for a technology refresh every 10 years or so. And of course you need to make sure you don't keep all your eggs in one basket, so diversify as well!
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I know the technology will change a lot and my storage media will have to evolve as the industry changes.

I have not looked into subversion yet. I will do that.

The numbers I gave you are only changes not growth. Growth rate is unknown, but I would say 10% a year maybe?
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
So Subversion looks promising, but I do not think it is built for the use I need.

If it can monitor direct file system changes then I would be in good shape. It appears it will only track changes made to files via the svn command (e.g. import, checkout, etc.).

Any ideas?
Gerald Connolly commented:
Well, it's a case of either using a change management system and checking files out and in to record changes, or better, a journaled file system that gives you access to its logging mechanisms.
Assuming you are a Windows user, try searching Google for "tracking file changes in windows". Beware: the first hit I got was for DirMonitor, but the download link on that page took me to a site my browser reported as unsafe, plus a comment that it was a trojan.
It appears that NTFS has the capability to do what you want, but it involves either writing your own app (not very easy) or finding one that someone else has developed that is easy to set up and use.
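For what the "write your own app" route might look like, here is a minimal polling sketch in Python. It does not read the NTFS change journal; it simply compares two directory snapshots by file size and modification time, and the function names are my own invention:

```python
import os

def snapshot(root):
    """Map each file path under `root` to a (size, mtime_ns) signature."""
    sig = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            sig[path] = (st.st_size, st.st_mtime_ns)
    return sig

def diff(old, new):
    """Classify changes between two snapshots taken at different times."""
    added    = [p for p in new if p not in old]
    deleted  = [p for p in old if p not in new]
    modified = [p for p in new if p in old and new[p] != old[p]]
    return added, deleted, modified
```

A scheduled task could take a snapshot, diff it against the previous one, and copy out any added or modified files before they can be changed again. Polling 2.5 million files this way is I/O-heavy, which is why the journal-based approach mentioned above scales better.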
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I thought about writing my own, but I won't be here 50 years from now and I wanted something that someone besides me could support easily.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I have found a few pieces of software, but none of them save the file before allowing the change. They just record all the actions that have happened to a file.
Gerald Connolly commented:
So that's what a CMS does.
With a CMS you should be able to reconstruct any version of a file, whatever the content!
We use IBM's Tivoli Storage Manager and its incremental-forever technology. The storage pools can be on disk, tape, or just about anything else.
Thomas Rush commented:
TSM isn't designed to be a content management system, though.   Periodically, you must create a synthetic full backup, and over time I suspect you'd lose some granularity, even if you originally were performing hourly incremental backups.

100 years is a *long* time.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I have been searching google for some software that will do what I want. I am not finding anything.
Thomas Rush commented:
If you want to be able to restore to any point, you need one of the content management or change management software packages.   Yes, it will mean that there is a change in your workflow, in that users will have to check documents out before they can change them.     But I'm not aware of a traditional "backup" application, or any other besides a content management system, that is designed to track changes at that level of granularity over a period of a year, let alone a decade.

Traditional backup applications will let you restore to certain points, but they're designed to have less granularity over time, as a rule -- you might be able to restore to the day for a month, to the month for a year, etc., which I gather is not what you need. And even if you set up hourly backups of changed files, there won't be any easy indication of who changed the file, why it was changed, or what changes were made... and I'm not sure a backup application's database could handle the number of transactions required to keep track at the level of detail you want, over the time period you want.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
The problem is that I am already using a docuware management program. I have no control over the code. That is why I wanted something that works with the raw file system. Too bad the document system doesn't support change management now.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I meant to say Document management program, but I said DocuWare which is the name of the document program we are using.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I have yet to find a good solution that meets my needs. Any more suggestions?
This is really low-tech, but buy a huge disk and take an initial backup. Write a scheduled script that looks at each file in the source, finds all changes, and then copies them to the destination disk. On the destination, the files could either be renamed with the date in the filename, or placed into a new directory for each scheduled run. Depending on the file system, this could be done with the archive bit. The advantage is that as technology changes and capacity grows, you can shift the destination to new media. I have a system, albeit not for 100 years of changes, that does something similar. If you have a good systems programmer, they could latch onto the file system's journal to trigger the copy; this is how TSM can back up multi-terabyte file systems in a few hours, by only backing up what changes.

The database is a slightly different beast and I think you need to treat it differently. Your database needs auditing tables and the ability to recover records through a front end. There are no file-system tools that capture just its changes, and rolling back a database to retrieve one record from 100 years ago does not seem feasible.
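A minimal sketch of that script idea in Python, using modification times rather than the archive bit. All names here are hypothetical, and a production version would also need to record deletions and handle errors:

```python
import os
import shutil
import time

def archive_changed(source, dest, last_run):
    """Copy every file under `source` modified after `last_run` (epoch
    seconds) into `dest`, renamed with a timestamp so that no earlier
    revision is ever overwritten.  Returns the paths written."""
    written = []
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime())
    for dirpath, _dirs, files in os.walk(source):
        for name in files:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime <= last_run:
                continue  # unchanged since the last scheduled run
            # Flatten the relative path into the destination filename.
            rel = os.path.relpath(src, source).replace(os.sep, "__")
            base, ext = os.path.splitext(rel)
            out = os.path.join(dest, f"{base}.{stamp}{ext}")
            os.makedirs(dest, exist_ok=True)
            shutil.copy2(src, out)  # copy2 preserves timestamps
            written.append(out)
    return written
```

Each run would persist its start time and pass it as `last_run` on the next run, so every revision of a file accumulates in the destination under a unique timestamped name.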
Thomas Rush commented:
Shoot... you could do it without programming, just by using a tape drive and running incremental backups every fifteen minutes.  LTO-5 tapes hold 1.5TB native; your data may compress to give you 2TB, 3TB, or even more effective capacity.

When you fill a tape up, start another tape with a full backup, and then re-run your incremental-every-fifteen script.   The full tape goes on a shelf labeled with when it was first used and when it was last used, so that you know quickly which tape you need to recover a file modified on January 24, 2017.

Depending on how much data actually changes, you might be able to set up even more frequent backups -- maybe every five minutes or less?

This has the advantage of being simple (almost any backup application can do it without any coding), high-capacity, and that tapes are designed to sit on a shelf for decades without power (unlike disks, which are designed to run under power and will eventually lose data if unpowered).

Periodically -- every five or ten years? -- you'll buy the latest, greatest tape drive and copy all the old tapes to whatever the newer tape technology is at that point, probably decreasing the number of physical tapes by a factor of four or eight.
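As a rough sanity check on the tape approach, assuming the ~4.6 GB/week change rate given earlier and LTO-5's 1500 GB native capacity (compression would stretch this further):

```python
# How long one LTO-5 tape of incrementals would last at the stated
# change rate.  Both figures are assumptions from the discussion above.
WEEKLY_CHANGE_GB = 4.6
TAPE_CAPACITY_GB = 1500  # LTO-5 native capacity

weeks_per_tape = TAPE_CAPACITY_GB / WEEKLY_CHANGE_GB
print(f"One tape holds roughly {weeks_per_tape:.0f} weeks "
      f"(~{weeks_per_tape / 52:.1f} years) of incrementals")
```

At that rate a single tape would hold years of incrementals before it fills, so the shelf count stays small even before any consolidation onto newer drives.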
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
Both of those solutions sound plausible. I will probably end up using something along the lines of one of them. A change management system seems to require too much modification to my current systems.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I have also contacted this company:  to see if they have a solution. I will get back to everyone later.
Paul S (Desktop Support Manager / Network Administrator, Author) commented:
It looks like if I am willing to compromise on my needs then the ViceVersa  software will work. It is not in real time and depending on the speed of my server, I will only be able to check for changes a few times a day, but that may be sufficient.

Here is an email and the responses from their support people:

 > - Will all my data be duplicated once, before archiving starts working?


 > - Will the archive location and the replica location be at the same  > location? Can they be if I want?

Yes (but different folders, e.g. c:\folder\mirror\ and c:\folder\archive\)

 > - Can I keep archive copies forever until I run out of disk space?


 > - How does the archive feature handle a folder or file that is renamed,  > but not actually modified?

The file is moved to the archive anyway

 > - Is 2.5 million file (about 300 GB) too much data to process multiple  > times a day?

In general, probably yes. But a lot depends on your disk/network performance.

 > - How does viewing of archived files work? Do I have to locate the files  > myself to view past versions or is there a front end?

There is an archive viewer tool in the ViceVersa tools menu

 > - When one archive location is full, can I start a new one while leaving  > the original intact?

Yes, change the archive folder, and ViceVersa will start again, leaving the previous archive folder intact.

 > - Can two profiles run at the same time if I want to keep archives for  > two separate locations?

Paul S (Desktop Support Manager / Network Administrator, Author) commented:
I have decided to use ViceVersa and will close this question. Thank you all for your input and feedback.

Gerald Connolly commented:
Too much input from too many people to close without awarding points (split).
South Mod (Moderator) commented:
Following an 'Objection' by connollyg to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
At this point I am going to re-start the auto-close procedure.
Thank you,
Community Support Moderator