Incremental Backup

How can I perform an incremental backup of files? Some say that I can use a CRC, or simply read the files in blocks and compare them against the last full backup.


Which one is easier? Is there a library? Please help :)
databoksAsked:
r3nderCommented:
Do you really want incremental backups? That can get quite complex. Personally I would just replace version 1 with version 2, or compare the file sizes; here's some direction (below). But if you really want to do it the other way, by comparing contents, here is a whole project dedicated to it:
http://www.codeproject.com/KB/recipes/DiffAlgorithmCS.aspx
and another:
http://www.codeproject.com/KB/recipes/diffengine.aspx

using System;
using System.IO;

class Program
{
    static void Main()
    {
        // The name of the file.
        const string fileName = "test.txt";

        // Create a new FileInfo object and get the Length.
        FileInfo f = new FileInfo(fileName);
        long s1 = f.Length;

        // Change something in the file. Just for giggles.
        File.AppendAllText(fileName, " More characters.");

        // Create another FileInfo object and get the Length.
        FileInfo f2 = new FileInfo(fileName);
        long s2 = f2.Length;

        // Print out the length of the file before and after.
        Console.WriteLine("Before and after: " + s1.ToString() +
            " " + s2.ToString());

        // Get the difference between the two sizes.
        long change = s2 - s1;
        Console.WriteLine("Size increase: " + change.ToString());
    }
}

=== Program output ===

Before and after: 3 20
Size increase: 17

databoksAuthor Commented:
It's not only text files, but every kind of file.
TommySzalapskiCommented:
In many cases, incremental backups are very useful. A lot of software with incremental backup, though, does a full copy of every file that has changed and simply skips files that haven't changed since the last backup. I think some use the file's timestamp, but most use some kind of hash (CRC, MD5, etc.).

To do an incremental backup of an individual file, you could get as complex as you wanted. If you look into the research done on compression, you'll get tons of ideas.
I would just use a hash to see if the file changed at all, and then check blocks to see if the first portion of the file matches. If not, I'd just update the whole thing.
But if you look into research on partial string matching (it works on binary data too, not just text strings), you will get some good ideas on how to find matching portions.

Basically, high level incremental backups are almost identical to compression algorithms, so some research in that field would be of use.
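As a minimal sketch of the hash idea above (the file name is made up, and SHA-256 is an arbitrary choice; the thread mentions CRC and MD5, and any digest works the same way): store each file's digest at backup time, re-hash on the next run, and only back up files whose digest changed.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class HashCheck
{
    // Returns a hex digest of the file's contents.
    static string FileDigest(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    static void Main()
    {
        File.WriteAllText("a.txt", "hello");
        string before = FileDigest("a.txt");

        File.AppendAllText("a.txt", " world");
        string after = FileDigest("a.txt");

        // The digests differ, so this file would be re-backed-up.
        Console.WriteLine(before == after ? "unchanged" : "changed"); // prints "changed"
    }
}
```

Comparing digests only answers "did it change at all?"; finding *where* it changed still needs one of the block or substring approaches discussed below.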
databoksAuthor Commented:
I have looked at CRC. I will do some research on compression.

I thought about reading the files in blocks and then comparing them, but that alone will not always help, since the changes can sit in the same bytes. For example, a user replaces a line with another line of the same length but different characters.

I could use both a CRC and reading the file in blocks.

Any recommendation?
TommySzalapskiCommented:
You could check the files for long common substrings to see where they are the same. There are many algorithms out there for this. It would also help in the case where the change is longer than the original.
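A cheap special case of the common-substring idea is to take the longest common prefix and suffix of the two versions: everything between them is the changed region. A hedged sketch, with all names and the sample strings made up for illustration:

```csharp
using System;

class CommonRegion
{
    // Finds the longest common prefix and suffix of two byte arrays,
    // bounding the region that actually changed.
    static (int prefix, int suffix) CommonEnds(byte[] a, byte[] b)
    {
        int max = Math.Min(a.Length, b.Length);

        int prefix = 0;
        while (prefix < max && a[prefix] == b[prefix]) prefix++;

        int suffix = 0;
        // Don't let the suffix overlap the prefix.
        while (suffix < max - prefix &&
               a[a.Length - 1 - suffix] == b[b.Length - 1 - suffix]) suffix++;

        return (prefix, suffix);
    }

    static void Main()
    {
        byte[] v1 = System.Text.Encoding.ASCII.GetBytes("The quick brown fox");
        byte[] v2 = System.Text.Encoding.ASCII.GetBytes("The quick red fox");
        var (p, s) = CommonEnds(v1, v2);
        // Only the middle region (the replaced word) needs to go into the delta.
        Console.WriteLine($"prefix={p} suffix={s}"); // prints "prefix=10 suffix=4"
    }
}
```

This handles a single contiguous edit well; scattered edits would need a real diff algorithm like the CodeProject ones linked earlier.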
TommySzalapskiCommented:
What you need to decide is where in the balance between processor time and data size you want to end up. If you make a complicated incremental algorithm using partial string matching or long common substrings, then you will have little data to transmit, but it will take a long time to compute.

If you are doing offsite backup with dialup on a >3GHz computer to a small backup drive, then shrink it down as much as possible.
If you are backing up to a local 1 TB drive, then maybe you want something simpler and faster. Just update the whole file for any that changed.
If you are trying to create a backup program to market and sell, then do both and let the user set a parameter to choose which to use.
databoksAuthor Commented:
Okay.

It has to be kind of professional, so I can't just replace the files. I have to create deltas.

Imagine a customer who has 1000 text files. If the backup is scheduled to run every day, then I have to open 1000 text files and compute the substrings against the day before; that will take too long.

Maybe I should base it on comparing blocks of bytes and, at the same time, a CRC.
TommySzalapskiCommented:
Yes, you could compute checksums on blocks if the file checksums didn't match.
You could also loop through both files once and, when they become different, keep moving through the larger one until they start matching again. In this way you could catch all simple insertions OR deletions. If a file had both, you would need the long, complex versions that you don't want, so in that case you would just replace the whole file.
That would be more effective than doing it in blocks and take about the same amount of time.
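A hedged sketch of that resync idea for the single-insertion case (the function name, the sample bytes, and the "give up on anything more complex" policy are my own choices, not from the thread):

```csharp
using System;

class ResyncScan
{
    // Scans two versions in parallel; on a mismatch, assumes the longer
    // file has one inserted run and skips ahead by the size difference.
    // Returns the insertion offset and length, or (-1, 0) for identical files.
    static (int offset, int length) FindInsertion(byte[] shorter, byte[] longer)
    {
        int extra = longer.Length - shorter.Length;

        int i = 0;
        while (i < shorter.Length && shorter[i] == longer[i]) i++;
        if (i == shorter.Length) return extra == 0 ? (-1, 0) : (i, extra);

        // Verify the remainder matches after skipping the inserted bytes;
        // if not, the files differ by more than a single insertion.
        for (int j = i; j < shorter.Length; j++)
            if (shorter[j] != longer[j + extra])
                throw new InvalidOperationException("Not a single insertion");

        return (i, extra);
    }

    static void Main()
    {
        byte[] v1 = System.Text.Encoding.ASCII.GetBytes("abcdef");
        byte[] v2 = System.Text.Encoding.ASCII.GetBytes("abcXYdef");
        var (offset, length) = FindInsertion(v1, v2);
        Console.WriteLine($"insertion at {offset}, {length} bytes"); // prints "insertion at 3, 2 bytes"
    }
}
```

Swapping the arguments covers the deletion case; a file with both inserts and deletes would fall through to the "just replace the whole file" path Tommy describes.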
databoksAuthor Commented:

Thanks for the reply.

So I first have to use a CRC, and then afterwards compare the blocks to see where the checksum has changed.

Can you please help me with some code?

Maybe with the CRC and reading the file in blocks. Bear in mind that I will use Microsoft Volume Shadow Copy to take the backup of these files.

TommySzalapskiCommented:
CRC is a type of checksum. You could use any kind of checksum you want, but since you have both files right there, why use a checksum at all? Checksums are for comparing later, to verify that there was no transmission error. Why compute a checksum of both files when you can just compare the files directly?
TommySzalapskiCommented:
The code would just look something like:

open(file1)
open(file2)

for i from 0 to blocksize
  if (file1[i] != file2[i])
    block is different!
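That pseudocode, fleshed out into a runnable C# sketch (the file names, the 4 KB block size, and the synthetic test data are arbitrary choices for the example; a trailing partial block is ignored for brevity):

```csharp
using System;
using System.IO;

class BlockCompare
{
    const int BlockSize = 4096;

    static void Main()
    {
        // Build two 8 KB test files that differ by one byte in the second block.
        File.WriteAllBytes("v1.bin", new byte[8192]);
        byte[] changed = new byte[8192];
        changed[5000] = 1;
        File.WriteAllBytes("v2.bin", changed);

        byte[] a = File.ReadAllBytes("v1.bin");
        byte[] b = File.ReadAllBytes("v2.bin");

        // Report each full block that differs between the two versions.
        int blocks = Math.Min(a.Length, b.Length) / BlockSize;
        for (int block = 0; block < blocks; block++)
        {
            bool different = false;
            for (int i = block * BlockSize; i < (block + 1) * BlockSize; i++)
            {
                if (a[i] != b[i]) { different = true; break; }
            }
            if (different)
                Console.WriteLine("Block " + block + " is different"); // prints "Block 1 is different"
        }
    }
}
```

For large files you would stream the two files block by block instead of calling ReadAllBytes, but the comparison logic is the same.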
databoksAuthor Commented:
You are right.

I will just stick to the block comparison. However, I have to create deltas later on, so that is that time, and that headache :)

I will try some code and post it here.

Thanks, Tommy.
TommySzalapskiCommented:
You just need to come up with a format for your deltas. You could just have a bitmask that says which blocks changed, but again, if the file sizes can change, your deltas should list the inserts and deletes: something like insert(location, size, data) and delete(location, size). Then you would store the original file plus the inserts and deletes.
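One possible shape for such a delta, sketched in C# (the class names and the rule that operations apply in order against the current state of the file are assumptions for illustration, not a format from the thread):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

abstract class DeltaOp { public long Location; }

class Insert : DeltaOp
{
    public byte[] Data;
    public Insert(long location, byte[] data) { Location = location; Data = data; }
}

class Delete : DeltaOp
{
    public long Size;
    public Delete(long location, long size) { Location = location; Size = size; }
}

class Delta
{
    // Reconstructs the new version from the original plus a list of operations.
    // Each operation's Location refers to the file as it stands at that point.
    static byte[] Apply(byte[] original, List<DeltaOp> ops)
    {
        var result = new List<byte>(original);
        foreach (var op in ops)
        {
            if (op is Delete d)
                result.RemoveRange((int)d.Location, (int)d.Size);
            else if (op is Insert ins)
                result.InsertRange((int)ins.Location, ins.Data);
        }
        return result.ToArray();
    }

    static void Main()
    {
        byte[] v1 = Encoding.ASCII.GetBytes("hello world");
        var ops = new List<DeltaOp>
        {
            new Delete(0, 5),                              // drop "hello"
            new Insert(0, Encoding.ASCII.GetBytes("goodbye")),
        };
        Console.WriteLine(Encoding.ASCII.GetString(Apply(v1, ops))); // prints "goodbye world"
    }
}
```

A real backup would serialize these records to disk next to the original file; this sketch only shows the in-memory apply step.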
databoksAuthor Commented:
Excellent, Tommy. I promise to get back to you with a sample on the blocks.

Thank you in advance.
databoksAuthor Commented:
Hi Tommy.

This is my code below. I have to trim it a bit and then make it more advanced.

What do you think of the code, and of the principle for detecting the block changes?

This works on every kind of file.
static void Main(string[] args)
{
    byte[] bytefile = File.ReadAllBytes("D:\\licens.txt");
    byte[] bytefile2 = File.ReadAllBytes("D:\\licens2.txt");

    for (int i = 0; i < bytefile.Length; i++)
    {
        if (bytefile.Length != bytefile2.Length)
        {
        }
        else if (bytefile.Length == bytefile2.Length)
        {
            // No difference in the lengths; try looking in the blocks.
            if (bytefile[i] != bytefile2[i])
            {
                Console.WriteLine("Block change detected at " + i);
            }
            else
            {
                Console.WriteLine("No changes to the file..");
            }
        }
    }

    Console.ReadLine();
}

TommySzalapskiCommented:
This code is just comparing the lengths of the files over and over again. You need something more like:

            for (int i = 0; i < Math.Min(bytefile.Length, bytefile2.Length); i++) // only check while both are long enough
            {
                if (bytefile[i] != bytefile2[i])
                {
                    // Change detected at i.
                    // Dump the value of i and the data from bytefile2[i] to bytefile2[bytefile2.Length] into the delta file.
                    break; // leave the loop

databoksAuthor Commented:
Thank you.

I will try it. Even though my code worked in different scenarios. :)
databoksAuthor Commented:
Perfect ;)