Detecting File Changes


I'm developing an online backup tool to back up the desktops in our office and the director's home PC.

I'm trying to work out how to detect which files have been updated without using loads of CPU or loads of bandwidth checking each file.

Any ideas how I can detect file changes quickly and easily? Is the File Modified date always correct, or do I have to use a file hash to be sure whether a file has changed?
Do I need to keep a local database of the files that have been backed up, or is it best to download a current list from the online server?

It will be developed in 2005.

Many thanks

You can use the FileSystemWatcher component for this. It raises an event for each change under the directory it watches, so you can take whatever action you need when a file or folder is created, renamed, deleted, or changed.
DanJournoAuthor Commented:
What if I want to schedule the backup to occur at a certain time each day, rather than constantly?

  The easiest way is to finish the program first and then use Windows XP Scheduled Tasks to run your backup program at the time you want.


Here's an example of how you can do this on the local system.  It uses the FileSystemWatcher, as tommy said, and shows how to use a table to store the changed paths so they can all be backed up at one time, a Timer to check when it is time to back up the data, and how to perform the backup.

Now, you'll have to work on this to make it work for you, but it's a good start...

    'This file system watcher detects changes to files
    '  on the local system (created, modified, deleted, renamed)
    Private WithEvents Watcher As IO.FileSystemWatcher = Nothing

    'This table stores the list of changed files
    Private mTable As New DataTable()

    Private WithEvents mTimer As New System.Timers.Timer()

    Private Sub StartMonitoring()

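        'Add columns to table (first call only).
        '  "FilePath" is the name read back in mTimer_Elapsed; "ChangeType" is an assumed name
        If mTable.Columns.Count = 0 Then
            mTable.Columns.Add("FilePath", GetType(String))
            mTable.Columns.Add("ChangeType", GetType(String))
        End If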

        'Start watching path
        Dim pathToMonitor As String = "c:\temp"
        Dim fileExtenionFilter As String = "*.*"

        Watcher = New IO.FileSystemWatcher(pathToMonitor, fileExtenionFilter)
        Watcher.EnableRaisingEvents = True

        'Start Timer, interval set to 1 minute (60000 ms = 1 minute)
        mTimer.Interval = 60000
        mTimer.Start()

    End Sub

    Private Sub Watcher_Changed(ByVal sender As Object, _
                                ByVal e As System.IO.FileSystemEventArgs) _
                                Handles Watcher.Changed, Watcher.Created

        'When a file is created or changed, add a new record to the table
        mTable.Rows.Add(e.FullPath, e.ChangeType.ToString())

    End Sub

    Private Sub mTimer_Elapsed(ByVal sender As Object, _
                               ByVal e As System.Timers.ElapsedEventArgs) Handles mTimer.Elapsed

        'Use the Timer to check for the correct time to perform the backup
        Dim currentTime As DateTime = Now
        Dim backupHour As Integer = 17  '5:00 PM (24 hour clock)
        Dim backupMinute As Integer = 0 '0 = :00, 30 = :30, etc.
        Dim backupTime As DateTime = _
            New DateTime(Today.Year, Today.Month, Today.Day, backupHour, backupMinute, 0)

        'Get the number of minutes the timer is set to
        Dim interval As Integer = CInt(mTimer.Interval / 60000)

        'See if the current time (right now) is within the backup window
        '  (between the designated backup time and the interval of the timer)
        If currentTime >= backupTime AndAlso _
            currentTime < backupTime.AddMinutes(interval) Then

            'Perform backup
            Dim path As String = String.Empty
            Dim destination As String = "C:\Backup"

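            For Each row As DataRow In mTable.Rows

                path = row.Item("FilePath").ToString()

                'Copies the file from the current location to the designated directory
                '  (skipped if the file has since been deleted or renamed)
                If IO.File.Exists(path) Then
                    My.Computer.FileSystem.CopyFile(path, destination & "\" & _
                                                    IO.Path.GetFileName(path), True)
                End If

            Next

            'Clear the list so the same files are not copied again at the next backup
            mTable.Rows.Clear()

        End If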

    End Sub

DanJournoAuthor Commented:
The only catch with this is that if the monitor application isn't running when files are changed or created, they won't be added to the table and therefore won't be backed up later on.

Is there any other way to detect which files have changed? Is it simply a matter of comparing each file to an index holding the file size and hash of the backed-up file?

That doesn't mean you need to abandon the idea...  Just create a Windows Service and use that instead of a Windows Forms application.  The Windows Service will start as soon as the computer boots up, even before logon, and will always be running.
DanJournoAuthor Commented:
What are my options if I'd prefer to scan all the files for changes, rather than creating a Windows Service to monitor changes?

Many thanks
Well, you have to (of course) scan the selected directory, and all subdirectories, for all files.  You will need to create a persisted storage file to store the list in (like XML, csv, txt, mdb, mdf, etc.).  You will need to make sure you capture the full file path of each file, and the modified date.

When you perform your scan, you will need to loop through each file in the selected directory, and all subdirectories, and look for that file in the storage file.  If it's not found, add it and back it up.  If it's found, compare the stored date and time with the file's current modified date; if the file has been modified, back it up and then update the record in the storage file with the new date.

You also need to check each path in your storage file against the current directory, and all subdirectories, to determine if a file has been removed (deleted) from the file system.  Because if it has been removed from the file system, then when you scan the directory, it will not be found, and you will need to remove it from the database.
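
To make the scan described above a bit more concrete, here is a minimal sketch, assuming the stored list has already been loaded into a Dictionary keyed by full path with the last modified time (UTC) as the value.  The names ScanSketch, FindChangedFiles, and index are made up for illustration, and loading and saving the storage file is left out.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module ScanSketch

    ' Compares the files under rootPath with a stored index
    ' (full path -> last modified time, UTC).
    ' Returns the paths that are new or modified, updates the index in place,
    ' and removes index entries whose files no longer exist.
    ' Assumption: the caller created the index with a case-insensitive comparer,
    ' e.g. New Dictionary(Of String, DateTime)(StringComparer.OrdinalIgnoreCase)
    Function FindChangedFiles(ByVal rootPath As String, _
                              ByVal index As Dictionary(Of String, DateTime)) _
                              As List(Of String)

        Dim changed As New List(Of String)()
        Dim seen As New Dictionary(Of String, Boolean)(StringComparer.OrdinalIgnoreCase)

        ' Pass 1: every file on disk is either new, modified, or unchanged
        For Each filePath As String In _
            Directory.GetFiles(rootPath, "*", SearchOption.AllDirectories)

            seen(filePath) = True

            Dim modified As DateTime = File.GetLastWriteTimeUtc(filePath)
            Dim stored As DateTime

            If Not index.TryGetValue(filePath, stored) OrElse stored <> modified Then
                changed.Add(filePath)
                index(filePath) = modified
            End If
        Next

        ' Pass 2: anything in the index that was not seen on disk has been deleted
        For Each knownPath As String In New List(Of String)(index.Keys)
            If Not seen.ContainsKey(knownPath) Then
                index.Remove(knownPath)
            End If
        Next

        Return changed
    End Function

End Module
```

In a real tool you would probably update the stored date only after the upload for that file succeeds, so a failed backup is retried on the next scan.  Also note that Directory.GetFiles throws if it hits a folder it cannot read, so a production version would likely walk the directories one at a time and catch UnauthorizedAccessException.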

That's a lot more complex than using a FileSystemWatcher.  But if that's the way you want to go, then you'll need to do some thinking...

The only reason the File Modified date wouldn't update when a file changes is if someone is intentionally monkeying with that attribute on the files.  So the quick and easy solution would be to use the File Modified date.  If this will be the only program that performs any backup of the files, then you could use the archive flag of the file: whenever a file is modified the archive flag is set, and you can clear it as you back up each file.  This is how most backup programs typically track what needs to be backed up during an incremental or differential backup.
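
As a rough illustration of the archive flag approach, the sketch below reads and clears the flag with File.GetAttributes and File.SetAttributes.  The module and method names (ArchiveFlagSketch, NeedsBackup, MarkBackedUp) are hypothetical, and it assumes no other backup tool is clearing the flag on the same files.

```vbnet
Imports System.IO

Module ArchiveFlagSketch

    ' True when the archive attribute is set, i.e. the file has been created
    ' or modified since the flag was last cleared.
    Function NeedsBackup(ByVal filePath As String) As Boolean
        Dim attrs As FileAttributes = File.GetAttributes(filePath)
        Return (attrs And FileAttributes.Archive) = FileAttributes.Archive
    End Function

    ' Clears the archive attribute and leaves all other attributes untouched.
    ' Call this only after the file has been backed up successfully.
    Sub MarkBackedUp(ByVal filePath As String)
        Dim attrs As FileAttributes = File.GetAttributes(filePath)
        File.SetAttributes(filePath, attrs And Not FileAttributes.Archive)
    End Sub

End Module
```

The nice part of this approach is that no separate index is needed to find changed files; the drawback is that any other program that clears the flag will hide changes from your backup.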

If you are worried about something toying with the attributes on the files such that neither of those options is good enough, then I would use those as 'first pass' indicators, use file size as the next pass, and then a hash of the file as a final check.  For the hash you can either keep a local list of the hash codes of the files that you have saved to the remote location, or you could keep a database at the remote location of all the files that are there.

May I ask, is there a reason you are choosing not to use one of the many available backup programs that are out there?  (just curious)
DanJournoAuthor Commented:
Thanks Volox,

The director wants to try and see if we can reduce costs by running our own.

Do you have any tips or code on how to work out the file hash?

As a general rule, I would say that over time the total cost of buying and using an existing backup utility will be less than what it would cost to build, test, and maintain your own.  And when it comes to something like backups, that's not where I'd want to be taking risks.  Of course, that's not your question or decision so I'll apologize for the tangent.

You'll find that there are .NET libraries for creating hash codes, and rather than repeating what others have already written, I'll refer you to this article on how to calculate hash codes for files:  
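
For a rough idea of what is involved, here is a minimal sketch using the classes in System.Security.Cryptography.  SHA-256 is just one reasonable choice of algorithm, and the names FileHashSketch and ComputeFileHash are made up for this example.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module FileHashSketch

    ' Returns the SHA-256 hash of the file's contents as a hex string.
    Function ComputeFileHash(ByVal filePath As String) As String
        Dim hashBytes() As Byte

        ' Stream the file through the hash algorithm so that large files
        ' are not loaded into memory all at once
        Using stream As FileStream = File.OpenRead(filePath)
            Using sha As SHA256 = SHA256.Create()
                hashBytes = sha.ComputeHash(stream)
            End Using
        End Using

        ' BitConverter.ToString gives "AB-CD-..."; strip the dashes
        Return BitConverter.ToString(hashBytes).Replace("-", "")
    End Function

End Module
```

Keep in mind that hashing reads the entire file, so it is the expensive check.  That is why I'd only fall back to it when the modified date or file size leaves you unsure, and store the resulting string next to the path in your list so the next run can compare against it.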

If that doesn't give you the answers you need, post back here and we'll see what else we can do for you.
