
Using Two Hash Results to Confirm Changes to Source Data

fredzo1966 asked:
A custom application needs to transfer large files over the wire frequently.  After using file metadata to decide whether a given file has been modified and needs to be updated on the other side, I want to perform the actual transfer in, say, megabyte-sized segments, and to skip any segment the other end has already seen during previous transfers, in order to save bandwidth and time.

Having both sides compare a known hash for each segment seems like a reasonable starting point, but because I cannot know what the files actually contain (video, binary executable images, photos, regular documents, emails, etc.), I also cannot rule out the possibility that a segment has been modified yet, however improbably, still yields the same hash value.  If I compared the results of two different hash algorithms, would that be a silver bullet against such false positives?  In other words, could a change that yields the same SHA-256 digest as the original data also yield the same MD5 digest?  I realize that the strictly correct answer is that it is still theoretically possible, but would this put me sufficiently close to certainty?
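
For concreteness, here is a minimal sketch of that per-segment, dual-hash comparison, assuming Python's hashlib and a fixed 1 MiB segment size.  The segment_digests and changed_segments names, and the remote_table mapping (segment index to the digest pair the receiver recorded during a previous transfer), are illustrative assumptions rather than part of any existing implementation.

import hashlib

CHUNK_SIZE = 1024 * 1024  # roughly megabyte-sized segments, as described above

def segment_digests(path):
    # Yield (index, md5_hex, sha256_hex) for each fixed-size segment of the file.
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            yield index, hashlib.md5(chunk).hexdigest(), hashlib.sha256(chunk).hexdigest()
            index += 1

def changed_segments(local_path, remote_table):
    # remote_table maps segment index -> (md5_hex, sha256_hex) recorded by the
    # receiver during a previous transfer; only mismatching segments are re-sent.
    return [index
            for index, md5, sha256 in segment_digests(local_path)
            if remote_table.get(index) != (md5, sha256)]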

d-glitch

Commented:
Yes.

Assuming the two hash algorithms are different and behave independently, the chance of both producing a false positive on the same segment is roughly the product of the individual collision probabilities.
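
As a rough back-of-the-envelope sketch of that product, assuming Python and treating MD5 and SHA-256 as independent, ideal random functions over their digest spaces (an assumption, not a guarantee about the algorithms themselves):

# Model each hash as an ideal random function over its digest space.
p_md5 = 2.0 ** -128        # MD5 digest: 128 bits
p_sha256 = 2.0 ** -256     # SHA-256 digest: 256 bits
p_both = p_md5 * p_sha256  # both collide at once: roughly 2**-384
print(f"MD5: {p_md5:.2e}  SHA-256: {p_sha256:.2e}  both: {p_both:.2e}")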

You might also do well to include the file size, either in addition to the two hashes or in place of one of them.
Are you trying to protect against random coincidences, or to provide some level of security against deliberate attacks?
aikimark
Top Expert 2014

Commented:
Why not send the file name and hash first, and then receive back a list of the files that actually need sending?
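
A minimal sketch of that idea, assuming Python; the file_manifest and files_needing_transfer names, and the manifest format (file name mapped to a whole-file SHA-256 digest), are illustrative only:

import hashlib
import os

def file_manifest(paths):
    # Build {file name: whole-file SHA-256 digest} for the sender to transmit up front.
    manifest = {}
    for path in paths:
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                digest.update(block)
        manifest[os.path.basename(path)] = digest.hexdigest()
    return manifest

def files_needing_transfer(sender_manifest, receiver_manifest):
    # The receiver compares the two manifests and replies with the names it still needs.
    return [name for name, digest in sender_manifest.items()
            if receiver_manifest.get(name) != digest]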

Author

Commented:
@d-glitch:  I'm just trying to prevent a false positive from suppressing the transfer of a segment whose data has actually changed but happens to yield the same hash, which would leave the reassembled file corrupt.  The pipeline between the hosts already implements the necessary security to ward off attacks.  I was mainly looking for confirmation of my hunch, and your first comment provided it.  Thanks!

@aikimark:  The decision to transfer a file has already been made by the time we get to the crux of my question; I'm talking about individual chunks of a large file that has already been tagged for transfer.  Thanks for the comment.