Using Two Hash Results to Confirm Changes to Source Data

A custom application requires the frequent transfer of large files over the wire.  After using file metadata to decide whether a given file has been modified and needs to be updated on the other side, I want to perform the actual transfer in, say, megabyte-sized fragments, skipping every fragment that the other end already received in a previous transfer, to save bandwidth and time.  Having both sides compare a known hash for each segment may be a starting point, but because I cannot know what the files actually are (video, binary executable images, photos, regular documents, emails, etc.), I cannot rule out that a particular segment has been modified and yet, improbably enough, yields the same hash value.  If I were to compare the results of two hash algorithms, would this be a silver bullet against false positives and ensure that no changes were made to any particular file segment?  That is, could a change that leaves the SHA256 hash of a segment unchanged also leave the MD5 hash of that same changed data unchanged?  I realize that the strictly correct answer is that it's still theoretically possible, but will this get me close enough to certainty?

Assuming the two hash algorithms are different and their outputs are independent of each other, the chance of a false positive on both at once is the product of the false-positive probabilities of the separate algorithms.
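To put rough numbers on that product, here is a small sketch under an idealized model: each hash is treated as a uniform random function, the two are treated as independent, and the change is accidental rather than crafted by an attacker (MD5 in particular does not resist deliberately constructed collisions). The function name and the workload figure are illustrative assumptions, not anything from the thread.

```python
from fractions import Fraction

def false_match_probability(*digest_bits):
    """Chance that one accidentally changed chunk keeps every digest,
    assuming independent, uniformly distributed hash outputs."""
    probability = Fraction(1)
    for bits in digest_bits:
        probability *= Fraction(1, 2 ** bits)
    return probability

p_md5 = false_match_probability(128)        # MD5 digests are 128 bits
p_sha256 = false_match_probability(256)     # SHA-256 digests are 256 bits
p_both = false_match_probability(128, 256)  # product of the two

# Hypothetical workload: one trillion changed chunks compared over time.
changed_chunks = 10 ** 12
expected_false_matches = changed_chunks * p_both
print(float(p_sha256), float(p_both), float(expected_false_matches))
```

Under these assumptions the combined probability is 2^-384, but SHA256 alone (2^-256) is already far below the error rates of disks, memory, and networks, so the second hash mostly adds cost rather than practical safety.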

You might also do well by including the segment size, either in addition to or instead of one of the algorithms.
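A minimal sketch of that idea, assuming each side keeps one fingerprint per fixed-size segment made of the segment length, its SHA256 digest, and its MD5 digest. The names chunk_fingerprint, chunks_to_send, and CHUNK_SIZE are hypothetical, and a real transfer would read the file in pieces rather than hold it all in memory.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # assumed segment size of one megabyte

def chunk_fingerprint(chunk):
    """Return (length, SHA-256 hex digest, MD5 hex digest) for one segment."""
    return (
        len(chunk),
        hashlib.sha256(chunk).hexdigest(),
        hashlib.md5(chunk).hexdigest(),
    )

def chunks_to_send(data, remote_fingerprints, chunk_size=CHUNK_SIZE):
    """List the indexes of segments whose fingerprint differs from, or is
    missing in, the fingerprints reported by the receiving side."""
    needed = []
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        fingerprint = chunk_fingerprint(data[offset:offset + chunk_size])
        if index >= len(remote_fingerprints) or remote_fingerprints[index] != fingerprint:
            needed.append(index)
    return needed

# Small demonstration with 10-byte segments instead of megabytes.
old = b"a" * 10 + b"b" * 10 + b"c" * 5
remote = [chunk_fingerprint(old[i:i + 10]) for i in range(0, len(old), 10)]
new = b"a" * 10 + b"x" * 10 + b"c" * 5 + b"d" * 3
print(chunks_to_send(new, remote, chunk_size=10))
```

With fixed-size segments the length only differs on the final segment, so it is a cheap extra check rather than a replacement for a digest.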

Are you trying to protect against random coincidences, or to provide some level of security against attacks?
Why not send the file name and hash first, and then receive a list of the files that need sending?
fredzo1966 (Author) commented:
@d-glitch:  Just trying to keep a false positive from stopping a file segment from being sent when the data really had changed but yielded the same hash, resulting in an overall corrupt file.  The pipeline between hosts already implements the necessary security to ward off attacks.  I guess I was looking for confirmation of my hunch, and your first comment confirmed it for me.  Thanks!

@aikimark:  The decision to transfer a file has already been made by the time we get to the crux of my question; I'm talking about individual chunks of a large file that has already been tagged for transfer.  Thanks for the comment.