MD5 checksums are 128 bits wide (typically expressed as a sequence of 32 hexadecimal characters), so there are 2^128 = 340282366920938463463374607431768211456 possible checksums.
In your situation the probability of an accidental collision is, for all practical purposes, zero.
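To put a rough number on "practically zero", here is a minimal sketch using the standard birthday-bound approximation. It assumes the checksums behave like uniformly random 128-bit values and that nobody is deliberately crafting files; the function name and the file counts are hypothetical, chosen only for illustration.

```python
import math

DIGEST_BITS = 128            # width of an MD5 checksum
SPACE = 2 ** DIGEST_BITS     # number of possible checksums


def collision_probability(num_files: int, space: int = SPACE) -> float:
    """Approximate chance that at least two of num_files random checksums match.

    Uses the birthday approximation p ~= 1 - exp(-k(k-1) / (2N)).
    expm1 keeps precision when the result is extremely small.
    """
    k = num_files
    return -math.expm1(-k * (k - 1) / (2 * space))


if __name__ == "__main__":
    print(SPACE)
    for k in (1_000, 1_000_000, 1_000_000_000):  # hypothetical file counts
        print(k, collision_probability(k))
```

Even for a billion files the estimate stays many orders of magnitude below anything you would see in practice, provided the files are not adversarial.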
"The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred."
"With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n."
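The quoted description can be sketched in a few lines. This is only an illustration of the longitudinal parity check as described above, using 4-bit words stored as Python integers; the function names and the sample message are made up for the example.

```python
from functools import reduce
from operator import xor


def parity_word(words):
    """XOR all words together (the longitudinal parity)."""
    return reduce(xor, words, 0)


def append_checksum(words):
    """Sender side: append the parity word to the message."""
    return list(words) + [parity_word(words)]


def is_intact(received):
    """Receiver side: XOR of all words, checksum included, must be zero."""
    return parity_word(received) == 0


message = [0b1011, 0b0110, 0b1100]   # three 4-bit words (n = 4)
sent = append_checksum(message)

# A single flipped bit is detected.
one_bit_error = list(sent)
one_bit_error[0] ^= 0b0100

# Two flipped bits at the same position in two different words cancel out.
two_bit_error = list(sent)
two_bit_error[0] ^= 0b0100
two_bit_error[1] ^= 0b0100

if __name__ == "__main__":
    print(is_intact(sent), is_intact(one_bit_error), is_intact(two_bit_error))
```

The last case is the undetected two-bit error the quote mentions: the data changed, yet the check still passes.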
I agree with sdstuber that the answer depends on the checksum algorithm,
but the sum of the bytes mod 2 is not a likely candidate.
I agree with point pleasant that for practical purposes, a 128-bit checksum
will give approximately zero chance of random collision. But not all checksums are
128 bits wide.
You may also have to consider the chance that the files are not random. It is
possible, in principle, to construct a file that has the same size and checksum as
a target file. The larger the checksum, the more difficult it is. Some cryptographic
attacks rely on this technique.
What are you relying on the checksum for? Is there any incentive for malice?
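To illustrate why non-random files matter, here is a small sketch against a deliberately weak checksum (the sum of all bytes mod 256). That checksum, the `forge` helper, and the sample strings are assumptions chosen for the demonstration; the same trick does not carry over to a strong 128-bit hash.

```python
def sum8(data: bytes) -> int:
    """Weak 8-bit checksum: sum of all bytes modulo 256."""
    return sum(data) % 256


def forge(target: bytes, payload: bytes) -> bytes:
    """Build a file that starts with payload and has the same length
    and the same sum8 checksum as target.

    The last byte is chosen so the sums agree; the gap is zero-padded.
    """
    if len(payload) >= len(target):
        raise ValueError("payload must be shorter than the target")
    body = payload + bytes(len(target) - len(payload) - 1)
    fix = (sum8(target) - sum8(body)) % 256
    return body + bytes([fix])


target = b"pay alice 100 dollars"
forged = forge(target, b"pay mallory 9999")

if __name__ == "__main__":
    print(len(target), sum8(target))
    print(len(forged), sum8(forged))
```

With only 8 bits of checksum a single adjustable byte is enough to match; the wider and better mixed the checksum, the harder this gets.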
>>> sum of the bytes mod2 is not a likely candidate.
Agreed; the extreme example was chosen simply for illustration.
For example, take the sum of all bytes mod 2:
half of all files will report one checksum, and the other half will report the other.
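A quick way to see this is to run the one-bit checksum over a batch of random data. This is just a sketch: the sample count, the 16-byte "file" size, and the fixed seed are arbitrary choices made so the run is repeatable.

```python
import random
from collections import Counter


def sum_mod2(data: bytes) -> int:
    """One-bit checksum: sum of all bytes modulo 2."""
    return sum(data) % 2


rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLES = 2000           # number of random "files"

counts = Counter(
    sum_mod2(bytes(rng.getrandbits(8) for _ in range(16)))
    for _ in range(SAMPLES)
)

if __name__ == "__main__":
    print(counts)
```

Each of the two possible values turns up for roughly half of the inputs, so any two random files collide about half the time.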