Solved

Probability of different files with same checksum

Posted on 2011-03-15
Last Modified: 2013-12-01
What is the probability that two files with the same extension, file size and checksum are actually different?
Question by:hankknight
5 Comments
 
LVL 74

Assisted Solution

by:sdstuber
sdstuber earned 500 total points
ID: 35137667
It depends on the data and the checksum algorithm.

For example, take this checksum:

sum all bytes mod 2

Half of all files will report one checksum and the other half will report the other, so two different files have roughly a 1 in 2 chance of matching.
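To make the extreme example concrete, here is a minimal Python sketch. The function names `parity_checksum` and `collision_rate` are made up for illustration, and the random test data is an assumption standing in for "arbitrary files".

```python
import os

def parity_checksum(data: bytes) -> int:
    """1-bit checksum: sum of all bytes, mod 2."""
    return sum(data) % 2

def collision_rate(trials: int = 10_000, size: int = 64) -> float:
    """Fraction of random, different, same-size inputs whose checksums match."""
    hits = 0
    for _ in range(trials):
        a = os.urandom(size)
        b = os.urandom(size)
        if a != b and parity_checksum(a) == parity_checksum(b):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    # Only two checksum values exist, so about half of all pairs collide.
    print(collision_rate())
```

With only two possible checksum values, the measured rate should land close to 0.5 on each run.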
 
LVL 8

Accepted Solution

by:point_pleasant
point_pleasant earned 500 total points
ID: 35137711
For MD5 checksums:

MD5 checksums are 128 bits wide (typically expressed as a sequence of 32 hexadecimal characters), so there are 2 to the 128th = 340282366920938463463374607431768211456 possible checksums.

In your situation the probability is, for all practical purposes, zero.
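The arithmetic behind that estimate can be sketched in Python. This assumes the checksum behaves like a uniformly random 128-bit value and that nobody crafted the files to collide. The helper names `pair_collision_probability` and `birthday_bound` are illustrative only.

```python
import hashlib
from fractions import Fraction

MD5_BITS = 128
CHECKSUM_SPACE = 2 ** MD5_BITS  # number of possible MD5 values

def pair_collision_probability(bits: int) -> Fraction:
    """Chance that two specific, different, random files share a checksum."""
    return Fraction(1, 2 ** bits)

def birthday_bound(n_files: int, bits: int) -> Fraction:
    """Upper bound on the chance that any pair among n_files collides."""
    pairs = n_files * (n_files - 1) // 2
    return Fraction(pairs, 2 ** bits)

if __name__ == "__main__":
    print(CHECKSUM_SPACE)
    print(len(hashlib.md5(b"example").hexdigest()))  # hex digits in a digest
    print(float(pair_collision_probability(MD5_BITS)))
    print(float(birthday_bound(10 ** 9, MD5_BITS)))
```

Even across a billion files, the bound on any accidental collision stays far below anything observable in practice.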
 
LVL 32

Assisted Solution

by:phoffric
phoffric earned 500 total points
ID: 35137766
   http://en.wikipedia.org/wiki/Checksum#Parity_byte_or_parity_word

Parity byte or parity word

"The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word. To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred."

"With this checksum, any transmission error that flips a single bit of the message, or an odd number of bits, will be detected as an incorrect checksum. However, an error that affects two bits will not be detected if those bits lie at the same position in two distinct words. If the affected bits are independently chosen at random, the probability of a two-bit error being undetected is 1/n."

 -- fast check, but not reliable.
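A small Python sketch of the longitudinal parity check quoted above, using 8-bit words. The sample message and the name `lrc` are arbitrary choices for illustration.

```python
from functools import reduce

def lrc(words):
    """Longitudinal parity: XOR of all words."""
    return reduce(lambda a, b: a ^ b, words, 0)

message = [0b10110010, 0b01101100, 0b11110000]
check = lrc(message)

# Flip one bit in one word: the checksum changes, so the error is detected.
one_flip = message.copy()
one_flip[0] ^= 0b00000100

# Flip the same bit position in two different words: the flips cancel out
# under XOR, so the checksum is unchanged and the error goes unnoticed.
two_flips = message.copy()
two_flips[0] ^= 0b00000100
two_flips[2] ^= 0b00000100

if __name__ == "__main__":
    print(lrc(one_flip) == check)   # does the single flip go undetected?
    print(lrc(two_flips) == check)  # does the double flip go undetected?
```

The second case is the undetected two-bit error the quote describes: the data differs, but the parity word is identical.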
 
LVL 27

Assisted Solution

by:d-glitch
d-glitch earned 500 total points
ID: 35137792
I agree with sdstuber that the answer depends on the checksum algorithm,
but the sum of the bytes mod 2 is not a likely candidate.

I agree with point_pleasant that for practical purposes, a 128-bit checksum
will give approximately zero chance of a random collision.  But not all
checksums are 128 bits wide.

You may also have to consider the chance that the files are not random.  It is
possible to construct a file that has the same size and checksum as any
target file.  The larger the checksum, the more difficult it is.  Some cryptographic
attacks rely on this technique.
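As a rough illustration of deliberate construction, here is a Python sketch against a weak 8-bit additive checksum. It says nothing about how hard this is for MD5 or other wide checksums. The names `sum8` and `forge_same_sum` are hypothetical.

```python
def sum8(data: bytes) -> int:
    """Weak 8-bit checksum: sum of all bytes, mod 256."""
    return sum(data) % 256

def forge_same_sum(target: bytes) -> bytes:
    """Return different data with the same length and the same sum8.

    Adds 1 to the first byte and subtracts 1 from the second (both mod 256),
    so the total is unchanged mod 256.
    """
    if len(target) < 2:
        raise ValueError("need at least two bytes to rebalance")
    buf = bytearray(target)
    buf[0] = (buf[0] + 1) % 256
    buf[1] = (buf[1] - 1) % 256
    return bytes(buf)

if __name__ == "__main__":
    original = b"pay alice 100"
    forged = forge_same_sum(original)
    print(forged != original, len(forged) == len(original))
    print(sum8(forged) == sum8(original))
```

With a checksum this small, matching both size and checksum takes two byte edits; wider and cryptographically designed checksums make the same goal far more expensive.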

What are you relying on the checksum for?  Is there any incentive for malice?
 
LVL 74

Expert Comment

by:sdstuber
ID: 35137859
>>>  sum of the bytes mod2 is not a likely candidate.


Agreed; the extreme example was chosen simply for illustration.
