Solved

What is the probability of two different strings having the same MD5 hash?

Posted on 2009-05-13
Last Modified: 2012-05-06
Hi,

I am planning to create my own query cache. I will be looking up the query results from the cache using an MD5 hash of the query string. There will be tens of thousands of queries stored in the cache. What is the likelihood of two different query strings having the same MD5 hash?

I ask because, to my knowledge, MySQL's native query cache looks up cached results by the full query string rather than by a hash of it (which would be faster and use less storage), and I wondered whether this was done to avoid potential hash collisions.

Thanks
Question by:tomp_gl
 
LVL 14

Assisted Solution

by:racek
racek earned 50 total points
ID: 24373331
No problem, but before computing the MD5 you need to normalize the query:
- collapse all double spaces into single spaces
- convert everything to upper case
- replace variables with ? or similar
- maybe replace table and column aliases with the full table names :-) because different programmers use different aliases
- change LEFT JOIN to LEFT OUTER JOIN, because different programmers spell the same join differently
etc.
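A few of the normalization steps above could be sketched like this in Python. This is only an illustration: `canonicalize` and `cache_key` are hypothetical names, the regular expression for quoted literals is a simplification rather than a real SQL parser, and alias rewriting is left out entirely. Quoted string literals are left untouched so that queries differing only in a literal's case still get different keys.

```python
import hashlib
import re

# Simplified pattern for single-quoted SQL string literals (assumption:
# handles backslash escapes and doubled quotes, nothing more exotic).
_LITERAL = re.compile(r"('(?:[^'\\]|\\.|'')*')")

def canonicalize(query: str) -> str:
    """Hypothetical normalizer: collapse whitespace, fold case and unify
    LEFT JOIN spelling outside of quoted literals."""
    parts = _LITERAL.split(query)
    for i in range(0, len(parts), 2):        # even indexes: non-literal text
        text = re.sub(r"\s+", " ", parts[i])  # collapse runs of whitespace
        text = text.upper()                   # fold case
        text = re.sub(r"\bLEFT JOIN\b", "LEFT OUTER JOIN", text)
        parts[i] = text
    return "".join(parts).strip()

def cache_key(query: str) -> str:
    """MD5 hex digest of the canonical form, used only as a cache key."""
    return hashlib.md5(canonicalize(query).encode("utf-8")).hexdigest()
```

The canonical form is meant for building the key only; the query that is actually executed should stay as written.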

 
LVL 84

Accepted Solution

by:ozo
ozo earned 150 total points
ID: 24373336
With 10,000 queries, approximately 10,000^2 / 2^129 (the birthday approximation n^2 / (2 * 2^128)), or about 1.5 * 10^-31.
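The figure above can be checked with the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash. The sketch below assumes MD5 output behaves like a uniformly random 128-bit value for ordinary, non-adversarial inputs; `collision_probability` is a name made up for this example.

```python
import math

def collision_probability(n: int, bits: int = 128) -> float:
    """Birthday approximation for at least one collision among n
    uniformly random hashes of the given bit length."""
    exponent = n * (n - 1) / 2.0 ** (bits + 1)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)

if __name__ == "__main__":
    for n in (10_000, 100_000, 1_000_000):
        print(n, collision_probability(n))
```

For 10,000 queries the result is on the order of 10^-31, and it stays far below any practical concern even at a million entries.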
 
LVL 22

Assisted Solution

by:dportas
dportas earned 50 total points
ID: 24373796
The risk of accidental MD5 collisions is vanishingly small. You don't need to worry about it. There is a possible risk of deliberately constructed collisions, which may present a security risk in some unusual circumstances.

As racek suggests, for your scheme to be effective you'll probably have to use some canonical form of the query rather than its raw format.
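If deliberately constructed collisions are a concern, one option is to keep the query text next to the cached result and compare it on lookup, so a colliding hash can only cause a cache miss and never a wrong result. A minimal sketch, assuming an in-memory dictionary; `QueryCache` and its methods are hypothetical names, not part of MySQL or any library:

```python
import hashlib
from typing import Any, Dict, Optional, Tuple

class QueryCache:
    """Hypothetical cache keyed by MD5, verified against the stored query."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, Any]] = {}

    @staticmethod
    def _key(query: str) -> str:
        return hashlib.md5(query.encode("utf-8")).hexdigest()

    def put(self, query: str, result: Any) -> None:
        # Store the query text alongside the result for later verification.
        self._store[self._key(query)] = (query, result)

    def get(self, query: str) -> Optional[Any]:
        entry = self._store.get(self._key(query))
        if entry is None or entry[0] != query:
            return None          # miss, or a different query with the same hash
        return entry[1]
```

This gives up the storage saving of keeping only the hash. Switching the key to a hash without known collision attacks, such as SHA-256 via `hashlib.sha256`, is another way to address the same concern.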
 
LVL 14

Assisted Solution

by:racek
racek earned 50 total points
ID: 24374830
Another thing to consider is that MySQL stores the query text including any comments, for example:

SELECT *
FROM yourtable  /* changed 2009-01-05 */
Where ...;
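A comment like the one above could be removed before hashing. The sketch below is an assumption-laden illustration: `strip_block_comments` is a hypothetical helper, it only handles `/* ... */` block comments, it does not recognize comment-like text inside string literals, and it deliberately leaves MySQL's `/*! ... */` and `/*+ ... */` forms alone because those can change how a query runs.

```python
import re

# Block comments, except the /*! ... */ and /*+ ... */ forms.
_BLOCK_COMMENT = re.compile(r"/\*(?![!+]).*?\*/", re.DOTALL)

def strip_block_comments(query: str) -> str:
    """Remove plain block comments and collapse the leftover whitespace."""
    without_comments = _BLOCK_COMMENT.sub(" ", query)
    return re.sub(r"\s+", " ", without_comments).strip()
```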
