Weinberger hash algorithm flawed?
Posted on 2007-07-25
Hi
The hash algorithm designed by P. J. Weinberger is in widespread use. However, I've been trying to track down some discussion of why it returns only 28 significant bits out of 32, and I can't find anything on Google.
I've tested about five different variations of the coded algorithm, all producing the same results (not surprisingly, because of the step in the loop that clears the top nybble of the hash value whenever it contains any set bits).
For illustration, I've included one version of the code I tested (far from optimised) below.
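#include <cstddef>
#include <string>

unsigned int PJWHash(const std::string& str)
{
    // With a 32-bit unsigned int these are 32, 24, 4 and 0xF0000000.
    unsigned int BitsInUnsignedInt = (unsigned int)(sizeof(unsigned int) * 8);
    unsigned int ThreeQuarters = (unsigned int)((BitsInUnsignedInt * 3) / 4);
    unsigned int OneEighth = (unsigned int)(BitsInUnsignedInt / 8);
    unsigned int HighBits = (unsigned int)(0xFFFFFFFF) << (BitsInUnsignedInt - OneEighth);
    unsigned int hash = 0;
    unsigned int test = 0;

    for (std::size_t i = 0; i < str.length(); i++)
    {
        // Shift the running hash left by one nybble and add the next character.
        hash = (hash << OneEighth) + str[i];

        // If any of the top four bits are set, fold them back into the
        // low bits and then clear them. This is the step that leaves
        // the top nybble of the result zero.
        if ((test = hash & HighBits) != 0)
        {
            hash = ((hash ^ (test >> ThreeQuarters)) & (~HighBits));
        }
    }

    return hash;
}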
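To make the 28-bit observation easy to check, here is a minimal sketch of a fixed-width variant. It is not one of the variations tested above: it assumes a 32-bit hash, reads each character as `unsigned char` (which differs from the code above for bytes over 127 where `char` is signed), and the names `pjw_hash32` and `fits_in_28_bits` are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical fixed-width variant of the PJW hash shown in the post.
// Assumes a 32-bit hash: shift by 4, mask the top nybble (0xF0000000).
std::uint32_t pjw_hash32(const std::string& str)
{
    const std::uint32_t high_bits = 0xF0000000u;
    std::uint32_t hash = 0;

    for (unsigned char c : str)
    {
        hash = (hash << 4) + c;
        const std::uint32_t test = hash & high_bits;
        if (test != 0)
        {
            // Fold the top nybble into the low bits, then clear it.
            hash = (hash ^ (test >> 24)) & ~high_bits;
        }
    }

    // After every iteration the top nybble is either already zero
    // or has just been masked off, so this holds for any input.
    assert((hash & high_bits) == 0);
    return hash;
}

// True when the value uses only the low 28 bits.
bool fits_in_28_bits(std::uint32_t h)
{
    return (h & 0xF0000000u) == 0;
}
```

Calling `fits_in_28_bits(pjw_hash32(s))` on any string `s` should return true, which matches the behaviour described above: the loop never lets a set bit survive in the top nybble.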
So, my questions:
1. Am I correct in assuming that the hash function increases the chance of hash collisions by (IMHO) needlessly shrinking the range of possible hash values, i.e. 2^28 values instead of 2^32 when unsigned int is 32 bits?
2. Is this even a problem?
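For a rough sense of scale on both questions, the birthday approximation can be sketched as follows. It assumes an ideally uniform hash over 2^bits values, which is an assumption and not something established for PJW, and `expected_colliding_pairs` is a name invented for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Birthday approximation: expected number of colliding pairs when n keys
// are spread uniformly over 2^bits possible hash values.
// Uniformity is assumed here, not a measured property of the PJW hash.
double expected_colliding_pairs(double n, int bits)
{
    assert(n >= 0.0 && bits > 0);
    const double buckets = std::ldexp(1.0, bits); // 2^bits
    return n * (n - 1.0) / (2.0 * buckets);
}
```

Under that assumption, 65,536 keys give roughly 8 expected colliding pairs at 28 bits versus roughly 0.5 at 32 bits, a factor of 16. If the hash value is then reduced modulo a much smaller table size, the table size rather than the lost four bits is likely to dominate the collision rate.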