We help IT Professionals succeed at work.

How does this hash function work?

shampouya
shampouya asked
on
I'm confused by line 5 of the code below. My textbook says "it is computing a polynomial function of 37 by use of Honer's rule," which I don't understand. It seems like it is multiplying the hash value in the last iteration of the loop by 37 and adding the integer representation of the current character being examined. Why is it doing that?

I understand line 8, and the if statement in line 9 is adding the tableSize back to the hash value if the hash value is less than 0. But how would the hash value end up being less than 0 in the first place?
public static int hash3(String key, int tableSize) {
        int hashVal = 0;

        for (int i = 0; i < key.length(); i++) {
            hashVal = 37 * hashVal + key.charAt(i);
        }

        hashVal %= tableSize;
        if (hashVal < 0) {
            hashVal += tableSize;
        }

        return hashVal;
    }

Open in new window

Comment
Watch Question

Awarded 2010
Top Expert 2013
Commented:
I assume we are now using letters, digits, and spaces so that we have 37 characters?
By multiplying by 37 each time, it does the exact thing we were discussing in the other question except you don't need to know the size ahead of time. For a three letter string, the 37 will get multiplied twice for the first character and once for the second (and none for the last) so you have the same thing as before
first_char*37*37 + second_char*37 + last_char.

The mod operator should never return a negative, especially if you give it two positives, so I don't see the reason for that check either.

Author

Commented:
Yeah I guess that's what the 37 means. My book is so awfully written that I don't know. I see what you mean though, this algorithm  makes sure that the smallest 3-letter input is bigger than the biggest 2-letter input, and the smallest 4-letter input is bigger than the biggest 3-letter input, and so on and so forth. Thanks sir.

Author

Commented:
One more thing regarding this, why doesn't line 5 say this:

hashVal = hashVal + 37 * key.charAt(i);

instead of this:

hashVal = 37 * hashVal + key.charAt(i);
ozo
Most Valuable Expert 2014
Top Expert 2015

Commented:
A sensible %= tableSize operator would always return a value with the same sign as tablesize, but
c defines / to truncate toward 0, so it can return a negative value if hashVal is negative, which could happen if the value overflows an int.
Awarded 2010
Top Expert 2013

Commented:
>>why doesn't line 5 say this?

If it did it that way, then each character would just be multiplied by 37 once. The first character gets multiplied by 37 every time so it ends up multiplied by 37^(N-1) where N is the string length.
In this one, the farthest left character does get the highest value (like I wanted it to in the last question).

Ozo, that's true, but I wouldn't think tableSize should be allowed to be negative and overflow would cause more problems than that. Making all the ints unsigned would seem to me to be a better way to handle it.