chsalvia
asked on
Quadratic Probing
I'm trying to implement quadratic probing to resolve hash table collisions. Usually I use a linked-list method, but I want to try out quadratic probing.
My basic "get" function uses a loop like:
for (unsigned n = 1; table_space[idx] != EMPTY; ++n) {
if (memcmp( table_space[idx].key, key, strlen(key)) == 0)
return table_space[idx].value;
idx += ( (n+1)*(n+1) ) - (n*n);
if (idx >= Capacity) idx %= Capacity;
}
I'm not sure if I'm implementing the quadratic function properly. If the idx value exceeds the capacity of the table, I reset it to the beginning of the table with a modulo/assignment. However, I've seen some implementations which use the hash value itself for the quadratic function, rather than the table index. Is there any difference, and is the above implementation adequate?
My basic "get" function uses a loop like:
for (unsigned n = 1; table_space[idx] != EMPTY; ++n) {
if (memcmp( table_space[idx].key, key, strlen(key)) == 0)
return table_space[idx].value;
idx += ( (n+1)*(n+1) ) - (n*n);
if (idx >= Capacity) idx %= Capacity;
}
I'm not sure if I'm implementing the quadratic function properly. If the idx value exceeds the capacity of the table, I reset it to the beginning of the table with a modulo/assignment. However, I've seen some implementations which use the hash value itself for the quadratic function, rather than the table index. Is there any difference, and is the above implementation adequate?
ASKER
>>I'm assuming that idx is initialized prior to the for(;;) statement?
Yes.
>>For my money, I like the linked-list approach for dealing with overflow. It's easier to implement and means that a slightly less >>than optimal hash algorithm still works reasonably well.
I think so too. It also lets you fill up the table without too much performance degradation.
I've been experimenting with ways to get fast hash table performance with minimal memory allocation and the option for variable length records. In C++, hash tables usually allocate new objects on every insert, which involves a lot of sporadic memory allocation. An object is allocated to the heap, then its constructor is called, and maybe the object allocates some storage for some internal variables as well.
With C, I was thinking I could get a pretty fast hash table which would allow variable length records, by doing something like this: Making a table that is just an array of integers. Each integer would store the offset of a record stored in a larger heap space. The problem is that this requires two resize operations - one for the table, and one for the storage heap.
But I thought it would be a nice way to store variable length strings, or any kind of data, and be able to have a hash index to look it up quickly. Plus, barring the resize, there would be no memory allocation on inserts - just a quick assignment to the proper index, and a memcpy for the storage heap.
I was looking into quadratic probing because if I used a linked-list, the hash table would need to be more than just a simple array of integers - it would need to be an array of integers and next pointers.
Yes.
>>For my money, I like the linked-list approach for dealing with overflow. It's easier to implement and means that a slightly less >>than optimal hash algorithm still works reasonably well.
I think so too. It also lets you fill up the table without too much performance degradation.
I've been experimenting with ways to get fast hash table performance with minimal memory allocation and the option for variable length records. In C++, hash tables usually allocate new objects on every insert, which involves a lot of sporadic memory allocation. An object is allocated to the heap, then its constructor is called, and maybe the object allocates some storage for some internal variables as well.
With C, I was thinking I could get a pretty fast hash table which would allow variable length records, by doing something like this: Making a table that is just an array of integers. Each integer would store the offset of a record stored in a larger heap space. The problem is that this requires two resize operations - one for the table, and one for the storage heap.
But I thought it would be a nice way to store variable length strings, or any kind of data, and be able to have a hash index to look it up quickly. Plus, barring the resize, there would be no memory allocation on inserts - just a quick assignment to the proper index, and a memcpy for the storage heap.
I was looking into quadratic probing because if I used a linked-list, the hash table would need to be more than just a simple array of integers - it would need to be an array of integers and next pointers.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Hi ozo,
Yeah, and
if (idx >= Capacity) idx %= Capacity
is probably better written as
idx %= Capacity;
But the logic seems to be the initial issue.
Kent
Yeah, and
if (idx >= Capacity) idx %= Capacity
is probably better written as
idx %= Capacity;
But the logic seems to be the initial issue.
Kent
If the table is full, you will loop forever
ASKER
>>idx += ( (n+1)*(n+1) ) - (n*n);
>>That seems strange way to write
>>idx += 2*n + 1;
Yes - that's a lot simpler.
>> If the table is full, you will loop forever
There is a resize function that is triggered when the number of records reaches a certain load factor.
>>Yeah, and
>> if (idx >= Capacity) idx %= Capacity
>>is probably better written as
>> idx %= Capacity;
Yeah - the if statement is redundant there. I suppose the whole thing could be expressed in one line like:
idx = (idx + 2*n + 1) % Capacity;
>>That seems strange way to write
>>idx += 2*n + 1;
Yes - that's a lot simpler.
>> If the table is full, you will loop forever
There is a resize function that is triggered when the number of records reaches a certain load factor.
>>Yeah, and
>> if (idx >= Capacity) idx %= Capacity
>>is probably better written as
>> idx %= Capacity;
Yeah - the if statement is redundant there. I suppose the whole thing could be expressed in one line like:
idx = (idx + 2*n + 1) % Capacity;
Hi chsalvia,
I'm assuming that idx is initialized prior to the for(;;) statement?
For my money, I like the linked-list approach for dealing with overflow. It's easier to implement and means that a slightly less than optimal hash algorithm still works reasonably well.
What you're attempting requires a dead-on algorithm or quite a bit of extra space as an overflow record is moved to another location in the primary pages, often resulting in the misplacement of additional data.
But your code looks fine.
Kent