Solved

Quadratic Probing

Posted on 2006-11-17
7
726 Views
Last Modified: 2010-04-15
I'm trying to implement quadratic probing to resolve hash table collisions.  Usually I use a linked-list method, but I want to try out quadratic probing.

My basic "get" function uses a loop like:

            for (unsigned n = 1; table_space[idx] != EMPTY; ++n) {
                  if (memcmp( table_space[idx].key, key, strlen(key)) == 0)
                        return table_space[idx].value;
                  idx += ( (n+1)*(n+1) ) - (n*n);
                  if (idx >= Capacity) idx %= Capacity;
            }

I'm not sure if I'm implementing the quadratic function properly.  If the idx value exceeds the capacity of the table, I reset it to the beginning of the table with a modulo/assignment.  However, I've seen some implementations which use the hash value itself for the quadratic function, rather than the table index.  Is there any difference, and is the above implementation adequate?
0
Comment
Question by:chsalvia
  • 3
  • 2
  • 2
7 Comments
 
LVL 45

Expert Comment

by:Kdo
ID: 17965032

Hi chsalvia,

I'm assuming that idx is initialized prior to the for(;;) statement?

For my money, I like the linked-list approach for dealing with overflow.  It's easier to implement and means that a slightly less than optimal hash algorithm still works reasonably well.

What you're attempting requires a dead-on algorithm or quite a bit of extra space as an overflow record is moved to another location in the primary pages, often resulting in the misplacement of additional data.

But your code looks fine.

Kent
0
 

Author Comment

by:chsalvia
ID: 17965373
>>I'm assuming that idx is initialized prior to the for(;;) statement?

Yes.

>>For my money, I like the linked-list approach for dealing with overflow.  It's easier to implement and means that a slightly less >>than optimal hash algorithm still works reasonably well.

I think so too.  It also lets you fill up the table without too much performance degradation.

I've been experimenting with ways to get fast hash table performance with minimal memory allocation and the option for variable length records.  In C++, hash tables usually allocate new objects on every insert, which involves a lot of sporadic memory allocation.  An object is allocated to the heap, then its constructor is called, and maybe the object allocates some storage for some internal variables as well.  

With C, I was thinking I could get a pretty fast hash table which would allow variable length records, by doing something like this:  Making a table that is just an array of integers.  Each integer would store the offset of a record stored in a larger heap space.  The problem is that this requires two resize operations - one for the table, and one for the storage heap.  

But I thought it would be a nice way to store variable length strings, or any kind of data, and be able to have a hash index to look it up quickly.  Plus, barring the resize, there would be no memory allocation on inserts - just a quick assignment to the proper index, and a memcpy for the storage heap.

I was looking into quadratic probing because if I used a linked-list, the hash table would need to be more than just a simple array of integers - it would need to be an array of integers and next pointers.  
0
 
LVL 45

Accepted Solution

by:
Kdo earned 200 total points
ID: 17965538

>> if I used a linked-list, the hash table would need to be more than just a simple array of integers

All kinds of hybrid solutions are out there.

typedef struct
{
   char Key[20];   // or whatever
   int    Offset;     // Index into the list
} key_t;

typedef struct
{
  char Key[10];
//  rest of data
} mystruct_t;

typedef struct
{
  key_t   key[20];
  page_t *overflow;
  char    pad[x];
} page_t;


Use the definitions above to manage the keys, then keep the actual data as separate structs.  The hash table winds up being relatively small with very fast access to keys and/or data.


Kent
0
Are your AD admin tools letting you down?

Managing Active Directory can get complicated.  Often, the native tools for managing AD are just not up to the task.  The largest Active Directory installations in the world have relied on one tool to manage their day-to-day administration tasks: Hyena. Start your trial today.

 
LVL 84

Assisted Solution

by:ozo
ozo earned 50 total points
ID: 17968894
idx += ( (n+1)*(n+1) ) - (n*n);
That seems strange way to write
 idx += 2*n + 1;
0
 
LVL 45

Expert Comment

by:Kdo
ID: 17970066
Hi ozo,

Yeah, and

   if (idx >= Capacity) idx %= Capacity

is probably better written as

   idx %= Capacity;


But the logic seems to be the initial issue.
Kent
0
 
LVL 84

Expert Comment

by:ozo
ID: 17970122
If the table is full, you will loop forever
0
 

Author Comment

by:chsalvia
ID: 17970622
>>idx += ( (n+1)*(n+1) ) - (n*n);
>>That seems strange way to write
>>idx += 2*n + 1;

Yes - that's a lot simpler.  

>> If the table is full, you will loop forever

There is a resize function that is triggered when the number of records reaches a certain load factor.

>>Yeah, and
>>   if (idx >= Capacity) idx %= Capacity
>>is probably better written as
>>  idx %= Capacity;

Yeah - the if statement is redundant there.  I suppose the whole thing could be expressed in one line like:

idx = (idx + 2*n + 1) % Capacity;
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Convert image to byte array 8 190
How to organize data in excel ? 2 112
cURL: stopping a http transaction before it's finished 3 116
Socket Programming (Unix) 8 119
An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Summary: This tutorial covers some basics of pointer, pointer arithmetic and function pointer. What is a pointer: A pointer is a variable which holds an address. This address might be address of another variable/address of devices/address of fu…
The goal of this video is to provide viewers with basic examples to understand and use pointers in the C programming language.
The goal of this video is to provide viewers with basic examples to understand and use conditional statements in the C programming language.

863 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now