[Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 780
  • Last Modified:

Hashtable structure

I'm slightly confused as to one aspect of how hashtables work.  I understand that a hashtable is really just an array of key values, which can be turned into numbers using a hash function, which then allows you to map those values to data values.

But, I'm not sure if a hashtable is supposed to be created using one or two arrays.  I would think you'd need two arrays to create a hashtable - one for key values and one for data values.  

Is this correct?  Some websites seem to imply there is only one array, which holds the data values, which are simply located at the hash index of a list of predetermined key values.  

I'd think though, that when you perform a lookup, it would need to hash the lookup keyword, then check the keyword array for that index value, and see if the key matches.  THEN, it would look up the same hash index in the data array, and return that value.  

Right?
0
chsalvia
Asked:
chsalvia
  • 2
3 Solutions
 
AlexFMCommented:
Hashtable usually contains array of structures. First element of structure is key, second is data. Keeping two arrays is also possible, but I think structure is better.
Keeping only data values is incorrect, because in this case key values are lost. Consider situation when key is string, and you need to enumerate all hashtable members, both keys and values - this will not work.
You description of lookup operation is right except last step. Data is kept together with key.
0
 
itsmeandnobodyelseCommented:
There are a few possibilities to implement a hashtable with different advantages, disadvantages.

A: Hold one (big) array with the data only.

A lookup would calculate the hashindex and return data at index if any. You can get subsequent (filled) positions if any and have to decide yourself whether these (and the very first return as well) would be a valid result to your query.  To insert a new entry you would calculate the hashindex and take the next empty slot beginning at the index position. Normally, you would/could use that method only if data contains key. The array msut be big enough to hold all data. It is going to become very inefficient if it is filled more than 50 percent.

B: Hold one array with pointers to linked lists.

A lookup would calculate the hashindex and return pointer to (first node of) linked list at index if any. If the data do not contain key there is no way to verify the results. To insert a new entry either a new linked list will be created at index position or the new entry was added to the list already available. Here the initial hashtable doesn't need to be sized very big. If the data contain the key this method is very efficient provided the hash algorithm is efficient as well. However, the method isn't recommendable for persistent storage cause it needs dynamic allocation.

C: You could use any other container instead of linked list or a (big) second array where the linked lists hold the pointers to.

D: If the data don't contain key you should store a pair of key and data (for each variant).

Regards, Alex




0
 
Infinity08Commented:
>> Keeping only data values is incorrect, because in this case key values are lost.
You can create just a data array, and still call it a hash table ... actually, that is a common way it's done. The key value is not always needed when retrieving an entry from the hash table ... so why waste space in storing it if it's not needed ?

This is exactly the definition of a hash table : the key is mapped to an array position by the hash function, so :

  int hash_function(string key);

will return the index into the hash table based on the key. The hash table is defined like this eg. :

  int hash_table[SIZE];

and you can do a lookup like this :

  string key;
  int data = hash_table[hash_function(key)];

As you see, the hash table just contains the data ... the key is only stored in case you need to retrieve the current key stored at a certain index (not all applications of a hash table need this).

If the key is needed, the hash_table could be defined like this :

  struct hash_entry { string key; int data; };
  struct hash_entry hash_table[SIZE];

with this lookup :

  string keyl;
  string key = hash_table[hash_function(keyl)].key;
  int data = hash_table[hash_function(keyl)].data;


This wiki explains some more :

http://en.wikipedia.org/wiki/Hash_table
0
 
itsmeandnobodyelseCommented:
An alternative to a hashtable is a skiplist.

A skiplist works with sorted linked lists what has the advantage that you can get a sorted list of all items or iterate through all items in order (regarding to the key). Furthermore, a skiplist automatically is balancing all entries which makes it nearly as fast as a balanced binary tree but has no costs for balancing (cause that is done atomatically).

A skiplist adds all entries to a sorted linked list. The node of any fourth entry (randomly calculated) was added to a second linked list, any 16th to a third list and so on:

3                                                                                           L
2                        D                                                                 L                                                                 S
1              B        D                F                                                L                          Q                                    S           X          Z
0       A     B        D                F      G                 H                     L          M   N         Q                    R              S    U     X    Y    Z

Here level 0 holds all entries and level 3 only one. For lookup you start at the top level  and go down and right if the key to search is greater than key in the list. E. g. looking up for 'O' would start at 'L', go down to level 0, right to 'M', right to 'N'. Then, either the key was found or the new insert position. For each new entry it will be determined randomly  to which lists it belongs. The maximum level is random as well.

Regards, Alex
0

Featured Post

Prep for the ITIL® Foundation Certification Exam

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.

  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now