Hashtable structure

Posted on 2006-03-23
Last Modified: 2012-06-21
I'm slightly confused as to one aspect of how hashtables work.  I understand that a hashtable is really just an array of key values, which can be turned into numbers using a hash function, which then allows you to map those values to data values.

But, I'm not sure if a hashtable is supposed to be created using one or two arrays.  I would think you'd need two arrays to create a hashtable - one for key values and one for data values.  

Is this correct?  Some websites seem to imply there is only one array, which holds the data values, which are simply located at the hash index of a list of predetermined key values.  

I'd think though, that when you perform a lookup, it would need to hash the lookup keyword, then check the keyword array for that index value, and see if the key matches.  THEN, it would look up the same hash index in the data array, and return that value.  

Question by:chsalvia
    LVL 48

    Assisted Solution

    Hashtable usually contains array of structures. First element of structure is key, second is data. Keeping two arrays is also possible, but I think structure is better.
    Keeping only data values is incorrect, because in this case key values are lost. Consider situation when key is string, and you need to enumerate all hashtable members, both keys and values - this will not work.
    You description of lookup operation is right except last step. Data is kept together with key.
    LVL 39

    Accepted Solution

    There are a few possibilities to implement a hashtable with different advantages, disadvantages.

    A: Hold one (big) array with the data only.

    A lookup would calculate the hashindex and return data at index if any. You can get subsequent (filled) positions if any and have to decide yourself whether these (and the very first return as well) would be a valid result to your query.  To insert a new entry you would calculate the hashindex and take the next empty slot beginning at the index position. Normally, you would/could use that method only if data contains key. The array msut be big enough to hold all data. It is going to become very inefficient if it is filled more than 50 percent.

    B: Hold one array with pointers to linked lists.

    A lookup would calculate the hashindex and return pointer to (first node of) linked list at index if any. If the data do not contain key there is no way to verify the results. To insert a new entry either a new linked list will be created at index position or the new entry was added to the list already available. Here the initial hashtable doesn't need to be sized very big. If the data contain the key this method is very efficient provided the hash algorithm is efficient as well. However, the method isn't recommendable for persistent storage cause it needs dynamic allocation.

    C: You could use any other container instead of linked list or a (big) second array where the linked lists hold the pointers to.

    D: If the data don't contain key you should store a pair of key and data (for each variant).

    Regards, Alex

    LVL 53

    Assisted Solution

    >> Keeping only data values is incorrect, because in this case key values are lost.
    You can create just a data array, and still call it a hash table ... actually, that is a common way it's done. The key value is not always needed when retrieving an entry from the hash table ... so why waste space in storing it if it's not needed ?

    This is exactly the definition of a hash table : the key is mapped to an array position by the hash function, so :

      int hash_function(string key);

    will return the index into the hash table based on the key. The hash table is defined like this eg. :

      int hash_table[SIZE];

    and you can do a lookup like this :

      string key;
      int data = hash_table[hash_function(key)];

    As you see, the hash table just contains the data ... the key is only stored in case you need to retrieve the current key stored at a certain index (not all applications of a hash table need this).

    If the key is needed, the hash_table could be defined like this :

      struct hash_entry { string key; int data; };
      struct hash_entry hash_table[SIZE];

    with this lookup :

      string keyl;
      string key = hash_table[hash_function(keyl)].key;
      int data = hash_table[hash_function(keyl)].data;

    This wiki explains some more :
    LVL 39

    Expert Comment

    An alternative to a hashtable is a skiplist.

    A skiplist works with sorted linked lists what has the advantage that you can get a sorted list of all items or iterate through all items in order (regarding to the key). Furthermore, a skiplist automatically is balancing all entries which makes it nearly as fast as a balanced binary tree but has no costs for balancing (cause that is done atomatically).

    A skiplist adds all entries to a sorted linked list. The node of any fourth entry (randomly calculated) was added to a second linked list, any 16th to a third list and so on:

    3                                                                                           L
    2                        D                                                                 L                                                                 S
    1              B        D                F                                                L                          Q                                    S           X          Z
    0       A     B        D                F      G                 H                     L          M   N         Q                    R              S    U     X    Y    Z

    Here level 0 holds all entries and level 3 only one. For lookup you start at the top level  and go down and right if the key to search is greater than key in the list. E. g. looking up for 'O' would start at 'L', go down to level 0, right to 'M', right to 'N'. Then, either the key was found or the new insert position. For each new entry it will be determined randomly  to which lists it belongs. The maximum level is random as well.

    Regards, Alex

    Featured Post

    Find Ransomware Secrets With All-Source Analysis

    Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

    Join & Write a Comment

    Templates For Beginners Or How To Encourage The Compiler To Work For You Introduction This tutorial is targeted at the reader who is, perhaps, familiar with the basics of C++ but would prefer a little slower introduction to the more ad…
    This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a G…
    The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
    The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

    745 members asked questions and received personalized solutions in the past 7 days.

    Join the community of 500,000 technology professionals and ask your questions.

    Join & Ask a Question

    Need Help in Real-Time?

    Connect with top rated Experts

    17 Experts available now in Live!

    Get 1:1 Help Now