?
Solved

Hash Table Improvement

Posted on 2009-02-21
11
Medium Priority
?
303 Views
Last Modified: 2012-05-06
Hi Experts,

There are two parts to my question:

A) I have implemented a hash table that has 26 keys (one for each letter of the alphabet) that will store words in the table/array. A collision list (i.e. overflow list) has been implemented as a single linked list such that if an incoming word into the hash table shares the same key as another word already in the table, then the words will be added to the collision list.

The hash function implemented is too simplistic to be used in practice since it distributes words nonuniformly across the table. I want to design a better hash function, one that will provide a more uniform distribution of the words over the total components of the array (the array size may change if needed) and which will give similar sized overflow lists.

B) I then wish to design and implement an experiment which shows that my hash function performs appropriately. The results of the experiment should:

- demonstrate the uniformity of the word distribution.
- provide information re the length (ie maximum, mimimum, average ) of the overflow lists and
the length (ie maximum, mimimum, average) of search paths to find a word or to identify its
absence.
- give the frequency distribution of the size of the possible overflow lists.

Any help on these two parts would be greatly appreciated. No code needed, just a **detailed ** algorithm.
Thanks,

Dennis
0
Comment
Question by:datopdogg7
11 Comments
 
LVL 92

Expert Comment

by:objects
ID: 23702944
why not just use String's hashCode() method (and mod it to fit your table size)
0
 
LVL 92

Expert Comment

by:objects
ID: 23702951
for the experiment, generate a large number of words (randomly or copy a large piece of text) and add them to your list
once done you can report the number of words allocated to each bucket

0
 

Author Comment

by:datopdogg7
ID: 23702952
Hey Objects,
I need to make my own...
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 92

Accepted Solution

by:
objects earned 750 total points
ID: 23702958
0
 

Author Comment

by:datopdogg7
ID: 23702993
Ok, I read the algorithm, but how will that guarantee a hash table that has uniform word distribution? Also, how would I know how to initalize the size of the hash table? Would the mapping keys represent an integer range?
Thanks,
Dennis
0
 
LVL 92

Expert Comment

by:objects
ID: 23703100
u can't guarantee uniform distibution without knowing exactly what words you are going to enter :)

0
 
LVL 92

Expert Comment

by:objects
ID: 23703194
have a read of the following

http://en.wikipedia.org/wiki/Hash_function

the best hash function to use depends on the data you are hashing

0
 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 750 total points
ID: 23703892
There seems to be quite a bit of confusion here. The hashing algo has nothing to do with the words in the table - it's applied to the *keys*, which, in your case:

>>I have implemented a hash table that has 26 keys (one for each letter of the alphabet)

are each letter of the alphabet. Evenness is to do with where the keys are in the table. In the case of the Java API, the keys would be perfectly even, since the hashCode() of each would return the ascii value of each letter
0
 

Author Comment

by:datopdogg7
ID: 23705727
Hi CEHJ,

Yes I am aware of how the algorithm works. Attached is the code I am currently using.

I am still looking for a better way to distribute the incoming words though. The array doesn't have to be 26 indices long (as was the initial algorithm 26 letters 26 keys).

Any thoughts?

Dennis
package assignment3_part_a;
 
// This class implements a hash table
 
public class Hash_Table {
 
    private Word single_words[] = new Word[26];
    private Linked_List collision_list[] = new Linked_List[26];
 
    // Constructor
 
    public Hash_Table(){
 
 
    }
 
    // This method assigns a mapping key to the incoming word
 
    public void hashWord(String in_word){
 
        int map_key;
 
        // Determine the first letter of the word
 
        char first_letter = in_word.toLowerCase().charAt(0);
 
        // The ASCII character encoding scheme gives the letter 'a' the value 97;
        // subsequent letters, such as 'b' is 98, and so on. Therefore we can map the value by
        // subtracting the value 96 from the first letter's numerical value, as per the
        // the ASCII scheme. However, the map key needs to be also subtracted a value of 1 since the
        // the array goes from zero to 25. This leaves the subtraction at 96 + 1 = 97
 
        map_key = (int) first_letter - 97;
 
        // Check to see if there are any words in the hash table with that key
 
        if(single_words[map_key] == null && (collision_list[map_key] == null)){
 
            // No word associated with that key, so assign the word to that slot and
            // increment frequency
 
            Word new_word = new Word();
            single_words[map_key] = new_word;
 
            single_words[map_key].setWord(in_word.toLowerCase());
            single_words[map_key].incrementWordFrequency();
 
 
        }
 
        // There is only one word associate with that key
 
        else if(single_words[map_key] != null && (collision_list[map_key] == null)){
 
            // If the word in the array is the same as the incoming word then
            // only the frequency needs updating
 
            if(single_words[map_key].getWord().compareTo(in_word.toLowerCase()) == 0){
 
                // Update frequency
                
                single_words[map_key].incrementWordFrequency();
                
                
            }
            
            // Two words are different so both the words will need to be put in
            // the linked list as the "collision" has happened
            
            else{
            
                // Make a hard copy of Word currently in the array 
 
                Word copy_word = new Word();
 
                copy_word.setWord(single_words[map_key].getWord());
                copy_word.setWordFrequency(single_words[map_key].getWordFrequency());
                
                // Initialize a linked list and add to array of linked lists
 
                Linked_List new_list = new Linked_List();
                collision_list[map_key] = new_list;
 
                // Add word to linked list
 
                collision_list[map_key].addWordObject(copy_word);
 
                // Clear single word list associated with this map_key
 
                single_words[map_key] = null;
 
                // Create instance of Word for the incoming word and intialize
                // the frequency of the data member in the instance.
 
                Word new_word = new Word();
                new_word.setWord(in_word.toLowerCase());
                new_word.incrementWordFrequency();
 
                // Add the Word to the collision list
 
                collision_list[map_key].addWordObject(new_word);
 
            }
 
 
 
        }
        
        // There is more than one word associate with that key. The linked list
        // is already in use so add the word to the single linked list
 
        else{
 
            // Check to see if the incoming word already exists in the list
 
            if(collision_list[map_key].checkWord(in_word.toLowerCase())){
 
                // Already exists in the list, so just increment frequency
 
                collision_list[map_key].returnWord(in_word.toLowerCase()).incrementWordFrequency();
 
            }
 
            else{
 
                // Word doesn't exist in list so create instance of Word for the
                // incoming word and initalize the frequency of the data member
                // in the instance.
 
                Word new_word = new Word();
                new_word.setWord(in_word.toLowerCase());
                new_word.incrementWordFrequency();
 
                // Add the Word to the collision list
 
                collision_list[map_key].addWordObject(new_word);
 
            }
 
 
        }
 
    }
 
    // This method checks to see if a word exists in the hash table
    
    public boolean checkWord(String in_word){
        
        int map_key;
 
        // Determine the first letter of the word
 
        char first_letter = in_word.toLowerCase().charAt(0);
 
        // Map word as per the aforementioned algorithm
 
        map_key = (int) first_letter - 97;
 
        // First check to see if the single words list and collision list
        // associated with that key is empty so as to avoid an error
 
        if(single_words[map_key] == null && collision_list[map_key] == null){
            
            // Empty
 
            return false;
 
        }
 
        // If the single words list is empty and the collision list is not empty,
        // then the words assosciated with that key must be in the linked list
 
        else if(single_words[map_key] == null){
 
            // Check to see if the word is in the collision list
 
            if(collision_list[map_key].checkWord(in_word)){
 
                // Word is in the collision list
 
                return true;
 
            }
 
            else{
 
                // Word is not in the collision list
 
                return false;
 
            }
 
 
        }
 
        // Check check to see if the single words list holds the word
 
        else if(single_words[map_key].getWord().compareTo(in_word) == 0){
 
            // The word is the only word associated with the map key
            
            return true;
 
        }
        
        // Word is not in the hash table
        
        else{
            
            return false;
            
        }
        
        
        
    }
 
    public void removeWord(String in_word){
 
        int map_key;
 
        // Determine the first letter of the word
 
        char first_letter = in_word.toLowerCase().charAt(0);
 
        // Map word as per the aforementioned algorithm
 
        map_key = (int) first_letter - 97;
 
        // Check to see if the word is in the single words array
 
        if(single_words[map_key] != null){
 
            if(single_words[map_key].getWord().compareTo(in_word.toLowerCase()) == 0){
 
                // Word exists in the single words list, so delete it.
 
                single_words[map_key].deleteWord();
 
            }
 
        }
 
        // If not in single words list, check to see if it is in collision list
        // because if there was more than one word associated with that map key,
        // then all the words starting with the same letter will be in the
        // collision list
 
        else if(collision_list[map_key].checkWord(in_word.toLowerCase())){
 
            collision_list[map_key].removeWord(in_word.toLowerCase());
 
        }
 
    }
 
    public String getWordsAssociatedWithKey(int map_key){
 
        String words = null;
 
        // Check to see where words are located
 
        if(single_words[map_key] == null && (collision_list[map_key] == null)){
 
            // No words associated with that key, so do nothing
 
 
        }
 
        else if(single_words[map_key] != null && (collision_list[map_key] == null)){
 
            // Only one word and it is in the single linked list
 
            words = (map_key+1) + "       " + single_words[map_key].getWord() + "(" + single_words[map_key].getWordFrequency() + ")";
 
        }
 
        else{
 
            // More than one word associated with key. Need to create string with
            // all the words in the list information
 
            // Initialize string to include map key. Map key + 1 since the letter
            // 'a' will be associated with the number '1'; remember, the array
            // is from zero to 25, so we must adjust for this.
 
            words = (map_key + 1) + "       ";
 
            // The for loop must go from i = 1 to the list size because the linked list was
            // created in such a way that the 1st element in the list is at index
            // one. This is different than arrays, as array have the first element
            // at index zero.
 
            for(int i = 1; i <= collision_list[map_key].getListCount(); i++){
 
                words = words + collision_list[map_key].getWordObject(i).getWord()  + "(" + collision_list[map_key].getWordObject(i).getWordFrequency() + ")    ";
 
            }
 
        }
 
        return words;
 
    }
 
 
}

Open in new window

0
 
LVL 92

Expert Comment

by:objects
ID: 23707012
the algorithm used by the String class or the one I posted above will work fine, and as it is used standard by Java then you'd expect that it is a pretty uniform one.

Simplifying it if you still want to use just the first char then you could use the statistical frequency of words starting with each letter to better spread the distribution.

But as I mentioned above if you're looking for a good hash key for a string then the tried and tested algorithm would seem the obvious choice.
0
 
LVL 85

Expert Comment

by:ozo
ID: 23709048
depending on your words, a possible improvement might be something like
(first_letter * 31 + last_letter) % hashsize
0

Featured Post

Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
This article will show, step by step, how to integrate R code into a R Sweave document
The viewer will learn how to implement Singleton Design Pattern in Java.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.
Suggested Courses

850 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question