# Hash Table Improvement

Posted on 2009-02-21
Hi Experts,

There are two parts to my question:

A) I have implemented a hash table with 26 keys (one for each letter of the alphabet) that stores words in the table/array. A collision (i.e. overflow) list has been implemented as a singly linked list, so that if an incoming word shares the same key as a word already in the table, both words are added to that key's collision list.

The hash function implemented is too simplistic to be used in practice, since it distributes words non-uniformly across the table. I want to design a better hash function: one that will distribute the words more uniformly across the slots of the array (the array size may change if needed) and that will give similarly sized overflow lists.

B) I then wish to design and implement an experiment which shows that my hash function performs appropriately. The results of the experiment should:

- demonstrate the uniformity of the word distribution.
- provide information regarding the length (i.e. maximum, minimum, average) of the overflow lists, and the length (i.e. maximum, minimum, average) of the search paths needed to find a word or to establish its absence.
- give the frequency distribution of the size of the possible overflow lists.
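Once the overflow-list lengths are known, the statistics asked for above are straightforward to compute. A minimal sketch (the `lengths` array here is made-up sample data, not anything from the thread):

```java
import java.util.Arrays;

// Given the length of each overflow list, compute the maximum, minimum,
// and average lengths requested above.
public class OverflowStats {

    static int max(int[] lengths) {
        return Arrays.stream(lengths).max().getAsInt();
    }

    static int min(int[] lengths) {
        return Arrays.stream(lengths).min().getAsInt();
    }

    static double avg(int[] lengths) {
        return Arrays.stream(lengths).average().getAsDouble();
    }

    public static void main(String[] args) {
        int[] lengths = {3, 0, 5, 2, 1}; // hypothetical sample data
        System.out.println("max=" + max(lengths)
                + " min=" + min(lengths)
                + " avg=" + avg(lengths));
    }
}
```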

Any help on these two parts would be greatly appreciated. No code needed, just a **detailed** algorithm.
Thanks,

Dennis
Question by:datopdogg7

LVL 92

Expert Comment

ID: 23702944
Why not just use String's hashCode() method (and mod it to fit your table size)?
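A minimal sketch of that suggestion (the table size of 101 is an assumed example, not anything from the thread). The bit mask clears the sign bit so that negative hash codes still map to a valid index:

```java
// Reuse String's built-in hashCode() and reduce it modulo the table size.
public class HashDemo {

    static final int TABLE_SIZE = 101; // assumed example size

    static int bucketFor(String word) {
        int h = word.toLowerCase().hashCode();
        return (h & 0x7fffffff) % TABLE_SIZE; // always in [0, TABLE_SIZE)
    }

    public static void main(String[] args) {
        System.out.println("apple  -> " + bucketFor("apple"));
        System.out.println("banana -> " + bucketFor("banana"));
    }
}
```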

LVL 92

Expert Comment

ID: 23702951
For the experiment, generate a large number of words (randomly, or copy a large piece of text) and add them to your table. Once done, you can report the number of words allocated to each bucket.
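A rough sketch of that experiment, assuming the hashCode()-based bucket function from the earlier comment (the word list here is just a stand-in for real input text):

```java
import java.util.Arrays;
import java.util.List;

// Tally how many words land in each bucket; the per-bucket counts are
// exactly the distribution the experiment should report.
public class BucketExperiment {

    static int[] countBuckets(List<String> words, int tableSize) {
        int[] counts = new int[tableSize];
        for (String w : words) {
            int key = (w.toLowerCase().hashCode() & 0x7fffffff) % tableSize;
            counts[key]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample =
                Arrays.asList("apple", "banana", "cherry", "date", "elderberry");
        int[] counts = countBuckets(sample, 26);
        System.out.println(Arrays.toString(counts));
    }
}
```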


Author Comment

ID: 23702952
Hey Objects,
I need to make my own...

LVL 92

Accepted Solution

objects earned 750 total points
ID: 23702958

Author Comment

ID: 23702993
Ok, I read the algorithm, but how will that guarantee a hash table that has uniform word distribution? Also, how would I know how to initialize the size of the hash table? Would the mapping keys represent an integer range?
Thanks,
Dennis

LVL 92

Expert Comment

ID: 23703100
You can't guarantee uniform distribution without knowing exactly which words you are going to enter :)


LVL 92

Expert Comment

ID: 23703194
Have a read of the following:

http://en.wikipedia.org/wiki/Hash_function

the best hash function to use depends on the data you are hashing


LVL 86

Assisted Solution

CEHJ earned 750 total points
ID: 23703892
There seems to be quite a bit of confusion here. The hashing algorithm has nothing to do with the words in the table - it's applied to the *keys*, which, in your case:

>>I have implemented a hash table that has 26 keys (one for each letter of the alphabet)

are the letters of the alphabet. Evenness concerns where the keys land in the table. In the case of the Java API, the keys would be perfectly even, since the hashCode() of each letter would return its ASCII value.

Author Comment

ID: 23705727
Hi CEHJ,

Yes, I am aware of how the algorithm works. Attached is the code I am currently using.

I am still looking for a better way to distribute the incoming words, though. The array doesn't have to be 26 indices long (as in the initial algorithm: 26 letters, 26 keys).

Any thoughts?

Dennis
```java
package assignment3_part_a;

// This class implements a hash table keyed on the first letter of each word

public class Hash_Table {

    private Word single_words[] = new Word[26];

    // Array of overflow (collision) lists, one per key. The Collision_List
    // type is assumed to be the singly linked list class described above;
    // it was used but never declared in the original code.
    private Collision_List collision_list[] = new Collision_List[26];

    // Constructor

    public Hash_Table() {

    }

    // This method assigns a mapping key to the incoming word

    public void hashWord(String in_word) {

        // Determine the first letter of the word

        char first_letter = in_word.toLowerCase().charAt(0);

        // The ASCII character encoding gives the letter 'a' the value 97,
        // 'b' the value 98, and so on. Subtracting 97 from the first
        // letter's numerical value therefore maps 'a'..'z' onto the
        // array indices 0..25.

        int map_key = (int) first_letter - 97;

        // Check to see if there are any words in the hash table with that key

        if (single_words[map_key] == null && collision_list[map_key] == null) {

            // No word associated with that key, so assign the word to that
            // slot and increment its frequency

            Word new_word = new Word();
            single_words[map_key] = new_word;

            single_words[map_key].setWord(in_word.toLowerCase());
            single_words[map_key].incrementWordFrequency();

        }

        // There is only one word associated with that key

        else if (single_words[map_key] != null && collision_list[map_key] == null) {

            // If the word in the array is the same as the incoming word,
            // only the frequency needs updating

            if (single_words[map_key].getWord().compareTo(in_word.toLowerCase()) == 0) {

                // Update frequency

                single_words[map_key].incrementWordFrequency();

            }

            // The two words are different, so both must be moved into the
            // linked list now that a "collision" has happened

            else {

                // Make a hard copy of the Word currently in the array

                Word copy_word = new Word();

                copy_word.setWord(single_words[map_key].getWord());
                copy_word.setWordFrequency(single_words[map_key].getWordFrequency());

                // Initialize a linked list and add it to the array of lists

                collision_list[map_key] = new Collision_List();

                // Clear the single-word slot associated with this map_key

                single_words[map_key] = null;

                // Create an instance of Word for the incoming word and
                // initialize the frequency data member of the instance

                Word new_word = new Word();
                new_word.setWord(in_word.toLowerCase());
                new_word.incrementWordFrequency();

                // Add both Words to the collision list (an addWord method
                // on Collision_List is assumed here; the original code
                // never actually added the words)

                collision_list[map_key].addWord(copy_word);
                collision_list[map_key].addWord(new_word);

            }

        }

        // There is more than one word associated with that key. The linked
        // list is already in use, so add the word to it

        else {

            // Check to see if the incoming word already exists in the list

            if (collision_list[map_key].checkWord(in_word.toLowerCase())) {

                // Already exists in the list, so just increment frequency

                collision_list[map_key].returnWord(in_word.toLowerCase()).incrementWordFrequency();

            }

            else {

                // Word doesn't exist in the list, so create an instance of
                // Word for the incoming word and initialize the frequency
                // data member of the instance

                Word new_word = new Word();
                new_word.setWord(in_word.toLowerCase());
                new_word.incrementWordFrequency();

                // Add the Word to the collision list

                collision_list[map_key].addWord(new_word);

            }

        }

    }

    // This method checks to see if a word exists in the hash table

    public boolean checkWord(String in_word) {

        // Determine the first letter of the word

        char first_letter = in_word.toLowerCase().charAt(0);

        // Map the word as per the aforementioned algorithm

        int map_key = (int) first_letter - 97;

        // First check whether both the single-word slot and the collision
        // list for that key are empty, so as to avoid an error

        if (single_words[map_key] == null && collision_list[map_key] == null) {

            // Empty

            return false;

        }

        // If the single-word slot is empty and the collision list is not,
        // then the words associated with that key must be in the linked list

        else if (single_words[map_key] == null) {

            // Check whether the word is in the collision list (compare in
            // lower case, since words are stored in lower case)

            return collision_list[map_key].checkWord(in_word.toLowerCase());

        }

        // Next, check whether the single-word slot holds the word

        else if (single_words[map_key].getWord().compareTo(in_word.toLowerCase()) == 0) {

            // The word is the only word associated with the map key

            return true;

        }

        // Word is not in the hash table

        else {

            return false;

        }

    }

    public void removeWord(String in_word) {

        // Determine the first letter of the word

        char first_letter = in_word.toLowerCase().charAt(0);

        // Map the word as per the aforementioned algorithm

        int map_key = (int) first_letter - 97;

        // Check to see if the word is in the single-word slot

        if (single_words[map_key] != null) {

            if (single_words[map_key].getWord().compareTo(in_word.toLowerCase()) == 0) {

                // Word exists in the single-word slot, so delete it

                single_words[map_key].deleteWord();

            }

        }

        // If not in the single-word slot, check the collision list, because
        // if there was more than one word associated with that map key then
        // all the words starting with the same letter will be in the
        // collision list (guard against an empty list to avoid an error)

        else if (collision_list[map_key] != null && collision_list[map_key].checkWord(in_word.toLowerCase())) {

            collision_list[map_key].removeWord(in_word.toLowerCase());

        }

    }

    public String getWordsAssociatedWithKey(int map_key) {

        String words = null;

        // Check to see where the words are located

        if (single_words[map_key] == null && collision_list[map_key] == null) {

            // No words associated with that key, so do nothing

        }

        else if (single_words[map_key] != null && collision_list[map_key] == null) {

            // Only one word, and it is in the single-word slot

            words = (map_key + 1) + "       " + single_words[map_key].getWord() + "(" + single_words[map_key].getWordFrequency() + ")";

        }

        else {

            // More than one word associated with the key, so build a string
            // containing the information for every word in the list

            // Initialize the string to include the map key. Map key + 1,
            // since the letter 'a' should be displayed as '1'; remember,
            // the array runs from zero to 25, so we must adjust for this.

            words = (map_key + 1) + "       ";

            // The loop runs from i = 1 to the list size because the linked
            // list was built so that its first element is at index one,
            // unlike arrays, which have their first element at index zero.

            for (int i = 1; i <= collision_list[map_key].getListCount(); i++) {

                words = words + collision_list[map_key].getWordObject(i).getWord() + "(" + collision_list[map_key].getWordObject(i).getWordFrequency() + ")    ";

            }

        }

        return words;

    }

}
```

LVL 92

Expert Comment

ID: 23707012
The algorithm used by the String class, or the one I posted above, will work fine, and as it is the standard used by Java you'd expect it to be a pretty uniform one.

Alternatively, if you still want to use just the first char, you could use the statistical frequency of words starting with each letter to spread the distribution better.

But as I mentioned above, if you're looking for a good hash key for a string, the tried and tested algorithm would seem the obvious choice.
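For reference, the tried and tested String algorithm mentioned above can be written out by hand; this sketch reproduces the documented h = 31*h + c polynomial form:

```java
// Reproduces the polynomial hash the String class uses:
// h = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
public class PolyHash {

    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // should print true: matches the built-in String.hashCode()
        System.out.println(polyHash("word") == "word".hashCode());
    }
}
```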

LVL 85

Expert Comment

ID: 23709048
Depending on your words, a possible improvement might be something like:
(first_letter * 31 + last_letter) % hashsize
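That suggestion might be sketched like this (hashsize is left as a parameter, since the thread leaves the table size open; the class name is made up for illustration):

```java
// Combine the first and last letters of the word, as suggested above,
// then reduce modulo the table size.
public class FirstLastHash {

    static int hash(String word, int hashsize) {
        String w = word.toLowerCase();
        char first_letter = w.charAt(0);
        char last_letter = w.charAt(w.length() - 1);
        return (first_letter * 31 + last_letter) % hashsize;
    }

    public static void main(String[] args) {
        System.out.println(hash("apple", 26));
        System.out.println(hash("axe", 26));
    }
}
```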
