Solved

Hash Table with Linked Lists for Chaining

Posted on 2007-12-03
6
1,031 Views
Last Modified: 2012-05-05
I'm attempting to write a program for class that takes a dictionary file and reads it into a hash table. Then it will prompt the user for their file that they wish to spell check against the dictionary file.  It will read that file word by word, and then spit out the words that were not found in the hash table as spelling errors, giving the word that was misspelled, an the line number it was on.

I currently have the file scanner setup so that it eliminates odd characters, and splits up each line into multiple strings.  I've never written a hash table, so I'm a bit lost on how to even begin this project.

Here's basically what I understand.

A hash table is an array (obviously).   To figure out where to place each word, I'm going to apply the formula given by the assignment specifications, "hashFunction(w) = (c0*73^(0 mod 4) +c1*73^(1 % 4)+&+cn*73n % 4 ) % [TableLength]"

So I understand in theory that if there's a collision, I'll add a node to a linked list on the front of my chain stored in the corresponding array spot.  What I don't understand however, is how to implement the code.

Finally, to keep the has function running quickly, I'll start with an array size of 128. When I get 128 words in the array will need to double in size, and when I get to 256, it would double, and then 512, etc.

Thanks ahead of time for any help.  If you have a similar program I'd appreciate it, I'm not really looking for the answer, just some code to get me started.
0
Comment
Question by:jj1987
  • 3
  • 2
6 Comments
 
LVL 45

Expert Comment

by:Kdo
ID: 20397939
Hi jj1987,

Estimate the number of words in the hash table before you start.  Too many is probably better than too few.

There's a real issue with double the table size when you hit a target number of entries -- all of the hash values are subject to change so the word may no longer by stored in the same location.  :(

Also, pick a prime number for the table size.  That will help to avoid collisions.


Good Luck,
Kent
0
 

Author Comment

by:jj1987
ID: 20397986
@Kdo- Part of the assignment instructions are to start at 128 and then double the size as needed.  I'll have to rehash the contents when I do this each time.  I hate doing it, as it's not typically a good idea from my understanding, but in this case I'm forced to.
0
 
LVL 45

Expert Comment

by:Kdo
ID: 20398131
Hi jj1987,

Either your instructor doesn't understand what he's teaching, or you're supposed to encounter issues and get a better understanding of the potential pitfalls.  (Hope that it's the latter.)

Encapsulate the items in the hash table into a struct.  That'll help manage things a bit easier than making it all global storage, and allow the program to easily have multiple hash tables.

typdef struct
{
  int    Count;                  // Number of items in the table
  int    Limit;                    // Maximum number of items in the table
  int    OverflowCount;  // Number of items in the overflow block(s)
  int    OverflowLimit;    // Number of reserved items in the overflow block
 void*  Table;                 // The actual hast table
 void*  Overflow;          // overflow pages
}  hash_t;

A couple of quick functions to get started.

void InitializeHashTable (hash_t *table, int limit, int overflowlimit)
{
  table->Count = 0;
  table->Limit = limit;
  table->OverflowCount = 0;
  table->OverflowLimit = overflowlimit;

  table->Table = malloc (limit * sizeof (char*));
  memset (table->Table, 0, limit * sizeof (char*));

  table->Overflow = malloc (overflowlimit * sizeof (char*))
  memset (table->Overflow, 0, overflowlimit * sizeof (char*))
}

This assumes that you're going to be storing pointers to strings.  (Note that for this kind of project, each node should probably be a buffer of 1K or 4K in length and that each node should contain multiple strings.  But that's probably a future assignment.)  :)

The code an Add () function.

When the table is "full", either by the total number of nodes or by the size of the overflow section, create a new hash table that is twice as large as this one.  Then scan the old table, rehash the items according to the new table and insert them into the new table.  This also means that you'll have to be able to traverse the entire hash table to extract all of the nodes.





Good Luck,
Kent
0
What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

 

Author Comment

by:jj1987
ID: 20398777
That code looks a bit unlike anything I've seen before.  I got to thinking, and the best way I could think of to help others help me was to get the program as close as possible to working, which in this case for me, was to get it working with quadratic probing.  I believe my insert function is working correctly now.

Is it possible to modify this code to work with the chaining method?



/* Include standard library */

#include <stdio.h>

#define THRESHOLD 0.70
 

/* These are the possible values used to indicate that a position in the hash

table is empty,  or full */

#define EMPTY 0

#define FULL 1
 

/* A struct representing the hash table */

struct HashTable

{

    int* store;

    char* taken;

    int numItems;

    int arrayLength;  // Length of array
 

}; // end struct HashTable
 

/* define functions */

void load(); // loads the user's file to spell check.

struct HashTable* newHashTable();

int hashFunction(struct HashTable* h, char dictword);

void insert(struct HashTable* h, char dictword);

void expandTable(struct HashTable* h);

int search(struct HashTable* h, char searchword);
 
 

struct HashTable* newHashTable()

{

    struct HashTable* newguy = (struct HashTable*)malloc(sizeof(struct HashTable));

    newguy->store = (int*)calloc(17, sizeof(int));

    newguy->taken = (char*)calloc(17, sizeof(char));

    newguy->numItems = 0;

    newguy->arrayLength = 17;
 

    return newguy;

}
 

int hashFunction(struct HashTable* h, char dictword)

{

    /* We'll use a simple has function here, and change it when everything

    else works */

    return dictword % h->arrayLength;

}
 

void insert(struct HashTable* h, char dictword)

{

    int hashValue = hashFunction(h, dictword);

    int position;

    int offset;

    int i;
 

    /* Find a spot to put the new value by quadratic probing */

    position = hashValue;
 

    /* Keep looking for a spot until we find one,

    even if that spot has had lazy deletion done to it. */

    for(i=1; h->taken[position] == FULL; i++)

    {

        /* If we couldn't find a spot to place the

        item make the hash table bigger. */

        if(i > h->arrayLength)

        {

            expandTable(h);
 

            /* After expanding the table, we need to find the updated hash value

            for the item we're inserting, and start the probe over */

            hashValue = hashFunction(h,dictword);

            i=0;

        }

        /* Quadratic Probing */

        offset = i*i;
 

        // Mod by the array length to make sure that we're within the array. */

        position = (hashValue + offset) % h->arrayLength;

    }

    // Once a spot for the item is found, insert the item

    h->store[position] = dictword;

    h->taken[position] = FULL;

    h->numItems++;
 

    // If the table has become too full, expand it.

    if((double)h->numItems / (double)h->arrayLength > THRESHOLD)

        expandTable(h);

}
 

void expandTable(struct HashTable* h)

{

    // Store the old hash table contents

    int* oldStore = h->store;

    char* oldTaken = h->taken;

    int oldLength = h->arrayLength;

    int i;
 

    // Create a new, larger hash table of roughly double the size. Ideally, you

    // should find a size that is a prime number, because that improves the

    // behavior of hash collision resolution. Quickly finding prime numbers is

    // beyond the scope of this course.

    h->store = (int*)calloc(oldLength*2+1,sizeof(int));

    h->taken = (char*)calloc(oldLength*2+1,sizeof(char));

    h->arrayLength = oldLength*2 +1;

    h->numItems = 0;
 

    // Reinsert each item from the old table into the new table

    for(i=0;i<oldLength;i++)

    {

        if(oldTaken[i] == FULL)

            insert(h,oldStore[i]);

    }
 

    // Get rid of the memory from the old table

    free(oldStore);

    free(oldTaken);

}
 

int search(struct HashTable* h, char searchword)

{

    int hashValue = hashFunction(h,searchword);

    int position;

    int offset;

    int i;
 

    // Keep looking through the table until you either find the item, or you're

    // sure that it's not in there

    position = hashValue;

    for(i=1; h->taken[position]; i++)

    {

        // If we've found what we're looking for, then we've found it!

        if(h->store[position] == searchword)

            return position;
 

        // Certain pathological input may turn this loop into an infinite loop.

        // It's not likely, but it's possible. Stop the search if that's what's

        // happening.

        if(i> h->arrayLength)

            return -1;
 

        // Otherwise use quadratic probing to keep searching the table

        offset = i*i;

        position = (hashValue + offset) % h->arrayLength;

    }
 

    // If we've searched until we've found a genuinely empty spot without

    // finding the target value, then it must not be in the table.

    return -1;

}
 

int main(void)

{

    /* Run the load function to prompt user for the file to spellcheck */

    /* This function automatically loads it into the hash table */

    load();
 

    /* Load dictionary File */

    char s[1024];

    char fname[256];

    FILE* fin;

    struct HashTable* h = newHashTable();
 
 

    printf("Enter the dictionary file name:   ");

    scanf("%s",&fname);
 

    /* open the file that the user specified in read only mode */

    fin = fopen(fname,"r");
 

    while(fgets(s, 1024, fin) != NULL)

    {

        insert(h,s);

    } // end while(fgets(s, 1024, fin) != NULL)
 

    printf("\nProgram has completed successfully!\n");

    system("PAUSE");

}
 

void load()

{

    char s[1024];

    char* t;

    char fname[256];

    FILE* fin;

    char delims[] = " \n\r\t~!@#$%^&*()-_=+[]{}\|;:,.<>/?";

    int linenum = 0;
 

    /* Open a file specified by the user for reading */

    printf("Enter the filename you wish to spellcheck:   ");

    scanf("%s",&fname);
 

    /* open the file that the user specified in read only mode */

    fin = fopen(fname,"r");
 

    /* Loop through each line in the file and print each token */

    while(fgets(s, 1024, fin) != NULL)

    {

        linenum++;

        /* Loop through each token on the line and break up

        the string by the delimiters */

        t = strtok(s,delims);

        while(t != NULL)

        {

            /* Get the next token on the line */

            t = strtok(NULL, delims);

            if(t != NULL)

                printf("\nLine %d-   %s",linenum, t);

        } // end while(t != NULL)
 

    } // end while(fgets(s, 1024, fin) != NULL)

} // end void load

Open in new window

0
 
LVL 4

Expert Comment

by:Fippy_Darkpaw
ID: 20412204
Can you explain chaining more? Is like this ... when there is a collision:

(1) find a new bucket by probing
(2) leave a pointer to the overflow bucket in the bucket it originally hashed to

If that is the case, then:

(1) each hash entry struct could have a pointer to "next*"
(2) Initially it would be null if there is no overflow to that bucket.
(3) Then if there is an overflow then probe for a new bucket, and set the "next*" pointer to the overflow bucket

Then whenever you look up something you hash it, and then follow the pointer chain until a null.

Also, I don't see you delete() the hash table in main, so that's probably a memory leak. The hash table struct has int and char pointers so those will probably have to be deleted manually before you delete the hash table as well.
0
 
LVL 45

Accepted Solution

by:
Kdo earned 500 total points
ID: 20412455
Hi jj,

The code that I posted really isn't that much different than what you have.  :)  The biggest differences are that the dynamic buffer is of type void* so that it easily accomodates any data type and that the overflow area is controlled by the struct.

There are a lot of ways to implement a variables sized table of strings. char** is probably the data type that you want/need.  Double '**'.  That will certainly cause a few segmentation faults the first few times that you try using it, but for this application it seems like the right way to go.

The table is variable length and dynamic.  It grows as the program adds things to it, so it's necessary to have a structure that supports it.  char* is a pointer.  It points to a string.  Your program needs to point to an array of strings, so char** is appropriate.

Also, the structure should also be used to manage the overflow area.  It, too, should be of type char **.


Kent
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Suggested Solutions

This tutorial is posted by Aaron Wojnowski, administrator at SDKExpert.net.  To view more iPhone tutorials, visit www.sdkexpert.net. This is a very simple tutorial on finding the user's current location easily. In this tutorial, you will learn ho…
Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to use…
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
The goal of this video is to provide viewers with basic examples to understand how to create, access, and change arrays in the C programming language.

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now