Solved

Hashing

Posted on 2002-03-28
9
293 Views
Last Modified: 2010-04-15
Hi, im having a bit of trouble. ok. a lot of trouble.

I am learning C  and i have been given an exercise on arrays and pointers.  

The exercise is to make 1 list to store student names, and another list to store the course they are enrolled on.

I know basically how it works, ie the ascii values are added together and then you use the mod operator to convert the total into a number which becomes the size of the array.

So can anybody give me a bit of help to start me on my way.  I must state that I am not asking anybody to do the work for me, but the more help the better.  Could you please answer asap. I look forward to hearing from youin the near future.

Thanx

~OutOvSyt
0
Comment
Question by:OutOvSyt
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
9 Comments
 
LVL 16

Expert Comment

by:imladris
ID: 6902829
How about I give a slightly more precise description:

Suppose you have 100 entries in your list. Then, for each name you would find the index for it by adding the ASCII values together, the applying mod 100. That is your hash function. So, for example, a student named "Li", consists of ASCII values 0x4c and 0x69. When you add them you get 0xb5 which is 181 decimal. If you take mod 100 then you get 81. So Li should be stored at index 81.

Does that help?

If you're having trouble coding it, please provide whatever attempt you have made so far, complete with compiler errors if you can't solve those yourself, and we'll take it one step at a time.
0
 
LVL 84

Expert Comment

by:ozo
ID: 6904602
/* a better hash could be */
int hash(char *s){
    int h=0;
    while( *s ){
     h += (h<<5) + *s++;
    }
    return h;
}
0
 
LVL 1

Expert Comment

by:Robalitoru
ID: 6904702
 For ozo: how is that better?!
  Try your function with the input "Johnny":
  This are the intermediaries at every step:
  74
  2553
  84353
  2783759
  91864157
  -1263449994
  This is the result:
  Final hash value:-1263449994
  Am I missing something?

 For imladris. Yes, you are right. And this is just for the sake of argument: isn't, statistically speeking, the hash table distribution better when the length of the entry list is a prime number.

 For OutOvSyt.
 How do you intend to solve conflicts in your implementation. There are at least two approaches, depending very much on the number of entries you estimate in your list.
 If the number is at most the length of the list, you can try adding the conflicting value at the imediate empty position.
 If you have more than that, you should consider your initial array as an array of pointers to some dynamic structure, like a linked list, which holds all the values with the same hash key, or better yet, a binary tree, sorted by name (in your case).

 Bye!
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 1

Expert Comment

by:Robalitoru
ID: 6904705
 Sorry, imladris, forgot the question mark -- ?
0
 
LVL 1

Expert Comment

by:Robalitoru
ID: 6905335
  For imladris - forget about my question, goes beyond the scope of this discussion, of course, the bigger the number of entries, the wider the dispersion, but then again, no point making this number too big (what's the point of having 10000 entries when the average number of records is about 100).
   What I had in mind was the way to decide on the optimum number of entries given an estimation of the number of records, but, as I said, I think this is not the place for such debate, unless, of course, OutOvSyt is interested.

  By, the way, OutOvSyt, have you been helped?
  Anything to ask, some code to post?
0
 

Author Comment

by:OutOvSyt
ID: 6915700
Thanks. You are all being a great help, but its not so much the code i am having trouble with, its more like the whole concept of hashing.

I have read your comments and i appreciate them, but can we try going a bit more basic?

Sorry but the only code i have is the algorithm to get the total and convert it to a range of 0 to MaxNum-1 where MaxNum of the size of the array.

This is basically add the ascii values together and divide them by 24. (I am using 24 as the size of the array).

I would be very grateful if you could post your revised comments after taking this comment into consideration.

Thanking you in advance.

~OutOvSyt

0
 
LVL 1

Accepted Solution

by:
Robalitoru earned 300 total points
ID: 6917555
  Well, I see the others got bored, so I might as well tell you a thing or two.
   Now, "the whole concept of hashing"... Basically, is an alternative to the ordinary sort. Sometimes is preferable to use hash tables instead of sorted lists (arrays) or binary trees in order to retrieve items fast from a collection.
   Why is that? Well, just consider you have a sorted, index based,  array: searching for an element is quite fast with binary search O(lg2 n), but adding or deleting one is O(n).
   Let's say than that you have a binary search tree. Basically on these kind of strucutures the complexity of add/delete/retrieve is around O(lg2 n), but, depending on the order of all operations, the tree might get seriously unbalanced, then you will have to balance it in order to ensure the targeted complexity... and so on.
   So, depending on the size of data being processed, and on the presumed dynamics of the operations, one might choose a hashing table for manipulating the set of records. The discussion on when exactly each approach is preferable is really too involved...

   There are two, as I see it, main use cases for hashing:
1.  You don't have such a large amount of records, but your program acceses them extensively -- you need a way to retrieve the location of one record in almost no time -- let's say you have about 50 records -- making a hash table of about 200 records with pointers to such records -- resolving eventual conflicts with assigning to the imediate free position is the best approach, far better than a sorted array or a binary tree, or any other structure.

2. You have a large amount of data, and adding and deleting are very, very frequent. You don't want a sorted list from reasons of complexity. You don't want a binary tree, because frequent adds and deletes will most likely debalance it.
   You might use a hash table with a quite big number of entries, and then, for the records on the same key decide again on the kind of structure. Migth be just a list, a priority list, a sorted array, a binary tree, or even another hash table on another key...

  Again, if you think the answer is not specific enough, try making your question more specific.
  Cheers!
0
 
LVL 2

Expert Comment

by:jonnin
ID: 6933927
Slight clarify on the above:

Alternative to a sort and search. Data is often sorted so you can do the binary search (throws away half the candidates each time, like a human with a phonebook, open to middle, move 1/2 way in the correct direction each time (because its sorted)).  This search is fast, but not always fast enough...  

The best luck I've had for collision resolution is to have several hash functions, and to use each in turn until a space is found.  But I usually only use it on small known data sets with very carefully chosen functions, not on user entered data...





0
 

Author Comment

by:OutOvSyt
ID: 8240052
alot more information than i expected. much more detail than i thought i would need but very very helpful. thanx alot
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
Examines three attack vectors, specifically, the different types of malware used in malicious attacks, web application attacks, and finally, network based attacks.  Concludes by examining the means of securing and protecting critical systems and inf…
The goal of this video is to provide viewers with basic examples to understand how to use strings and some functions related to them in the C programming language.
Video by: Grant
The goal of this video is to provide viewers with basic examples to understand and use while-loops in the C programming language.

696 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question