Hi, im having a bit of trouble. ok. a lot of trouble.
I am learning C and i have been given an exercise on arrays and pointers.
The exercise is to make 1 list to store student names, and another list to store the course they are enrolled on.
I know basically how it works, ie the ascii values are added together and then you use the mod operator to convert the total into a number which becomes the size of the array.
So can anybody give me a bit of help to start me on my way. I must state that I am not asking anybody to do the work for me, but the more help the better. Could you please answer asap. I look forward to hearing from youin the near future.
How about I give a slightly more precise description:
Suppose you have 100 entries in your list. Then, for each name you would find the index for it by adding the ASCII values together, the applying mod 100. That is your hash function. So, for example, a student named "Li", consists of ASCII values 0x4c and 0x69. When you add them you get 0xb5 which is 181 decimal. If you take mod 100 then you get 81. So Li should be stored at index 81.
Does that help?
If you're having trouble coding it, please provide whatever attempt you have made so far, complete with compiler errors if you can't solve those yourself, and we'll take it one step at a time.
For ozo: how is that better?!
Try your function with the input "Johnny":
This are the intermediaries at every step:
74
2553
84353
2783759
91864157
-1263449994
This is the result:
Final hash value:-1263449994
Am I missing something?
For imladris. Yes, you are right. And this is just for the sake of argument: isn't, statistically speeking, the hash table distribution better when the length of the entry list is a prime number.
For OutOvSyt.
How do you intend to solve conflicts in your implementation. There are at least two approaches, depending very much on the number of entries you estimate in your list.
If the number is at most the length of the list, you can try adding the conflicting value at the imediate empty position.
If you have more than that, you should consider your initial array as an array of pointers to some dynamic structure, like a linked list, which holds all the values with the same hash key, or better yet, a binary tree, sorted by name (in your case).
Bye!
0
Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.
For imladris - forget about my question, goes beyond the scope of this discussion, of course, the bigger the number of entries, the wider the dispersion, but then again, no point making this number too big (what's the point of having 10000 entries when the average number of records is about 100).
What I had in mind was the way to decide on the optimum number of entries given an estimation of the number of records, but, as I said, I think this is not the place for such debate, unless, of course, OutOvSyt is interested.
By, the way, OutOvSyt, have you been helped?
Anything to ask, some code to post?
0
OutOvSytAuthor Commented:
Thanks. You are all being a great help, but its not so much the code i am having trouble with, its more like the whole concept of hashing.
I have read your comments and i appreciate them, but can we try going a bit more basic?
Sorry but the only code i have is the algorithm to get the total and convert it to a range of 0 to MaxNum-1 where MaxNum of the size of the array.
This is basically add the ascii values together and divide them by 24. (I am using 24 as the size of the array).
I would be very grateful if you could post your revised comments after taking this comment into consideration.
Well, I see the others got bored, so I might as well tell you a thing or two.
Now, "the whole concept of hashing"... Basically, is an alternative to the ordinary sort. Sometimes is preferable to use hash tables instead of sorted lists (arrays) or binary trees in order to retrieve items fast from a collection.
Why is that? Well, just consider you have a sorted, index based, array: searching for an element is quite fast with binary search O(lg2 n), but adding or deleting one is O(n).
Let's say than that you have a binary search tree. Basically on these kind of strucutures the complexity of add/delete/retrieve is around O(lg2 n), but, depending on the order of all operations, the tree might get seriously unbalanced, then you will have to balance it in order to ensure the targeted complexity... and so on.
So, depending on the size of data being processed, and on the presumed dynamics of the operations, one might choose a hashing table for manipulating the set of records. The discussion on when exactly each approach is preferable is really too involved...
There are two, as I see it, main use cases for hashing:
1. You don't have such a large amount of records, but your program acceses them extensively -- you need a way to retrieve the location of one record in almost no time -- let's say you have about 50 records -- making a hash table of about 200 records with pointers to such records -- resolving eventual conflicts with assigning to the imediate free position is the best approach, far better than a sorted array or a binary tree, or any other structure.
2. You have a large amount of data, and adding and deleting are very, very frequent. You don't want a sorted list from reasons of complexity. You don't want a binary tree, because frequent adds and deletes will most likely debalance it.
You might use a hash table with a quite big number of entries, and then, for the records on the same key decide again on the kind of structure. Migth be just a list, a priority list, a sorted array, a binary tree, or even another hash table on another key...
Again, if you think the answer is not specific enough, try making your question more specific.
Cheers!
Alternative to a sort and search. Data is often sorted so you can do the binary search (throws away half the candidates each time, like a human with a phonebook, open to middle, move 1/2 way in the correct direction each time (because its sorted)). This search is fast, but not always fast enough...
The best luck I've had for collision resolution is to have several hash functions, and to use each in turn until a space is found. But I usually only use it on small known data sets with very carefully chosen functions, not on user entered data...
0
OutOvSytAuthor Commented:
alot more information than i expected. much more detail than i thought i would need but very very helpful. thanx alot
0
Featured Post
We value your feedback.
Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!
Suppose you have 100 entries in your list. Then, for each name you would find the index for it by adding the ASCII values together, the applying mod 100. That is your hash function. So, for example, a student named "Li", consists of ASCII values 0x4c and 0x69. When you add them you get 0xb5 which is 181 decimal. If you take mod 100 then you get 81. So Li should be stored at index 81.
Does that help?
If you're having trouble coding it, please provide whatever attempt you have made so far, complete with compiler errors if you can't solve those yourself, and we'll take it one step at a time.