Example code for hash_map

I'm looking for example code that uses hash_map or any other hash template class.
I'm also looking for a good example in which using hash_map makes a big difference versus using std::map.

I don't have any experience using hash containers of any kind, and I'm trying to figure out whether it is worth brushing up my skills in this area.
It seems to me that, for the most part, the std::map class can do the job well enough.

Any and all input would be appreciated.
AxterAsked:
 
JackThorntonCommented:
Actually, the goal isn't to produce "evenly spaced" keys; the goal is to reduce collisions by choice of keys (and by size of table vs. number of inputs). For example, if you have a table that is some prime number near 4000 entries large, and you only put 1000 items into it, you will have a lower probability of collisions than if you have a table that size and put 3500 items into it. With a more densely populated table, the quality of the hash generator becomes more important.

It just occurred to me that "evenly spaced" and "random" are probably both attempts to express the same thought, for which I can't seem to come up with a good word. Maybe more internet acronyms would help ;-)

Also note that there are generally two different kinds of hash schemes - closed and open. In closed schemes, if there is a collision, you march ahead in the table and find an open spot. This means that (a) you cannot have more inputs than the size of the table (hence closed), and (b) you degenerate into doing key compares as you march through the table. In open schemes, there is the possibility for a sub-container at every node (e.g. a list, tree, array, whatever) so that collisions generate multiple entries at that node. This removes the limit to the number of items stored, but, again, you have to go back to doing key compares to resolve the collision.

Despite the complexity, hash tables do, in practice, operate much faster than balanced trees on appropriately sized tables. Consider the case of 1024 items - with a balanced tree, you will end up doing 10 compares; in a hash table (if you're lucky), you might get your "hit" with a hash calculation and a single indexed lookup. Conversely, the worst case goes way up (e.g. your hash algorithm, for some reason, puts all of your items into two slots, and the sub-container is linear, so you do 512 key compares to find the hit).

I seem to be babbling now, so I'll sign this one off.
 
jkrCommented:
 
AxterAuthor Commented:
Thanks for the example jkr, but I already had that link.

I'm really looking for an example where I can see a clear difference between hash_map and std::map.
The example in that link could just as easily have used std::map.
 
jkrCommented:
>>I'm really looking for an example where I can see a
>>clear difference between hash_map and std::map

The 'clear difference' is that the mapping algorithm is/should be faster. I don't have too much experience in that area either, but the idea of a hashing algorithm is to 'calculate' the target from the input.

>>The example in that link could just as easily used
>>std::map

That's AFAIK the advantage of that type of map - it's so transparent that you can't see any difference...
 
JackThorntonCommented:
As long as the hash function is reasonable, a hash table is more efficient (i.e. faster insertions and lookups) for situations where there are large numbers of entries (e.g. > 1000) and where the compare-time per element is relatively large (e.g. string compares vs. integer compares). There is no overhead for balancing trees or for worst-case behavior on unbalanced trees, plus less overhead for element-to-element comparisons (as must be done in traversing trees). For smaller sets of elements, a hash_map may not pay any dividends over the standard red/black-tree maps (in much the same way as, if you only have 4 or 5 elements, you may as well keep an array and do a linear search instead of creating a map).

The major drawback with hash tables is that you cannot get an ordered set of elements back out of one - in other words, no iteration from begin() to end() in key order. In many cases, e.g. symbol tables, this isn't a problem (unless, of course, you want to print out a list of symbols, in alphabetical order, at the end).

As with any other programming tools, select the appropriate one for the specific job at hand. It wouldn't hurt to brush up on the hash maps. Knowing and using only a single type of container is like having only one medium-size Phillips screwdriver and trying to use it to repair eyeglasses and pry 18-inch bolts out of battleships.

- jack
 
AxterAuthor Commented:
So am I correct in saying that the way you use the hash_map class is no different than the way you use std::map?
And that the only difference is how fast one is versus the other for a particular requirement?

I just noticed that the hash_map does not keep its data sorted, whereas std::map does.
 
jkrCommented:
>>So am I correct in saying that the way you use the
>>hash_map class is no difference then the way you
>>use std::map???

AFAIK - yes. Just the mapping algorithm is different...
 
AxterAuthor Commented:
I've seen you post this before, but never bothered to ask.
What is AFAIK?
 
JackThorntonCommented:
That's correct - you use a hash_map just as you would a standard map, you just can't rely on the iteration order. The only caveat is that there are a limited number of types for which hash functions have been defined; if you want to use other types (e.g. Unicode strings), then you need to provide your own hash function in the template definition. Hash functions are used to create a simple index from an input; e.g. turning the string "Don't Panic - I have a towel" into, say, 42. Hash functions should be relatively quick (e.g. not take as much time to calculate as several comparisons would take) and generate fairly "random" outputs from inputs so as to reduce the number of collisions.
 
JackThorntonCommented:
AFAIK = As far as I know
 
JackThorntonCommented:
At least, that's what it means as far as I know   ;-)
 
AxterAuthor Commented:
>>AFAIK = As far as I know

IMHO, before you know it the internet is going to have so many acronyms & symbols that a newbie is going to need a special dictionary in order to carry out a normal conversation. :- ) :-0 ;-}
 
AxterAuthor Commented:
If you don't mind, jkr, I'll post a separate 50pt question for you, and award this one to JackThornton.
 
AxterAuthor Commented:
JackThornton,
>>generate fairly "random" outputs from inputs so as to
>>reduce the number of collisions.
This part I don't understand.  I know what the collision problems with hashes are, but I don't understand how random outputs would reduce the number of collisions.
 
AxterAuthor Commented:
Wouldn't the goal of a hash_map be to produce an evenly spaced set of keys?
I would think that random outputs would reduce the ability to produce evenly spaced keys.
 
JackThorntonCommented:
Well, "random" was a poor choice of words. Obviously, the function must map a particular input uniquely to a particular output, or the whole thing would fail. What I meant to say was that the function should be able to generate outputs that spread evenly among the desired index range (i.e. size of hash table) for the types of inputs provided. You don't want functions that tend to "cluster" around a small number of values over the range of expected inputs. For example, if you were to hash a string using only the value of the first character, it would be a very fast function, but you would only generate (most likely fewer than) 96 index values, so you would get lots and lots of collisions. Alternately, you could build a CRC from the entire string, which would probably give a good spread of values, but be computationally expensive enough to ruin the benefits of hashing. Both of these are extreme examples of poorly chosen algorithms - it's probably pretty simple to come up with reasonable algorithms for any input type, it just means you have to think about the problem a little.
 
AxterAuthor Commented:
>>I seem to be babbling now, so I'll sign this one off.

When the info is good, it's not babbling.

Thanks, this clears up some things for me.

I wonder what the likelihood is that hash_map/hash_set will get added to the ANSI C++ standard.