Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Example code for hash_map

Posted on 2001-06-15
18
Medium Priority
?
2,260 Views
Last Modified: 2012-08-13
I'm looking for example code that uses hash_map or any other hash template class.
I'm also looking for a good example in which using hash_map makes a big difference versus using std::map.

I don't have any experience using hash code of any kind, and I'm trying to figure out if it is worth brushing up my skills in this area.
It seems to me that for the most part, the std::map class can do the job good enough.

Any and all input would be appreciated.
0
Comment
Question by:Axter
  • 8
  • 7
  • 3
18 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 6196568
0
 
LVL 30

Author Comment

by:Axter
ID: 6196626
Thanks for the example jkr, but I already had that link.

I'm really looking for an example where I can see a clear difference between hash_map and std::map.
The example in that link could just as easily used std::map.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196637
>>I'm really looking for an example where I can see a
>>clear difference between hash_map and std::map

The 'clear difference' is that the mapping algorithm is/should be faster. I don't have too much experience in that area either, but the idea of a hashing algorithm is to 'calculate' the target from the input.

>>The example in that link could just as easily used
>>std::map

That's AFAIK the advantage of that type of map - it's so transparent that you can't see any difference...
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 3

Expert Comment

by:JackThornton
ID: 6196639
As long as the hash function is reasonable, a hash table is more efficient (i.e. faster insertions and lookups) for situations where there are large numbers of entries (e.g. > 1000) and where the compare-time per element is relatively large (e.g. string compares vs. integer compares). There is no overhead for balancing trees or for worse-case behavior on unbalanced trees, plus less overhead for element-to-element comparisons (as must be done in traversing trees). For smaller sets of elements, a hash-map may not pay any dividends over the standard black/red-tree maps (in much the same way as, if you only have 4 or 5 elements, you may as well keep an array and do a linear search instead of creating a map).

The major drawback with hash tables is that you cannot get an ordered set of elements back out of one - in other words, no forward or reverse iterator to go from begin() to end(). In many cases, e.g. symbol tables, this isn't a problem (unless, of course, you want to print out a list of symbols, in alphabetical order, at the end).

As with any other programming tools, select the appropriate one for the specific job at hand. It wouldn't hurt to brush up on the hash maps. Knowing and using only a single type of container is like having only one medium-size philips screwdriver and trying to use it to repair eyeglasses and pry 18-inch bolts out of battleships.

- jack
0
 
LVL 30

Author Comment

by:Axter
ID: 6196666
So am I correct in saying that the way you use the hash_map class is no difference then the way you use std::map???
And the only difference is in how fast one-vs-the-other for a particular requirement.

I just notice that the hash_map does not have its data sorted, where as the std::map has its data sorted.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196677
>>So am I correct in saying that the way you use the
>>hash_map class is no difference then the way you
>>use std::map???

AFAIK - yes. Just the mapping algorithm is different...
0
 
LVL 30

Author Comment

by:Axter
ID: 6196681
I've seen you post this before, but never bothered to ask.
What is AFAIK?
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196685
That's correct - you use a hash_map just as you would a standard map, you just can't use iterators. The only cavaet is that there are a limited number of types for which hash functions have been defined; if you want to use other types (e.g. unicode strings), then you need to provide your own hash function in the template definition. Hash functions are used to create a simple index from an input; e.g. turning the string "Don't Panic - I have a towel" into, say, 42. Hash functions should be relatively quick (e.g. not take as much time to calculate as several comparisons would take) and generate fairly "random" outputs from inputs so as to reduce the number of collisions.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196686
AFAIK = As far as I know
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196689
At least, that's what it means as far as I know   ;-)
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196690
At least, that's what it means as far as I know   ;-)
0
 
LVL 30

Author Comment

by:Axter
ID: 6196702
>>AFAIK = As far as I know

IMHO, I think before you know it, the internet is going to have so many acronyms & symbols, that a newbie is going to need a special dictionary in order to carry out a normal conversation. :- ) :-0 ;-}
0
 
LVL 30

Author Comment

by:Axter
ID: 6196708
If you don't mind jkr, I'll post a seperate 50pt question for you, and award this one to JackThornton.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196712
JackThornton,
>>generate fairly "random" outputs from inputs so as to
>>reduce the number of collisions.
This part I don't understand.  I know what the collision problems are with hash, but I don't understand how random outputs would reduce the number of collisions.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196717
Wouldn't the goal of a hash_map be to produce an evenly spaced set of keys?
I would think that random outputs would reduce the ability to produce evenly spaced keys.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196733
Well, "random" was a poor choice of words. Obviously, the function must map a particular input uniquely to a particular output, or the whole thing would fail. What I meant to say was that the function should be able to generate outputs that spread evenly among the desired index range (i.e. size of hash table) for the types of inputs provided. You don't want functions that tend to "cluster" around a small number of values over the range of expected inputs. For example, if you were to hash a string using only the value of the first character, it would be a very fast function, but you would only generate (most likely fewer than) 96 index values, so you would get lots and lots of collisions. Alternately, you could build a CRC from the entire string, which would probably give a good spread of values, but be computationally expensive enough to ruin the benefits of hashing. Both of these are extreme examples of poorly chosen algorithms - it's probably pretty simple to come up with reasonable algorithms for any input type, it just means you have to think about the problem a little.
0
 
LVL 3

Accepted Solution

by:
JackThornton earned 400 total points
ID: 6196757
Actually the goal isn't to produce "evenly spaced" keys, the goal is to reduce collisions by choice of keys (and by size of table vs. number of inputs). For example, if you have a table that is some prime number near 4000 entries large, and you only put 1000 items into it, you will have less probability of collisions than if you have a table that size and put 3500 items into it. With a more densly populated table, the quality of the hash generator becomes more important.

It just occured to me that "evenly spaced" and "random" are probably both attempts to express the same thought, for which I can't seem to come up with a good word. Maybe more internet acronyms would help ;-)

Also note that there are generally two different kinds of hash schemes - closed and open. In closed schemes, if there is a collision, you march ahead in the table and find an open spot. This means that (a) you cannot have more inputs than the size of the table (hence closed), and (b) you degenerate into doing key compares as you march through the table. In open schemes, there is the possibility for a sub-container at every node (e.g. a list, tree, array, whatever) so that collisions generate multiple entries at that node. This removes the limit to the number of items stored, but, again, you have to go back to doing key compares to resolve the collision.

Despite the complexity, hash tables do, in practice, operate much faster than balanced trees on appropriate sized tables. Consider the case of 1024 items - with a balanced tree, you will end up doing 10 compares; in a hash table (if you're lucky), you might get your "hit" with a hash calculation and a single indexed lookup. Conversely, the worse case goes way up (e.g. your hash algorithm, for some reason, puts all of your items into two slots, and the sub-container is linear, so you do 512 key compares to find the hit).

I seem to be babbling now, so I'll sign this one off.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196907
>>I seem to be babbling now, so I'll sign this one off.

When the info is good, it's not babbling.

Thanks, this clear up somethings for me.  

I wounder what is the likely hood that hash_map/hash_set will get added to the ANSI C++ standards.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Often, when implementing a feature, you won't know how certain events should be handled at the point where they occur and you'd rather defer to the user of your function or class. For example, a XML parser will extract a tag from the source code, wh…
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
The goal of the video will be to teach the user the difference and consequence of passing data by value vs passing data by reference in C++. An example of passing data by value as well as an example of passing data by reference will be be given. Bot…
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.
Suggested Courses

927 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question