Solved

Example code for hash_map

Posted on 2001-06-15
18
2,242 Views
Last Modified: 2012-08-13
I'm looking for example code that uses hash_map or any other hash template class.
I'm also looking for a good example in which using hash_map makes a big difference versus using std::map.

I don't have any experience using hash code of any kind, and I'm trying to figure out if it is worth brushing up my skills in this area.
It seems to me that for the most part, the std::map class can do the job good enough.

Any and all input would be appreciated.
0
Comment
Question by:Axter
  • 8
  • 7
  • 3
18 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 6196568
0
 
LVL 30

Author Comment

by:Axter
ID: 6196626
Thanks for the example jkr, but I already had that link.

I'm really looking for an example where I can see a clear difference between hash_map and std::map.
The example in that link could just as easily used std::map.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196637
>>I'm really looking for an example where I can see a
>>clear difference between hash_map and std::map

The 'clear difference' is that the mapping algorithm is/should be faster. I don't have too much experience in that area either, but the idea of a hashing algorithm is to 'calculate' the target from the input.

>>The example in that link could just as easily used
>>std::map

That's AFAIK the advantage of that type of map - it's so transparent that you can't see any difference...
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196639
As long as the hash function is reasonable, a hash table is more efficient (i.e. faster insertions and lookups) for situations where there are large numbers of entries (e.g. > 1000) and where the compare-time per element is relatively large (e.g. string compares vs. integer compares). There is no overhead for balancing trees or for worse-case behavior on unbalanced trees, plus less overhead for element-to-element comparisons (as must be done in traversing trees). For smaller sets of elements, a hash-map may not pay any dividends over the standard black/red-tree maps (in much the same way as, if you only have 4 or 5 elements, you may as well keep an array and do a linear search instead of creating a map).

The major drawback with hash tables is that you cannot get an ordered set of elements back out of one - in other words, no forward or reverse iterator to go from begin() to end(). In many cases, e.g. symbol tables, this isn't a problem (unless, of course, you want to print out a list of symbols, in alphabetical order, at the end).

As with any other programming tools, select the appropriate one for the specific job at hand. It wouldn't hurt to brush up on the hash maps. Knowing and using only a single type of container is like having only one medium-size philips screwdriver and trying to use it to repair eyeglasses and pry 18-inch bolts out of battleships.

- jack
0
 
LVL 30

Author Comment

by:Axter
ID: 6196666
So am I correct in saying that the way you use the hash_map class is no difference then the way you use std::map???
And the only difference is in how fast one-vs-the-other for a particular requirement.

I just notice that the hash_map does not have its data sorted, where as the std::map has its data sorted.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196677
>>So am I correct in saying that the way you use the
>>hash_map class is no difference then the way you
>>use std::map???

AFAIK - yes. Just the mapping algorithm is different...
0
 
LVL 30

Author Comment

by:Axter
ID: 6196681
I've seen you post this before, but never bothered to ask.
What is AFAIK?
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196685
That's correct - you use a hash_map just as you would a standard map, you just can't use iterators. The only cavaet is that there are a limited number of types for which hash functions have been defined; if you want to use other types (e.g. unicode strings), then you need to provide your own hash function in the template definition. Hash functions are used to create a simple index from an input; e.g. turning the string "Don't Panic - I have a towel" into, say, 42. Hash functions should be relatively quick (e.g. not take as much time to calculate as several comparisons would take) and generate fairly "random" outputs from inputs so as to reduce the number of collisions.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196686
AFAIK = As far as I know
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 3

Expert Comment

by:JackThornton
ID: 6196689
At least, that's what it means as far as I know   ;-)
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196690
At least, that's what it means as far as I know   ;-)
0
 
LVL 30

Author Comment

by:Axter
ID: 6196702
>>AFAIK = As far as I know

IMHO, I think before you know it, the internet is going to have so many acronyms & symbols, that a newbie is going to need a special dictionary in order to carry out a normal conversation. :- ) :-0 ;-}
0
 
LVL 30

Author Comment

by:Axter
ID: 6196708
If you don't mind jkr, I'll post a seperate 50pt question for you, and award this one to JackThornton.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196712
JackThornton,
>>generate fairly "random" outputs from inputs so as to
>>reduce the number of collisions.
This part I don't understand.  I know what the collision problems are with hash, but I don't understand how random outputs would reduce the number of collisions.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196717
Wouldn't the goal of a hash_map be to produce an evenly spaced set of keys?
I would think that random outputs would reduce the ability to produce evenly spaced keys.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196733
Well, "random" was a poor choice of words. Obviously, the function must map a particular input uniquely to a particular output, or the whole thing would fail. What I meant to say was that the function should be able to generate outputs that spread evenly among the desired index range (i.e. size of hash table) for the types of inputs provided. You don't want functions that tend to "cluster" around a small number of values over the range of expected inputs. For example, if you were to hash a string using only the value of the first character, it would be a very fast function, but you would only generate (most likely fewer than) 96 index values, so you would get lots and lots of collisions. Alternately, you could build a CRC from the entire string, which would probably give a good spread of values, but be computationally expensive enough to ruin the benefits of hashing. Both of these are extreme examples of poorly chosen algorithms - it's probably pretty simple to come up with reasonable algorithms for any input type, it just means you have to think about the problem a little.
0
 
LVL 3

Accepted Solution

by:
JackThornton earned 100 total points
ID: 6196757
Actually the goal isn't to produce "evenly spaced" keys, the goal is to reduce collisions by choice of keys (and by size of table vs. number of inputs). For example, if you have a table that is some prime number near 4000 entries large, and you only put 1000 items into it, you will have less probability of collisions than if you have a table that size and put 3500 items into it. With a more densly populated table, the quality of the hash generator becomes more important.

It just occured to me that "evenly spaced" and "random" are probably both attempts to express the same thought, for which I can't seem to come up with a good word. Maybe more internet acronyms would help ;-)

Also note that there are generally two different kinds of hash schemes - closed and open. In closed schemes, if there is a collision, you march ahead in the table and find an open spot. This means that (a) you cannot have more inputs than the size of the table (hence closed), and (b) you degenerate into doing key compares as you march through the table. In open schemes, there is the possibility for a sub-container at every node (e.g. a list, tree, array, whatever) so that collisions generate multiple entries at that node. This removes the limit to the number of items stored, but, again, you have to go back to doing key compares to resolve the collision.

Despite the complexity, hash tables do, in practice, operate much faster than balanced trees on appropriate sized tables. Consider the case of 1024 items - with a balanced tree, you will end up doing 10 compares; in a hash table (if you're lucky), you might get your "hit" with a hash calculation and a single indexed lookup. Conversely, the worse case goes way up (e.g. your hash algorithm, for some reason, puts all of your items into two slots, and the sub-container is linear, so you do 512 key compares to find the hit).

I seem to be babbling now, so I'll sign this one off.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196907
>>I seem to be babbling now, so I'll sign this one off.

When the info is good, it's not babbling.

Thanks, this clear up somethings for me.  

I wounder what is the likely hood that hash_map/hash_set will get added to the ANSI C++ standards.
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

In days of old, returning something by value from a function in C++ was necessarily avoided because it would, invariably, involve one or even two copies of the object being created and potentially costly calls to a copy-constructor and destructor. A…
Go is an acronym of golang, is a programming language developed Google in 2007. Go is a new language that is mostly in the C family, with significant input from Pascal/Modula/Oberon family. Hence Go arisen as low-level language with fast compilation…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now