Solved

Example code for hash_map

Posted on 2001-06-15
18
2,255 Views
Last Modified: 2012-08-13
I'm looking for example code that uses hash_map or any other hash template class.
I'm also looking for a good example in which using hash_map makes a big difference versus using std::map.

I don't have any experience using hash code of any kind, and I'm trying to figure out if it is worth brushing up my skills in this area.
It seems to me that for the most part, the std::map class can do the job good enough.

Any and all input would be appreciated.
0
Comment
Question by:Axter
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 8
  • 7
  • 3
18 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 6196568
0
 
LVL 30

Author Comment

by:Axter
ID: 6196626
Thanks for the example jkr, but I already had that link.

I'm really looking for an example where I can see a clear difference between hash_map and std::map.
The example in that link could just as easily used std::map.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196637
>>I'm really looking for an example where I can see a
>>clear difference between hash_map and std::map

The 'clear difference' is that the mapping algorithm is/should be faster. I don't have too much experience in that area either, but the idea of a hashing algorithm is to 'calculate' the target from the input.

>>The example in that link could just as easily used
>>std::map

That's AFAIK the advantage of that type of map - it's so transparent that you can't see any difference...
0
[Live Webinar] The Cloud Skills Gap

As Cloud technologies come of age, business leaders grapple with the impact it has on their team's skills and the gap associated with the use of a cloud platform.

Join experts from 451 Research and Concerto Cloud Services on July 27th where we will examine fact and fiction.

 
LVL 3

Expert Comment

by:JackThornton
ID: 6196639
As long as the hash function is reasonable, a hash table is more efficient (i.e. faster insertions and lookups) for situations where there are large numbers of entries (e.g. > 1000) and where the compare-time per element is relatively large (e.g. string compares vs. integer compares). There is no overhead for balancing trees or for worse-case behavior on unbalanced trees, plus less overhead for element-to-element comparisons (as must be done in traversing trees). For smaller sets of elements, a hash-map may not pay any dividends over the standard black/red-tree maps (in much the same way as, if you only have 4 or 5 elements, you may as well keep an array and do a linear search instead of creating a map).

The major drawback with hash tables is that you cannot get an ordered set of elements back out of one - in other words, no forward or reverse iterator to go from begin() to end(). In many cases, e.g. symbol tables, this isn't a problem (unless, of course, you want to print out a list of symbols, in alphabetical order, at the end).

As with any other programming tools, select the appropriate one for the specific job at hand. It wouldn't hurt to brush up on the hash maps. Knowing and using only a single type of container is like having only one medium-size philips screwdriver and trying to use it to repair eyeglasses and pry 18-inch bolts out of battleships.

- jack
0
 
LVL 30

Author Comment

by:Axter
ID: 6196666
So am I correct in saying that the way you use the hash_map class is no difference then the way you use std::map???
And the only difference is in how fast one-vs-the-other for a particular requirement.

I just notice that the hash_map does not have its data sorted, where as the std::map has its data sorted.
0
 
LVL 86

Expert Comment

by:jkr
ID: 6196677
>>So am I correct in saying that the way you use the
>>hash_map class is no difference then the way you
>>use std::map???

AFAIK - yes. Just the mapping algorithm is different...
0
 
LVL 30

Author Comment

by:Axter
ID: 6196681
I've seen you post this before, but never bothered to ask.
What is AFAIK?
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196685
That's correct - you use a hash_map just as you would a standard map, you just can't use iterators. The only cavaet is that there are a limited number of types for which hash functions have been defined; if you want to use other types (e.g. unicode strings), then you need to provide your own hash function in the template definition. Hash functions are used to create a simple index from an input; e.g. turning the string "Don't Panic - I have a towel" into, say, 42. Hash functions should be relatively quick (e.g. not take as much time to calculate as several comparisons would take) and generate fairly "random" outputs from inputs so as to reduce the number of collisions.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196686
AFAIK = As far as I know
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196689
At least, that's what it means as far as I know   ;-)
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196690
At least, that's what it means as far as I know   ;-)
0
 
LVL 30

Author Comment

by:Axter
ID: 6196702
>>AFAIK = As far as I know

IMHO, I think before you know it, the internet is going to have so many acronyms & symbols, that a newbie is going to need a special dictionary in order to carry out a normal conversation. :- ) :-0 ;-}
0
 
LVL 30

Author Comment

by:Axter
ID: 6196708
If you don't mind jkr, I'll post a seperate 50pt question for you, and award this one to JackThornton.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196712
JackThornton,
>>generate fairly "random" outputs from inputs so as to
>>reduce the number of collisions.
This part I don't understand.  I know what the collision problems are with hash, but I don't understand how random outputs would reduce the number of collisions.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196717
Wouldn't the goal of a hash_map be to produce an evenly spaced set of keys?
I would think that random outputs would reduce the ability to produce evenly spaced keys.
0
 
LVL 3

Expert Comment

by:JackThornton
ID: 6196733
Well, "random" was a poor choice of words. Obviously, the function must map a particular input uniquely to a particular output, or the whole thing would fail. What I meant to say was that the function should be able to generate outputs that spread evenly among the desired index range (i.e. size of hash table) for the types of inputs provided. You don't want functions that tend to "cluster" around a small number of values over the range of expected inputs. For example, if you were to hash a string using only the value of the first character, it would be a very fast function, but you would only generate (most likely fewer than) 96 index values, so you would get lots and lots of collisions. Alternately, you could build a CRC from the entire string, which would probably give a good spread of values, but be computationally expensive enough to ruin the benefits of hashing. Both of these are extreme examples of poorly chosen algorithms - it's probably pretty simple to come up with reasonable algorithms for any input type, it just means you have to think about the problem a little.
0
 
LVL 3

Accepted Solution

by:
JackThornton earned 100 total points
ID: 6196757
Actually the goal isn't to produce "evenly spaced" keys, the goal is to reduce collisions by choice of keys (and by size of table vs. number of inputs). For example, if you have a table that is some prime number near 4000 entries large, and you only put 1000 items into it, you will have less probability of collisions than if you have a table that size and put 3500 items into it. With a more densly populated table, the quality of the hash generator becomes more important.

It just occured to me that "evenly spaced" and "random" are probably both attempts to express the same thought, for which I can't seem to come up with a good word. Maybe more internet acronyms would help ;-)

Also note that there are generally two different kinds of hash schemes - closed and open. In closed schemes, if there is a collision, you march ahead in the table and find an open spot. This means that (a) you cannot have more inputs than the size of the table (hence closed), and (b) you degenerate into doing key compares as you march through the table. In open schemes, there is the possibility for a sub-container at every node (e.g. a list, tree, array, whatever) so that collisions generate multiple entries at that node. This removes the limit to the number of items stored, but, again, you have to go back to doing key compares to resolve the collision.

Despite the complexity, hash tables do, in practice, operate much faster than balanced trees on appropriate sized tables. Consider the case of 1024 items - with a balanced tree, you will end up doing 10 compares; in a hash table (if you're lucky), you might get your "hit" with a hash calculation and a single indexed lookup. Conversely, the worse case goes way up (e.g. your hash algorithm, for some reason, puts all of your items into two slots, and the sub-container is linear, so you do 512 key compares to find the hit).

I seem to be babbling now, so I'll sign this one off.
0
 
LVL 30

Author Comment

by:Axter
ID: 6196907
>>I seem to be babbling now, so I'll sign this one off.

When the info is good, it's not babbling.

Thanks, this clear up somethings for me.  

I wounder what is the likely hood that hash_map/hash_set will get added to the ANSI C++ standards.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

What is C++ STL?: STL stands for Standard Template Library and is a part of standard C++ libraries. It contains many useful data structures (containers) and algorithms, which can spare you a lot of the time. Today we will look at the STL Vector. …
  Included as part of the C++ Standard Template Library (STL) is a collection of generic containers. Each of these containers serves a different purpose and has different pros and cons. It is often difficult to decide which container to use and …
The goal of the video will be to teach the user the difference and consequence of passing data by value vs passing data by reference in C++. An example of passing data by value as well as an example of passing data by reference will be be given. Bot…
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.

623 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question