Solved

unordered map in c++

Posted on 2014-02-28
7
493 Views
Last Modified: 2014-03-06
I have a following structure in my design

class DummyClass {
public:
	uint64 _id;
	uint32 _ value;
};

struct DummyClassHash {
	size_t operator() (const DummyClass &dummy_obj) const
	{return(((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value));}
};
struct DummyClassEquality {
	bool operator() (const DummyClass &l_obj, const DummyClass &r_obj) const
	{return (l_obj._id == r_obj._id) && (l_obj._ value == r_obj._ value);}
};

typedef std::tr1::unordered_map<DummyClass, uint64,  DummyClassHash, DummyClassEquality> DummyClassMap;

Open in new window


I have few questions what magic C++ runtime does when inserting/searching an element in this hash map

1) If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray

2) DummyClassEquality() will be exercised when I use find() operation. During find(), the runtme will first use DummyClassHash() to point to the index in the array and then use find() to get the exact match. Correct?

Thanks
0
Comment
Question by:perlperl
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
7 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 39896005
>>What magic compiler does once it finds the hash value from my DummyClassHash()
>>function, how does it point to an index or aray

>>During find(), the runtme will first use DummyClassHash() to point to the index in the
>>array and then use find() to get the exact match. Correct?

cppreference.com describes that nicely:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.

More details at Wikipedia: http://en.wikipedia.org/wiki/Unordered_associative_containers_%28C%2B%2B%29 (also regarding the hash function)
0
 

Author Comment

by:perlperl
ID: 39896332
Thanks for the link. I was more curious about how buckets are internally stored/implemented

I am using X-ORing of class members in hash function. Does that more uniformly distribute and makes stronger hash function compared to just ORing the members?
0
 
LVL 86

Accepted Solution

by:
jkr earned 250 total points
ID: 39896365
Just noticed that I forgot the link for the above qoute, here it is: http://en.cppreference.com/w/cpp/container/unordered_map

You can see the implementation for the bucket part yourself in detail here: http://code.woboq.org/userspace/include/c++/4.8.2/bits/unordered_map.h.html

It basically is a hash table (http://en.wikipedia.org/wiki/Hash_table)
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 86

Expert Comment

by:jkr
ID: 39896378
A little more here, BTW: http://www.drdobbs.com/stl-and-tr1-part-iii/184402066 ("STL and TR1: Part III")
0
 

Author Comment

by:perlperl
ID: 39896389
Thx jkr. I'll look into it later today

Any comments about my method of XORing way of hashing instead of OR
0
 
LVL 86

Expert Comment

by:jkr
ID: 39896415
Not really. Unless you have a perfect hash function, you'll always have collisions. Pick your poison: http://en.wikipedia.org/wiki/List_of_hash_functions ;o)
0
 
LVL 34

Assisted Solution

by:sarabande
sarabande earned 250 total points
ID: 39900841
((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value)
if your 64-bit ids are well distributed in the range 0 -  (2^64) -1) and the value also adds some bits randomly from 0 to 31 and not always the same ones, it could be a good hash. but if the id's are mostly 32-bit numbers then the 2nd term is always 0 (what would make the first xor result being equal to the id). if the value has only a low variety or has only low numbers, it would not add much to the final hash. that is good, if the id already is a good base for the hash itself or is bad if the value and the id have some dependencies or they build some clusters which give poor xor results.

note, if you use the value for the key, the entries never could be changed.

If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray
if you insert a DummyClass the unordered_map gets a hash_key from DummyClassHash by using operator() function. it then, "rescales" the 32-bit number returned to point to a slot of the internal array. if this slot is already used but different hashkey, there are different methods how a hashmap could be organized.

one is that the slot points to a bucket (normally a fixed-sized buffer) which contains pairs of hashkey+storage-pointer. the list was iterated until the hashkey searched was found. if that is the case, the new entry was added to the storage already associated with the key. if not found, a new pair of key+storage-pointers was created and added to the list. the pairs stored in the bucket normally were ordered by hashkey to make the look-up faster. a full bucket would point to next bucket.    

another method is that if the calculated slot was already used, simply the next free slot was searched in the internal list and used instead. this approach would be very fast if most hashkeys have a unique slot. it is extremely slow for poor hashkeys which rarely have a first-time hit and where the free slots are also badly distributed.

generally you should test the quality of your hash by calculating  the keys for a significant part (or all) of your keys. then check how the keys were distributed. you also could measure the time until the unordered map was filled and compare it to the time you need to add the same entries to a map where the 64-bit id is the key and the 32-bit value is the data. if the keys are not unique, you would use a multi-map.

generally a map has the advantage that it is ordered. while the insert could be slower, the times for look-up can be measured only if you do thousands of lookups.

Sara
0

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generat…
This article will show you some of the more useful Standard Template Library (STL) algorithms through the use of working examples.  You will learn about how these algorithms fit into the STL architecture, how they work with STL containers, and why t…
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will learn how to clear a vector as well as how to detect empty vectors in C++.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question