unordered map in c++

I have a following structure in my design

class DummyClass {
public:
	uint64 _id;
	uint32 _ value;
};

struct DummyClassHash {
	size_t operator() (const DummyClass &dummy_obj) const
	{return(((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value));}
};
struct DummyClassEquality {
	bool operator() (const DummyClass &l_obj, const DummyClass &r_obj) const
	{return (l_obj._id == r_obj._id) && (l_obj._ value == r_obj._ value);}
};

typedef std::tr1::unordered_map<DummyClass, uint64,  DummyClassHash, DummyClassEquality> DummyClassMap;

Open in new window


I have few questions what magic C++ runtime does when inserting/searching an element in this hash map

1) If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray

2) DummyClassEquality() will be exercised when I use find() operation. During find(), the runtme will first use DummyClassHash() to point to the index in the array and then use find() to get the exact match. Correct?

Thanks
perlperlAsked:
Who is Participating?
 
jkrConnect With a Mentor Commented:
Just noticed that I forgot the link for the above qoute, here it is: http://en.cppreference.com/w/cpp/container/unordered_map

You can see the implementation for the bucket part yourself in detail here: http://code.woboq.org/userspace/include/c++/4.8.2/bits/unordered_map.h.html

It basically is a hash table (http://en.wikipedia.org/wiki/Hash_table)
0
 
jkrCommented:
>>What magic compiler does once it finds the hash value from my DummyClassHash()
>>function, how does it point to an index or aray

>>During find(), the runtme will first use DummyClassHash() to point to the index in the
>>array and then use find() to get the exact match. Correct?

cppreference.com describes that nicely:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.

More details at Wikipedia: http://en.wikipedia.org/wiki/Unordered_associative_containers_%28C%2B%2B%29 (also regarding the hash function)
0
 
perlperlAuthor Commented:
Thanks for the link. I was more curious about how buckets are internally stored/implemented

I am using X-ORing of class members in hash function. Does that more uniformly distribute and makes stronger hash function compared to just ORing the members?
0
Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
jkrCommented:
A little more here, BTW: http://www.drdobbs.com/stl-and-tr1-part-iii/184402066 ("STL and TR1: Part III")
0
 
perlperlAuthor Commented:
Thx jkr. I'll look into it later today

Any comments about my method of XORing way of hashing instead of OR
0
 
jkrCommented:
Not really. Unless you have a perfect hash function, you'll always have collisions. Pick your poison: http://en.wikipedia.org/wiki/List_of_hash_functions ;o)
0
 
sarabandeConnect With a Mentor Commented:
((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value)
if your 64-bit ids are well distributed in the range 0 -  (2^64) -1) and the value also adds some bits randomly from 0 to 31 and not always the same ones, it could be a good hash. but if the id's are mostly 32-bit numbers then the 2nd term is always 0 (what would make the first xor result being equal to the id). if the value has only a low variety or has only low numbers, it would not add much to the final hash. that is good, if the id already is a good base for the hash itself or is bad if the value and the id have some dependencies or they build some clusters which give poor xor results.

note, if you use the value for the key, the entries never could be changed.

If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray
if you insert a DummyClass the unordered_map gets a hash_key from DummyClassHash by using operator() function. it then, "rescales" the 32-bit number returned to point to a slot of the internal array. if this slot is already used but different hashkey, there are different methods how a hashmap could be organized.

one is that the slot points to a bucket (normally a fixed-sized buffer) which contains pairs of hashkey+storage-pointer. the list was iterated until the hashkey searched was found. if that is the case, the new entry was added to the storage already associated with the key. if not found, a new pair of key+storage-pointers was created and added to the list. the pairs stored in the bucket normally were ordered by hashkey to make the look-up faster. a full bucket would point to next bucket.    

another method is that if the calculated slot was already used, simply the next free slot was searched in the internal list and used instead. this approach would be very fast if most hashkeys have a unique slot. it is extremely slow for poor hashkeys which rarely have a first-time hit and where the free slots are also badly distributed.

generally you should test the quality of your hash by calculating  the keys for a significant part (or all) of your keys. then check how the keys were distributed. you also could measure the time until the unordered map was filled and compare it to the time you need to add the same entries to a map where the 64-bit id is the key and the 32-bit value is the data. if the keys are not unique, you would use a multi-map.

generally a map has the advantage that it is ordered. while the insert could be slower, the times for look-up can be measured only if you do thousands of lookups.

Sara
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.