Solved

unordered map in c++

Posted on 2014-02-28
7
466 Views
Last Modified: 2014-03-06
I have a following structure in my design

class DummyClass {
public:
	uint64 _id;
	uint32 _ value;
};

struct DummyClassHash {
	size_t operator() (const DummyClass &dummy_obj) const
	{return(((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value));}
};
struct DummyClassEquality {
	bool operator() (const DummyClass &l_obj, const DummyClass &r_obj) const
	{return (l_obj._id == r_obj._id) && (l_obj._ value == r_obj._ value);}
};

typedef std::tr1::unordered_map<DummyClass, uint64,  DummyClassHash, DummyClassEquality> DummyClassMap;

Open in new window


I have few questions what magic C++ runtime does when inserting/searching an element in this hash map

1) If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray

2) DummyClassEquality() will be exercised when I use find() operation. During find(), the runtme will first use DummyClassHash() to point to the index in the array and then use find() to get the exact match. Correct?

Thanks
0
Comment
Question by:perlperl
  • 4
  • 2
7 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 39896005
>>What magic compiler does once it finds the hash value from my DummyClassHash()
>>function, how does it point to an index or aray

>>During find(), the runtme will first use DummyClassHash() to point to the index in the
>>array and then use find() to get the exact match. Correct?

cppreference.com describes that nicely:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.

More details at Wikipedia: http://en.wikipedia.org/wiki/Unordered_associative_containers_%28C%2B%2B%29 (also regarding the hash function)
0
 

Author Comment

by:perlperl
ID: 39896332
Thanks for the link. I was more curious about how buckets are internally stored/implemented

I am using X-ORing of class members in hash function. Does that more uniformly distribute and makes stronger hash function compared to just ORing the members?
0
 
LVL 86

Accepted Solution

by:
jkr earned 250 total points
ID: 39896365
Just noticed that I forgot the link for the above qoute, here it is: http://en.cppreference.com/w/cpp/container/unordered_map

You can see the implementation for the bucket part yourself in detail here: http://code.woboq.org/userspace/include/c++/4.8.2/bits/unordered_map.h.html

It basically is a hash table (http://en.wikipedia.org/wiki/Hash_table)
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 86

Expert Comment

by:jkr
ID: 39896378
A little more here, BTW: http://www.drdobbs.com/stl-and-tr1-part-iii/184402066 ("STL and TR1: Part III")
0
 

Author Comment

by:perlperl
ID: 39896389
Thx jkr. I'll look into it later today

Any comments about my method of XORing way of hashing instead of OR
0
 
LVL 86

Expert Comment

by:jkr
ID: 39896415
Not really. Unless you have a perfect hash function, you'll always have collisions. Pick your poison: http://en.wikipedia.org/wiki/List_of_hash_functions ;o)
0
 
LVL 33

Assisted Solution

by:sarabande
sarabande earned 250 total points
ID: 39900841
((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value)
if your 64-bit ids are well distributed in the range 0 -  (2^64) -1) and the value also adds some bits randomly from 0 to 31 and not always the same ones, it could be a good hash. but if the id's are mostly 32-bit numbers then the 2nd term is always 0 (what would make the first xor result being equal to the id). if the value has only a low variety or has only low numbers, it would not add much to the final hash. that is good, if the id already is a good base for the hash itself or is bad if the value and the id have some dependencies or they build some clusters which give poor xor results.

note, if you use the value for the key, the entries never could be changed.

If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray
if you insert a DummyClass the unordered_map gets a hash_key from DummyClassHash by using operator() function. it then, "rescales" the 32-bit number returned to point to a slot of the internal array. if this slot is already used but different hashkey, there are different methods how a hashmap could be organized.

one is that the slot points to a bucket (normally a fixed-sized buffer) which contains pairs of hashkey+storage-pointer. the list was iterated until the hashkey searched was found. if that is the case, the new entry was added to the storage already associated with the key. if not found, a new pair of key+storage-pointers was created and added to the list. the pairs stored in the bucket normally were ordered by hashkey to make the look-up faster. a full bucket would point to next bucket.    

another method is that if the calculated slot was already used, simply the next free slot was searched in the internal list and used instead. this approach would be very fast if most hashkeys have a unique slot. it is extremely slow for poor hashkeys which rarely have a first-time hit and where the free slots are also badly distributed.

generally you should test the quality of your hash by calculating  the keys for a significant part (or all) of your keys. then check how the keys were distributed. you also could measure the time until the unordered map was filled and compare it to the time you need to add the same entries to a map where the 64-bit id is the key and the 32-bit value is the data. if the keys are not unique, you would use a multi-map.

generally a map has the advantage that it is ordered. while the insert could be slower, the times for look-up can be measured only if you do thousands of lookups.

Sara
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Often, when implementing a feature, you won't know how certain events should be handled at the point where they occur and you'd rather defer to the user of your function or class. For example, a XML parser will extract a tag from the source code, wh…
IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
The viewer will be introduced to the technique of using vectors in C++. The video will cover how to define a vector, store values in the vector and retrieve data from the values stored in the vector.
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

911 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

16 Experts available now in Live!

Get 1:1 Help Now