Solved

unordered map in c++

Posted on 2014-02-28
7
490 Views
Last Modified: 2014-03-06
I have a following structure in my design

class DummyClass {
public:
	uint64 _id;
	uint32 _ value;
};

struct DummyClassHash {
	size_t operator() (const DummyClass &dummy_obj) const
	{return(((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value));}
};
struct DummyClassEquality {
	bool operator() (const DummyClass &l_obj, const DummyClass &r_obj) const
	{return (l_obj._id == r_obj._id) && (l_obj._ value == r_obj._ value);}
};

typedef std::tr1::unordered_map<DummyClass, uint64,  DummyClassHash, DummyClassEquality> DummyClassMap;

Open in new window


I have few questions what magic C++ runtime does when inserting/searching an element in this hash map

1) If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray

2) DummyClassEquality() will be exercised when I use find() operation. During find(), the runtme will first use DummyClassHash() to point to the index in the array and then use find() to get the exact match. Correct?

Thanks
0
Comment
Question by:perlperl
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 2
7 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 39896005
>>What magic compiler does once it finds the hash value from my DummyClassHash()
>>function, how does it point to an index or aray

>>During find(), the runtme will first use DummyClassHash() to point to the index in the
>>array and then use find() to get the exact match. Correct?

cppreference.com describes that nicely:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.

More details at Wikipedia: http://en.wikipedia.org/wiki/Unordered_associative_containers_%28C%2B%2B%29 (also regarding the hash function)
0
 

Author Comment

by:perlperl
ID: 39896332
Thanks for the link. I was more curious about how buckets are internally stored/implemented

I am using X-ORing of class members in hash function. Does that more uniformly distribute and makes stronger hash function compared to just ORing the members?
0
 
LVL 86

Accepted Solution

by:
jkr earned 250 total points
ID: 39896365
Just noticed that I forgot the link for the above qoute, here it is: http://en.cppreference.com/w/cpp/container/unordered_map

You can see the implementation for the bucket part yourself in detail here: http://code.woboq.org/userspace/include/c++/4.8.2/bits/unordered_map.h.html

It basically is a hash table (http://en.wikipedia.org/wiki/Hash_table)
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 86

Expert Comment

by:jkr
ID: 39896378
A little more here, BTW: http://www.drdobbs.com/stl-and-tr1-part-iii/184402066 ("STL and TR1: Part III")
0
 

Author Comment

by:perlperl
ID: 39896389
Thx jkr. I'll look into it later today

Any comments about my method of XORing way of hashing instead of OR
0
 
LVL 86

Expert Comment

by:jkr
ID: 39896415
Not really. Unless you have a perfect hash function, you'll always have collisions. Pick your poison: http://en.wikipedia.org/wiki/List_of_hash_functions ;o)
0
 
LVL 34

Assisted Solution

by:sarabande
sarabande earned 250 total points
ID: 39900841
((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value)
if your 64-bit ids are well distributed in the range 0 -  (2^64) -1) and the value also adds some bits randomly from 0 to 31 and not always the same ones, it could be a good hash. but if the id's are mostly 32-bit numbers then the 2nd term is always 0 (what would make the first xor result being equal to the id). if the value has only a low variety or has only low numbers, it would not add much to the final hash. that is good, if the id already is a good base for the hash itself or is bad if the value and the id have some dependencies or they build some clusters which give poor xor results.

note, if you use the value for the key, the entries never could be changed.

If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray
if you insert a DummyClass the unordered_map gets a hash_key from DummyClassHash by using operator() function. it then, "rescales" the 32-bit number returned to point to a slot of the internal array. if this slot is already used but different hashkey, there are different methods how a hashmap could be organized.

one is that the slot points to a bucket (normally a fixed-sized buffer) which contains pairs of hashkey+storage-pointer. the list was iterated until the hashkey searched was found. if that is the case, the new entry was added to the storage already associated with the key. if not found, a new pair of key+storage-pointers was created and added to the list. the pairs stored in the bucket normally were ordered by hashkey to make the look-up faster. a full bucket would point to next bucket.    

another method is that if the calculated slot was already used, simply the next free slot was searched in the internal list and used instead. this approach would be very fast if most hashkeys have a unique slot. it is extremely slow for poor hashkeys which rarely have a first-time hit and where the free slots are also badly distributed.

generally you should test the quality of your hash by calculating  the keys for a significant part (or all) of your keys. then check how the keys were distributed. you also could measure the time until the unordered map was filled and compare it to the time you need to add the same entries to a map where the 64-bit id is the key and the 32-bit value is the data. if the keys are not unique, you would use a multi-map.

generally a map has the advantage that it is ordered. while the insert could be slower, the times for look-up can be measured only if you do thousands of lookups.

Sara
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
I could not build boost code, 10 114
One named event, multiple event handlers 2 53
Can Live bindings change TGrid Cell Colour ? 1 58
Coding for the first time 9 95
  Included as part of the C++ Standard Template Library (STL) is a collection of generic containers. Each of these containers serves a different purpose and has different pros and cons. It is often difficult to decide which container to use and …
Basic understanding on "OO- Object Orientation" is needed for designing a logical solution to solve a problem. Basic OOAD is a prerequisite for a coder to ensure that they follow the basic design of OO. This would help developers to understand the b…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

738 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question