Solved

unordered map in c++

Posted on 2014-02-28
7
473 Views
Last Modified: 2014-03-06
I have a following structure in my design

class DummyClass {
public:
	uint64 _id;
	uint32 _ value;
};

struct DummyClassHash {
	size_t operator() (const DummyClass &dummy_obj) const
	{return(((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value));}
};
struct DummyClassEquality {
	bool operator() (const DummyClass &l_obj, const DummyClass &r_obj) const
	{return (l_obj._id == r_obj._id) && (l_obj._ value == r_obj._ value);}
};

typedef std::tr1::unordered_map<DummyClass, uint64,  DummyClassHash, DummyClassEquality> DummyClassMap;

Open in new window


I have few questions what magic C++ runtime does when inserting/searching an element in this hash map

1) If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray

2) DummyClassEquality() will be exercised when I use find() operation. During find(), the runtme will first use DummyClassHash() to point to the index in the array and then use find() to get the exact match. Correct?

Thanks
0
Comment
Question by:perlperl
  • 4
  • 2
7 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 39896005
>>What magic compiler does once it finds the hash value from my DummyClassHash()
>>function, how does it point to an index or aray

>>During find(), the runtme will first use DummyClassHash() to point to the index in the
>>array and then use find() to get the exact match. Correct?

cppreference.com describes that nicely:

Internally, the elements are not sorted in any particular order, but organized into buckets. Which bucket an element is placed into depends entirely on the hash of its key. This allows fast access to individual elements, since once hash is computed, it refers to the exact bucket the element is placed into.

More details at Wikipedia: http://en.wikipedia.org/wiki/Unordered_associative_containers_%28C%2B%2B%29 (also regarding the hash function)
0
 

Author Comment

by:perlperl
ID: 39896332
Thanks for the link. I was more curious about how buckets are internally stored/implemented

I am using X-ORing of class members in hash function. Does that more uniformly distribute and makes stronger hash function compared to just ORing the members?
0
 
LVL 86

Accepted Solution

by:
jkr earned 250 total points
ID: 39896365
Just noticed that I forgot the link for the above qoute, here it is: http://en.cppreference.com/w/cpp/container/unordered_map

You can see the implementation for the bucket part yourself in detail here: http://code.woboq.org/userspace/include/c++/4.8.2/bits/unordered_map.h.html

It basically is a hash table (http://en.wikipedia.org/wiki/Hash_table)
0
Courses: Start Training Online With Pros, Today

Brush up on the basics or master the advanced techniques required to earn essential industry certifications, with Courses. Enroll in a course and start learning today. Training topics range from Android App Dev to the Xen Virtualization Platform.

 
LVL 86

Expert Comment

by:jkr
ID: 39896378
A little more here, BTW: http://www.drdobbs.com/stl-and-tr1-part-iii/184402066 ("STL and TR1: Part III")
0
 

Author Comment

by:perlperl
ID: 39896389
Thx jkr. I'll look into it later today

Any comments about my method of XORing way of hashing instead of OR
0
 
LVL 86

Expert Comment

by:jkr
ID: 39896415
Not really. Unless you have a perfect hash function, you'll always have collisions. Pick your poison: http://en.wikipedia.org/wiki/List_of_hash_functions ;o)
0
 
LVL 33

Assisted Solution

by:sarabande
sarabande earned 250 total points
ID: 39900841
((uint32)dummy_obj._id) ^ (dummy_obj._id >> 32) ^ (dummy_obj._ value)
if your 64-bit ids are well distributed in the range 0 -  (2^64) -1) and the value also adds some bits randomly from 0 to 31 and not always the same ones, it could be a good hash. but if the id's are mostly 32-bit numbers then the 2nd term is always 0 (what would make the first xor result being equal to the id). if the value has only a low variety or has only low numbers, it would not add much to the final hash. that is good, if the id already is a good base for the hash itself or is bad if the value and the id have some dependencies or they build some clusters which give poor xor results.

note, if you use the value for the key, the entries never could be changed.

If I understand correctly, DummyClassHash will be used during insert() operation. I know there is no guarantee that there will be no collision with my hash. Internally unordered_map is nothing but an array. What magic compiler does once it finds the hash value from my DummyClassHash() function, how does it point to an index or aray
if you insert a DummyClass the unordered_map gets a hash_key from DummyClassHash by using operator() function. it then, "rescales" the 32-bit number returned to point to a slot of the internal array. if this slot is already used but different hashkey, there are different methods how a hashmap could be organized.

one is that the slot points to a bucket (normally a fixed-sized buffer) which contains pairs of hashkey+storage-pointer. the list was iterated until the hashkey searched was found. if that is the case, the new entry was added to the storage already associated with the key. if not found, a new pair of key+storage-pointers was created and added to the list. the pairs stored in the bucket normally were ordered by hashkey to make the look-up faster. a full bucket would point to next bucket.    

another method is that if the calculated slot was already used, simply the next free slot was searched in the internal list and used instead. this approach would be very fast if most hashkeys have a unique slot. it is extremely slow for poor hashkeys which rarely have a first-time hit and where the free slots are also badly distributed.

generally you should test the quality of your hash by calculating  the keys for a significant part (or all) of your keys. then check how the keys were distributed. you also could measure the time until the unordered map was filled and compare it to the time you need to add the same entries to a map where the 64-bit id is the key and the 32-bit value is the data. if the keys are not unique, you would use a multi-map.

generally a map has the advantage that it is ordered. while the insert could be slower, the times for look-up can be measured only if you do thousands of lookups.

Sara
0

Featured Post

Gigs: Get Your Project Delivered by an Expert

Select from freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generat…
In days of old, returning something by value from a function in C++ was necessarily avoided because it would, invariably, involve one or even two copies of the object being created and potentially costly calls to a copy-constructor and destructor. A…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.

776 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question