True, and I know that, but database is not an option in my case. But forgetting million, how would you deal with a large collection?
Main Topics
Browse All TopicsHello Experts!
I would like to reasonably quickly lookup objects by NON-UNIQUE values in a collection.
Naturally I tried at Hashtables and Dictionaries, but because they rely on key's uniqueness, I couldn't really use them.
&By the way, if you know an alternative way, I don't mind what "collection" to use. Any collection that would be quick for storing extra large collection (1 million objects), and looking them up. The type of each VALUE to lookup is long.
Hence the idea of custom Hashtable.. I thought of implementing a custom hashable solution which would in effect use chaining and not rehashing (http://en.wikipedia.org/w
I wanted to ask if you know of a good (fast!?) implementation of this, or if not, if you could look at my code and suggest how it can be improved/completed please?
If you think it's a good idea, how would you implement the rest?
It implements just one nonunique key lookup (but there may be more).
Thank you,
Dmitri
This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.
Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.
If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.
Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.
Access the answers to your technology questions today.
30-day free trial. Register in 60 seconds.
Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Try it out and discover for yourself.
30-day free trial. Register in 60 seconds.
Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.
i will prefer a dictionary over hashtable because at least the overhead of boxing and unboxing will not occur
The Dictionary class has the same functionality as the Hashtable class. A Dictionary of a specific type (other than Object) has better performance than a Hashtable for value types because the elements of Hashtable are of type Object and, therefore, boxing and unboxing typically occur if storing or retrieving a value type. - this from the following link
http://msdn.microsoft.com/
I remembered that there was a specialized collection that does pretty much what you described in Collections.Specialized, namely NameValueCollection, but it's no good since it uses only strings.
A quick search turned up this, don't know if it will help you or not, but may be another take on the idea (you could easily modify it to accept generic keys, I think).
rags: just to be clear in this example a million (gasp objects) as say ints would take up a whopping 4 mb of memory! even once we add the overhead of the hash table's extra size (fill factor) we might be at 6? Oh Noez. Even with small objects say an address we would still be < 100 mb which again is nothing.
Not surprisingly the cost to make an out of process call to the database is about a million times larger than checking our value in memory.
there are certainly some archtitectural situations where you would want to keep the hash in memory.
oh well no response so here are the answers.
If your number of duplicates is low just make a Dictionary<whatever, List<whatevervalue>> almost as efficient as chainging and avoids the overhead of writing your own Dictionary (though writing one isn't that hard just meet the Idictionary<T,V> interface (I am sure we all wrote one at some point)..
If you expect a high number of duplicates then chaining or using the list<> as above would really suck instead use a Dictionary<T, Dictionary<T, V>> where the internal dictionary has a different hashing algorithm than the first one.
It really depends how many duplicates you expect to be seeing on average and whether or not the checking of the duplicates in a chain/list has a higher cost than running a hash function on your key.
the aurhor has never mentioned what kind of objects are being delt with - so if they are going to contain even a single stirng object the its a bad design to keep it in memory
as dictionary is also a key value pair i totally agree with your approach of keeping the key in the dictionayr and values in a generic list
"the aurhor has never mentioned what kind of objects are being delt with - so if they are going to contain even a single stirng object the its a bad design to keep it in memory"
Not at all.
You seem to be woefully ignorant of this thing called context. As I explained previously keeping them in memory allows for searches that are many order of magnitude faster than hitting an out of process call. If you need fast searching then its perfectly fine to be doing as a design decision.
Also keeping millions of objects in memory really is not that big of a deal. As an example I created some time ago a program that kept the entire state of the Toronto Stock Exchange in memory (orders and all). This was 3-5 million objects in memory at any given point (and they were not simple objects). The application ran in about a gig of memory but do to our other requirements (like needing to process 20k+ changes to the data per second) using something like a database would not have been a cost effective solution.
It is in fact quite common to put millions of objects in memory for performance reasons (with a horizontal partitioning plan) for when you run out of memory on a given machine. For reference see a "prevalence layer"
Be more careful in the future when you just call things "bad design" without having a context.
Cheers,
Greg
Thank you all for your replies!!!
I'm really encouraged by your estimates for size, and also the fact that you do agree that it's a reasonable approach to take.
Makes me feel like I'm not alone in my quest, and that, hopefully, I'll be able to solve it.
Unfortunately I don't know exactly how many duplicates I'm going to get, but I would guess that it would constitute aroud 5% of total.
It's important to say that the keys might duplicate, but not whole obejcts ( - a bit similar to postcode (key) vs person (value) scenaio), so I cannot throw any "duplicates" away.
gregoryyoung:
I have looked into doing a Dictionary> structure; in fact as I was looking to implement *multiple* keys,
so I attempeted to use List>> ..
I have attached the code I used for Dictionary approach... The thing was that it seemed quite slow. I suspect it might be because on adding each object I tried to translate the "object" to "keys". What do you think?
In the mean time I have tried to persue my original goal of custom hashtable - I have attached the file with my second attempt to solve it...
It seems reasonably quick to insert, but it seems a bit slow to lookup... Is it not worth persuing?
Cheers again for your answers - looking forward to hearing from you!
Regards,
Dmitri
I am not suprised wtf are you using reflection in it?
Why is this a responsibility of your MKD at all? Why is your MKD handling locking etc? This is the responsibility of other code.
Start simple ... profile the add/get of a "MKD" using a Dictionary<T, List<V>> I will even write it for you. (included). Filling it out should not take long.
Cheers,
Greg
Hey Greg,
Well, to be honest I am not hugely experienced on this front (and most certainly not comparing to some guys here).
But this is why I'm here - and experience is really what gives *you* the edge (and might explains some questions you had)..
I used reflection to map object fields to keys, which I agree is not really necessary.
I took the object locking idea from the article which did a similar job
( http://www.codeproject.com
Thank you for time, and I'll give it a go soon...
Cheers,
Dmitri
**** Brilliant ****
I have successfully tested the structure and it's quick for simple and even reasonably complex objects, both for storing, enumerating (hurray!).
Testing conditions: I have roughly 1 gb of free memory.
Using simple objects I managed to store up to 5.9 mil (with one key dymention).
Using fairly large objects I managed to store up to 1.5 mil (with one key dymention).
Using fairly large objects, with 10 key dymentions I managed to store up to 900 thousand.
Times it took:
Adding small obejects 1 dim, 1 mil - next to no time (less than 2 seconds)
Adding Large objects 1 dim, 1.5 mil - 17 seconds
Adding Large objects 2 dim, 1.5 mil - 28 seconds
Adding Large objects 10 dim - 1min 14 seconds
..However one nagging problem is removing items from the collection; the memory doesn't appear to clear when I:
1) Add items to the collection (- a good 1.5 million)
2) Remove items;
3) Add some more items.
Debugging through the steps I can see hash collection is empty. So why doesn't the memory get released?
I have attached the code I used to remove the elelments...
Thanks,
Dmitri
Business Accounts
Answer for Membership
by: ragi0017Posted on 2009-04-29 at 04:39:14ID: 24259231
you are talking about a million objects and in this scenario a database will be tbe best option because a million objects in dictionary/hashtable will consume a lot of memory and will decrease the performance