Solved

Dictionary capacity in VB.Net 2.0

Posted on 2006-05-02
Medium Priority
4,605 Views
Last Modified: 2012-08-14
Is there a known maximum capacity for Dictionaries?  We are adding entries to a dictionary, and we're finding that at 1,864,135 records performance consistently drags and the dictionary appears to no longer add new records.

We append to the dictionary on each pass of a loop, and we need to be able to handle tens of millions of unique keys in a single dictionary.

Since we know in advance how big the dictionary will get, we've tried initializing the dictionary with the capacity we want, but it still reaches the exact same limit, suggesting there is some internal maximum.

Has anyone else encountered a maximum capacity for the Dictionary class, and can you recommend a workaround?
Question by:VSmirk
15 Comments
 
LVL 12

Expert Comment

by:AGBrown
ID: 16590253
The different dictionaries act in quite different ways. In .NET 1.1, classes based on DictionaryBase use an internal Hashtable. The Hashtable maintains an array of structs, and it will double the capacity each time it fills up. The internal count is an int, so its capacity is limited to the size of an int (0 to 2,147,483,647). The ListDictionary maintains a chained set of DictionaryNode objects, plus an internal count and a version number which are both ints, so there is a natural limit because of the int there as well.

In .NET 2.0, the generic dictionary is also apparently implemented as a hashtable, which would suggest similar limitations. However, your 1,864,135 is well below the limit of an int. Is it possible that you are seeing problems related to memory allocation? What is the size of the objects that you are storing in the dictionary?

Andy
 

Expert Comment

by:Juser7658
ID: 16590345
Just some guesses. Is there any chance you simply don't have enough memory? It's possible you aren't getting an error message simply because there is no room left even to allocate the exception object, nor the next error after it, and so on.

Perhaps .NET has an object limit? That would show the same symptoms as a memory limit.

Also, remember that a dictionary is a hash table. Is there any chance you've been bitten by the birthday paradox? Though I have no idea why that wouldn't cause some sort of error.

Just trying to help.
 
LVL 12

Expert Comment

by:AGBrown
ID: 16590481
I was wrong about the hashtable doubling in size when capacity is reached; it actually increases to the next prime number that is larger than twice the current number:
"As elements are added to a Hashtable, the actual load factor of the Hashtable increases. When the actual load factor reaches the specified load factor, the number of buckets in the Hashtable is automatically increased to the smallest prime number that is larger than twice the current number of Hashtable buckets."

This could be the cause of your problems, I'll have a look at the primes it uses...

 
LVL 12

Expert Comment

by:AGBrown
ID: 16590501
It seems that the largest prime in the hashtable's static list is 0x6DDA89, which, if I've converted it right, is 7,199,369.
 
LVL 37

Expert Comment

by:gregoryyoung
ID: 16591045
"Is there a known maximum capacity for Dictionaries?  We are adding entries to a dictionary, and we're finding that at 1 864 135 records, the dictionary consistantly drags in performance and appears to no longer add new records.  "

It is most likely re-hashing the entire ~1.8 million entries you already have in it while doing an expansion ... I would imagine that operation takes a little while! The resulting behavior would be it stopping for a while ... then continuing. Try using the overloaded constructor http://msdn2.microsoft.com/en-US/library/tk84bxf4.aspx with a large initial size (say 5,000,000) and see if you get the same behavior.

My guess is no.

Cheers,

Greg
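
In VB.NET 2005 that would look something like the following (just a sketch; the key/value types and the 5,000,000 figure are placeholders for whatever you are actually storing):

    ' Sketch only: pre-sizing the dictionary so it never has to grow and
    ' re-hash while the entries are being added. Types here are examples.
    ' (Assumes Imports System.Collections.Generic at the top of the file.)
    Dim lookup As New Dictionary(Of Long, String)(5000000)

    For serial As Long = 0 To 4999999
        lookup.Add(serial, "record " & serial.ToString())
    Next

If the dictionary never has to grow past its initial capacity it never has to re-hash, so the periodic pauses should disappear.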
 
LVL 12

Expert Comment

by:AGBrown
ID: 16591289
I am assuming you are using a .NET 2.0 generic dictionary? What is the line of code you use to initialise it (what are you setting the initial capacity to)? I think you are looking at some kind of memory effect (Juser may be right on an object limit but I would have no idea about that). Could you try testing this, if you haven't already, by initialising your dictionary with a size that is well over 1864135 (but under 7199369) and seeing what happens? Also, is this happening on more than one machine with different memory setups?

The reason for my memory comment is that I can't yet see how the problem might be related to the inner workings of the dictionary unless you have set your capacity to 1,864,135. There doesn't appear to be any reason for it to change the capacity at or around 1,864,135 if you didn't set an initial capacity. The two primes it uses that are close to that are 1,674,319 and 2,009,191. So unless you have created your dictionary using "new Dictionary<TKey, TValue>(1864135)" you might just be seeing the inner arrays resizing and copying (if you don't specify an initial capacity then it is set to 3).

Each time your dictionary fills up it has to go through the process of increasing the sizes of the internal arrays, copying the contents from the original arrays to the new ones. I can imagine this might take a while, especially once you have millions of objects.

Note that what looks worse is that once it needs a prime larger than 7,199,369, there are no more entries in its static primes list and it has to "discover" the next prime itself. To do this it does roughly:
    Start
    {
        For every number, i, from the current number to 2147483647, where i > 2 * current capacity
        {
            Does it divide by 2? go to next i
            Else, iterate j from 3 to the square root of i
            {
                Does j divide into i? If so next i
            }
            If we haven't gone to the next i, this is prime.
        }
        If we didn't find a prime, return double the current capacity
    }
So if you ever have to increase your dictionary's capacity once it is larger than half of 7,199,369, you might have to wait while it finds you a new capacity!
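
In VB.NET terms the search is roughly the following (a sketch of the logic described above, not the framework's actual code; the method name is mine):

    ' Rough sketch of the "find the next capacity" search described above.
    ' The real code lives inside the framework; this just mirrors the logic.
    Private Function FindNextCapacity(ByVal oldCapacity As Integer) As Integer
        Dim candidate As Integer = (oldCapacity * 2) Or 1   ' start just above 2x, odd
        Do While candidate < Integer.MaxValue
            Dim isPrime As Boolean = True
            Dim limit As Integer = CInt(Math.Sqrt(candidate))
            For divisor As Integer = 3 To limit Step 2      ' odd divisors up to sqrt
                If candidate Mod divisor = 0 Then
                    isPrime = False
                    Exit For
                End If
            Next
            If isPrime Then Return candidate
            candidate += 2
        Loop
        Return oldCapacity * 2                              ' fall back to doubling
    End Function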

Re: the birthday paradox. I think the only problem you would see there is that the original item with the same hash as the new item is replaced in the hashtable. The Dictionary computes the item hash, and if it's the same as an existing hash then it replaces the entry that has that hash. I might have misunderstood that one though.

I'm not sure that the Dictionary, or even the Hashtable, is a good object for you to use for this problem given your "tens of millions of objects" comment. I suggest you have a look inside the Dictionary and the Hashtable using ildasm or, better still, Lutz Roeder's Reflector (http://www.aisto.com/roeder/DotNet/) to understand how they work. There are all sorts of potential problems with large numbers of objects stored in one of these classes, including the private version number that they maintain. You might want to consider writing a better-suited object to help you solve your problem; what, in fact, is the problem that you are trying to solve?

A
 
LVL 12

Expert Comment

by:AGBrown
ID: 16591394
Hi Greg, I wasn't trying to duplicate your post ... I hadn't refreshed before posting. My main problem with the expansion idea is that it shouldn't be doing an expansion for another 140,000-odd objects (unless, of course, VSmirk's initial capacity is 1,864,135). That at least is how I would interpret the Resize method of the Dictionary<TKey, TValue> given the prime list it uses. Can you see a reason that doesn't hold?
 
LVL 13

Expert Comment

by:devsolns
ID: 16591719
What is the idea behind needing to store that much data in a dictionary anyway?  Perhaps there is a way to avoid putting that much data into a volatile resource.  My guess is there is a much easier approach that will satisfy your requirements.  Depending on the objects you're adding, you're likely to run into (or have already run into) memory saturation issues and are swapping to death. A dictionary of a million or two entries can easily eat up a few hundred megs.
 
LVL 12

Expert Comment

by:AGBrown
ID: 16591752
Well, I figured we'd covered enough theory ;), so I wrote a couple of test apps. The main one loops up to a limit, adding an int key and an int value, and doesn't show any slowdown up to about 20 million entries. The only "appreciable" wait was between 5,961,800 and 6,160,000 entries, presumably when it had to iteratively find the next prime. The most likely conclusion is that you are having memory issues of some sort. I think it might be time to break open Performance Monitor and watch your application's .NET memory counters as you build up your dictionary.
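
In case it's useful, the core of such a test loop looks something like this (a sketch, not my exact code; the reporting interval and the 20 million ceiling are arbitrary):

    ' Sketch of the test: add int keys/values and report elapsed time
    ' periodically, which makes the resize pauses easy to spot.
    Imports System.Collections.Generic
    Imports System.Diagnostics

    Module DictionaryFillTest
        Sub Main()
            Dim d As New Dictionary(Of Integer, Integer)()
            Dim watch As New Stopwatch()
            watch.Start()
            For i As Integer = 0 To 19999999
                d.Add(i, i)
                If i Mod 200000 = 0 Then
                    Console.WriteLine("{0,10} entries   {1} ms", i, watch.ElapsedMilliseconds)
                End If
            Next
            watch.Stop()
        End Sub
    End Module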

I also checked the capacities for a dictionary with no initial capacity specified; they are set in this order:
3
7
17
37
89
197
431
919
1,931
4,049
8,419
17,519
36,353
75,431
156,437
324,449
672,827
1,395,263
2,893,249
5,999,471
--finds the next prime iteratively
 
LVL 37

Expert Comment

by:gregoryyoung
ID: 16591788
Really, I am surprised it doesn't just use Mersenne primes :-/
 
LVL 12

Expert Comment

by:AGBrown
ID: 16591829
Yup, my thoughts exactly, but who are we to reason why? Possibly it was thought that for most uses of the hashtable, dictionary, etc., it was faster to store the primes. And possibly some exceptionally advanced algorithm was used to work out that most people wouldn't go above 5,999,471 without specifying a capacity :-) (though why, once that number is exceeded and a new prime is found, it isn't added to the prime number list, I'm not sure).

By the way, my line above that said "Else, iterate j from 3 to the square root of i" should read "Else, iterate j from 3 to the square root of i, where j is odd only".
 
LVL 12

Accepted Solution

by:
AGBrown earned 1000 total points
ID: 16592241
Playing some more with the test application and monitoring it using Performance Monitor: storing 20 million entries in a Dictionary<int, string> (ints from 0 to 19,999,999) with the int as the key and int.ToString() as the value, with no initial capacity specified, easily takes up 1.25 GB of memory. Any more objects than this, in fact, and I unsurprisingly get an out-of-memory exception during one of the capacity increases (from 23,997,907), when I am already at about 1.3 GB of memory for this program. Performance is fine in terms of building the dictionary up to this point, so I would now say that your problem is most likely your memory allocation. Equally, if that is the case you may be pushing the limits of your performance, and you may need to look at a different way of storing your data, such as database persistence. It really depends on what you are trying to do.

If memory (no pun intended) serves correctly, the maximum amount of memory a single process can address on 32-bit Windows is 2 GB. I'm not sure how this differs between XP and Server, or between 32- and 64-bit.
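
As a rough sanity check on that figure (back-of-the-envelope only, assuming 32-bit .NET 2.0): 1.25 GB divided by 20,000,000 entries comes to roughly 65-70 bytes per entry, which is broadly consistent with a 16-byte Dictionary entry struct (hash code, next index, int key, string reference), a 4-byte bucket slot, and a short string of the order of 30-40 bytes, plus the slack from the capacity always being a prime larger than the count.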
 
LVL 37

Assisted Solution

by:gregoryyoung
gregoryyoung earned 1000 total points
ID: 16592733
It is not necessarily 2 GB; you can run the OS with the /3GB switch to get 3 GB.

Cheers,

Greg
 

Author Comment

by:VSmirk
ID: 16595826
Thanks so much for the excellent and detailed discussion!

As it happens, it turns out our problem was in the random number generator we used to dope the keys in our dictionary.  For such a large range of serial numbers, using only the first random number generator seed, the generator was ultimately producing the same subset of values after a while.

For me, this turned out to be a fascinating discussion in its own right.  We ended up re-seeding once a certain number of duplicate keys had been generated, and the count went past 1.8 million to well into the 12 million range before we finally ended the program.
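
For anyone who hits the same thing, that kind of re-seeding looks roughly like this (a sketch only; the key range, the 12 million target and the duplicate threshold are made up for illustration):

    ' Illustration only: generate random keys and re-seed the generator once
    ' too many duplicates have been produced. Threshold and range are made up.
    Dim keys As New Dictionary(Of Integer, Boolean)()
    Dim rng As New Random(Environment.TickCount)
    Dim duplicates As Integer = 0

    Do While keys.Count < 12000000
        Dim candidate As Integer = rng.Next()
        If keys.ContainsKey(candidate) Then
            duplicates += 1
            If duplicates >= 10000 Then                      ' arbitrary threshold
                rng = New Random(Environment.TickCount Xor keys.Count)
                duplicates = 0
            End If
        Else
            keys.Add(candidate, True)
        End If
    Loop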

Thanks much for the education on dictionaries.  The more you know, the better your problem solving.  I am sure we will find this info useful, too.
 
LVL 12

Expert Comment

by:AGBrown
ID: 16600555
It's interesting that you were able to get to 12 million entries easily. If you don't mind me asking - what was the "size" of the object you were storing? I don't mean literally in bytes (unless you know that), but more the number and types of fields. It would be good to know for the future.

W.r.t. the /3GB switch: yup, I had heard of that one. I haven't seen, or don't know of, any performance testing on it; I had just "heard" it was unreliable, so I hadn't bothered looking into it ;)
