Link to home
Start Free TrialLog in
Avatar of John
JohnFlag for United Kingdom of Great Britain and Northern Ireland

asked on

.net programming hash set, hashtable, dictionary, List, Which should I use? I amusing VB.net but C# is OK

I am writing an application where performance is quite important.

I need to store templates in memory.  A template is an array of integers.  

I was trying to use recordsets, but they seem pretty expensive to use and I've had great performance increases from using arrays so I decided to try that.  

Each Template has an ID which is an integer (0  -  65535)

I created a structure for a template and created an array whose data type is that structure.  

    
Public Structure udtTemplateRecord
        Dim TemplateType As UShort 
        Dim Count As UShort
        Dim Count2 As UShort
        Dim Type() As UShort
        Dim Length() As UShort
    End Structure

Dim Array(65535) as udtTemplateRecord

Open in new window


I was going to do this to look up a template (the lookup is where performance is important)

Dim Template as udtTemplateRecord
Template = Array(TemplateID)

Open in new window


While lookups should be pretty much instant as I am simply referencing an array by index, I realised that this will start to consume a lot of memory, especially when I start to have many of these structures in memory.  

Also it is inefficient as there will probably never be more than a 50-200 templates in memory at any one time (often as few as 3 or 4), but the array would be large (65535).  

Then I saw mention of a hashtable and I read this as only storing a template when I need it, rather than allocating space for every possible template ID but being optimised for lookups.  Some documentation pointed out that a dictionary object is newer/better.  Then I came across Hash Sets and lists.

I have never had to work on in-memory data like this before and usually performance of this part of a program is not important.  However it is this time.  

To be fair, I haven't a clue about pro's and cons of the above and don't want to spend a day or two reading about them all and trying to get code to work.  

I'd love a brief note on which structure (hash set, hashtable, dictionary, list etc) is best to use (with lookup performance being key).  I don't need to iterate through the data, sort it, filter it etc, simply look a template by it's ID (Integer) and return a structure .  

A bit of code to get me going would be super useful.  

Thanks in advance

John
ASKER CERTIFIED SOLUTION
Avatar of AndyAinscow
AndyAinscow
Flag of Switzerland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Dictionary would be the way to go as you are looking things up by the templateID.

You can have a dictionary of <int, udtTemplateRecord>.

eG:
Dim di As New Dictionary(Of Integer, udtTemplateRecord)()
Dim r As udtTemplateRecord = di(TemplateID)

Open in new window


While an array must reserve space up front the dictionary can grow dynamically as needed.
Understand, though, that whichever you use, (Hashtable or Dictionary), the Keys have to be unique.

-saige-
Avatar of John

ASKER

Two of you have said dictionary and one said hashtable OR dictionary

Is there a benefit of dictionary over hashtable?

I've been googling a lot and kinda decided to go for a hashtable instead of a dictionary after seeing this page, I don't have any compelling logic though.  

I also read this which indicated that dictionary and hashtable are close in terms of performance except when dealing with reference types.  I have no idea what that meant.  

I guess this page influenced me too because I don't know what reference types are and whether I'm using them, so using hashtables seemed simpler in this respect.  

Is there some way of knowing which is best suited to me?
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of John

ASKER

Kyle, your comment about thread safety set me thinking.  There could be more than one thread adding/updating templates.  Would this affect it?
The only time you should use a hashtable IMO is if you're worried about thread safety.
To that I would say that you should use the collections that are defined in the Concurrent namespace. You get both type safety and thread safety using those collections.
Is the object (dictionary/hashtable) going to be updated by more than one thread at a time?  If so you need to lock it.  

Or if you're using at least 4.0 consider the ConncurentDictionary:
https://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx
Avatar of John

ASKER

Thankyou everyone for your help.  

I take on board the point that a dictionary would be better in most circumstances, but I am leaning towards using a HashTable in this case for these reasons:

  1. HashTable is threadsafe, so that's one less thing to worry about
  2. I am only saving one data type in the hashtable and so I don't think type safety is an issue.  
  3. A dictionary seems to be nicer and do more so it probably has more overheads.
  4. I only want to write/overwrite and read an item specified by it's ID.  iterating through, ordering and other nice features are wholly unnecessary

I'll leave this question open for a little while in case I've got it wrong or missed something - and while I go off and give it a try!

Thank you all once again
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
I am only saving one data type in the hashtable and so I don't think type safety is an issue.
It's not now, but are you the one maintaining this thing...forever? Even if you are, will you remember that you only stored one kind of thing in this collection?

A dictionary seems to be nicer and do more so it probably has more overheads.
What do you think all that thread safe-code in HashTable is?   ;)
Avatar of John

ASKER

I appreciate your comment about maintaining the application.  

However in this case it has a very narrow use.  

I am writing a netflow collection agent.  
In netflow V9, the exporter (the device sending the netflow packets) sends two types of data.  

The data packets that are records of network traffic and templates which are information defining the data packets.  Templates need to be referenced every time a data record comes in.  I don't want to load them from recordset or disk or anything else slow, it needs to be in memory and access speed is crucial.  

Netflow V9 is not going to change any time soon and as such, there will be no need to modify the hashtables once they are working.  

The reason I am keen on performance is that I have seen many things such as working with recordsets noticably take time.  .Net code seems a lot slower than the old VB6 stuff I used to dabble in.  If I can pick a more primitive collection class, I am happy to trade away the niceties for speed.  

I could be getting many packets per second that need parsing and saving to somewhere.  I need this to be as responsive as possible.  My first attempt at parsing the data into recordsets was slow and osme packets were missed.  I changed it to byte arrays and performance went through the roof.  I realise that .net does lots of nice stuff and many of the things I'd normally use have lots of method and properties and the bloat slows them down, so I thought I'd take it back as far as possible - hence starting with arrays, however the performance was a trade-off against memory use.  I didn't want to use recordsets and things like recordset.select("TemplateID= &  TempID) as they are slow.  So I started looking around and found hash tables, has sets, lists and dictionaries.  This is how I got to here.

But to be brief, I don't think maintainability and type safety are important to this aspect of my project.
Roger that. It sounds like you've weighed the pros and cons of each, so you're in better shape than most. Good luck with the remainder of the project.
Just an aside.  As the performance is critical is there any reason you didn't want to write in eg. C++ this section of the program.  A compiled non .net C++ app can run about 10x as fast as a .net app.

Also
Public Structure udtTemplateRecord
        Dim TemplateType As UShort 
        Dim Count As UShort
        Dim Count2 As UShort
        Dim Type() As UShort
        Dim Length() As UShort
    End Structure

Dim Array(65535) as udtTemplateRecord

Open in new window


That doesn't look like much memory at all when one is talking of multi GB PC's these days.  What you have done with the array is memory inefficient but may well offer better performance than using a collection class.
Avatar of John

ASKER

Hi Andy, thanks for the input

The array was my first go at it.  It was like lightening to look up a template as there was no effort to find the record I wanted (  Array(TemplateID)   ) but I will be collecting from many sources with several collection domains for each source.  Each will need it's own collection of templates in memory, and this array structure wastes a ton of RAM for each collection of templates so RAM usage will balloon quite easily.  If it was a single collection of templates, I'd have stuck with it, but I need it to scale up.  

This is why I started looking at hashtables.  

I had though of modifying the array method to dim as many records as I have and when another one comes, re-dim it 1 bigger, copy the first part of the array (up to the template I am adding) add the new one and then copy the rest of the records to keep it in order.  

Then when trying to get a record from the array, do a binary search until I found the one I want (rather than iterating through record by record)

This seemed do-able, but then I thought that Microsoft will have already done this.  That is the thought process that led me to the HashTable.  

I would love to code it in C++, but I know some VB and a little C#.  I did a little C back in the early 90's but I haven't a clue where to get started with C++
All gave good reasoning for helping to select a collection type.