.net programming hash set, hashtable, dictionary, List, Which should I use? I amusing VB.net but C# is OK

I am writing an application where performance is quite important.

I need to store templates in memory.  A template is an array of integers.  

I was trying to use recordsets, but they seem pretty expensive to use and I've had great performance increases from using arrays so I decided to try that.  

Each Template has an ID which is an integer (0  -  65535)

I created a structure for a template and created an array whose data type is that structure.  

    
Public Structure udtTemplateRecord
        Dim TemplateType As UShort 
        Dim Count As UShort
        Dim Count2 As UShort
        Dim Type() As UShort
        Dim Length() As UShort
    End Structure

Dim Array(65535) as udtTemplateRecord

Open in new window


I was going to do this to look up a template (the lookup is where performance is important)

Dim Template as udtTemplateRecord
Template = Array(TemplateID)

Open in new window


While lookups should be pretty much instant as I am simply referencing an array by index, I realised that this will start to consume a lot of memory, especially when I start to have many of these structures in memory.  

Also it is inefficient as there will probably never be more than a 50-200 templates in memory at any one time (often as few as 3 or 4), but the array would be large (65535).  

Then I saw mention of a hashtable and I read this as only storing a template when I need it, rather than allocating space for every possible template ID but being optimised for lookups.  Some documentation pointed out that a dictionary object is newer/better.  Then I came across Hash Sets and lists.

I have never had to work on in-memory data like this before and usually performance of this part of a program is not important.  However it is this time.  

To be fair, I haven't a clue about pro's and cons of the above and don't want to spend a day or two reading about them all and trying to get code to work.  

I'd love a brief note on which structure (hash set, hashtable, dictionary, list etc) is best to use (with lookup performance being key).  I don't need to iterate through the data, sort it, filter it etc, simply look a template by it's ID (Integer) and return a structure .  

A bit of code to get me going would be super useful.  

Thanks in advance

John
LVL 5
JohnAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

AndyAinscowFreelance programmer / ConsultantCommented:
First C# or VB.Net makes no difference - they are identical.

hash, hashtable, dictionary are more or less the same sort of collection and very different from a list.  A dictionary is very good for random access as it stores a key and a value (which can be an object such as a structure) and the lookup on the key is optimised for random access.  If however one needs to iterate through all items sequentially then a list is much better.

In other words.  If you require random access then a dictionary, otherwise a list (or even sortedlist https://msdn.microsoft.com/en-us/library/system.collections.sortedlist(v=vs.110).aspx) is probably better.

ps.  Iterating a list of 200 items is going to be pretty rapid.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Kyle AbrahamsSenior .Net DeveloperCommented:
Dictionary would be the way to go as you are looking things up by the templateID.

You can have a dictionary of <int, udtTemplateRecord>.

eG:
Dim di As New Dictionary(Of Integer, udtTemplateRecord)()
Dim r As udtTemplateRecord = di(TemplateID)

Open in new window


While an array must reserve space up front the dictionary can grow dynamically as needed.
it_saigeDeveloperCommented:
Understand, though, that whichever you use, (Hashtable or Dictionary), the Keys have to be unique.

-saige-
Exploring SQL Server 2016: Fundamentals

Learn the fundamentals of Microsoft SQL Server, a relational database management system that stores and retrieves data when requested by other software applications.

JohnAuthor Commented:
Two of you have said dictionary and one said hashtable OR dictionary

Is there a benefit of dictionary over hashtable?

I've been googling a lot and kinda decided to go for a hashtable instead of a dictionary after seeing this page, I don't have any compelling logic though.  

I also read this which indicated that dictionary and hashtable are close in terms of performance except when dealing with reference types.  I have no idea what that meant.  

I guess this page influenced me too because I don't know what reference types are and whether I'm using them, so using hashtables seemed simpler in this respect.  

Is there some way of knowing which is best suited to me?
it_saigeDeveloperCommented:
In a nutshell, Reference Types are objects that contain a pointer to another memory location that holds the real data; whereas Value Types hold the data within its own memory allocation.  Generally Reference Types are defined by class instances and Value Types are defined by structure instances.

Quick Demo on Reference Types vs Value Types:
using System;

namespace EE_Q29065836
{
    class Program
    {
        static void Main(string[] args)
        {
            PersonStruct bob = new PersonStruct() { Name = "Bob", Birthdate = new DateTime(1972, 3, 5) };
            PersonClass peter = new PersonClass() { Name = "Peter", Birthdate = new DateTime(1974, 8, 20) };

            Console.WriteLine("Before age change");
            Console.WriteLine(bob.Age);
            Console.WriteLine(peter.Age);

            Console.WriteLine("Adding 5 years to Bob and Peter");
            ChangeBob(bob);
            ChangePeter(peter);

            Console.WriteLine("After age change");
            Console.WriteLine(bob.Age);
            Console.WriteLine(peter.Age);

            Console.ReadLine();
        }

        private static void ChangeBob(PersonStruct person)
        {
            person.Birthdate = person.Birthdate.AddYears(-5);
        }

        private static void ChangePeter(PersonClass person)
        {
            person.Birthdate = person.Birthdate.AddYears(-5);
        }
    }

    struct PersonStruct
    {
        public string Name { get; set; }
        public DateTime Birthdate { get; set; }
        public string Age { get { return string.Format("Structure {0} is {1} years old!!!", Name, (DateTime.Now.Year - Birthdate.Year)); } }
    }

    class PersonClass
    {
        public string Name { get; set; }
        public DateTime Birthdate { get; set; }
        public string Age { get { return string.Format("Class {0} is {1} years old!!!", Name, (DateTime.Now.Year - Birthdate.Year)); } }
    }
}

Open in new window

Produces the following results -Capture.PNG
-Richard
Kyle AbrahamsSenior .Net DeveloperCommented:
A good list of differences is listed here:
https://stackoverflow.com/questions/301371/why-is-dictionary-preferred-over-hashtable 

Understanding boxing and unboxing:
https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/types/boxing-and-unboxing

Performance Charts:
https://www.codeproject.com/Tips/657564/Performance-measurement-of-HashTable-vs-Dictionary

With final conclusions:
https://social.msdn.microsoft.com/Forums/vstudio/en-US/e442df1d-f4db-4176-b34c-1f460d155367/hash-table-vs-dictionary?forum=csharpgeneral

In general it's recommended to use Dictionary as it's type safe automatically and you avoid boxing / unboxing.  The only time you should use a hashtable IMO is if you're worried about thread safety.
käµfm³d 👽Commented:
The problem with HashTable is that you get no type safety. Everything goes in as type Object. That means you can easily add strings, integers, bananas, and even condominiums to your HashTable, and nary an error will appear. The issue is when you go to retrieve any of those objects from the HashTable. How do you know the type of what you're retrieving? You have to perform run-time type checking, which is considered expensive.

These kinds of problems were solved with the introduction of generics, which Dictionary is. Generics give you the flexibility of defining the behavior and structure of an algorithm or data structure, and you do not have to code a 2nd copy of that same behavior or structure when the type of thing that is in your collection changes. In other words, I can create a dictionary of bananas, and I get all the behavior of a dictionary--and I know that every object in that dictionary is a banana. But I can also create  a dictionary of condominiums, and I again get all the behavior of a dictionary as well as the safety of knowing that every object in the dictionary is a condominium.

The tl;dr; of it is:  Unless you can come up with a truly good reason to use a non-generic collection, then always use the generic version of the collection.
JohnAuthor Commented:
Kyle, your comment about thread safety set me thinking.  There could be more than one thread adding/updating templates.  Would this affect it?
käµfm³d 👽Commented:
The only time you should use a hashtable IMO is if you're worried about thread safety.
To that I would say that you should use the collections that are defined in the Concurrent namespace. You get both type safety and thread safety using those collections.
Kyle AbrahamsSenior .Net DeveloperCommented:
Is the object (dictionary/hashtable) going to be updated by more than one thread at a time?  If so you need to lock it.  

Or if you're using at least 4.0 consider the ConncurentDictionary:
https://msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx
JohnAuthor Commented:
Thankyou everyone for your help.  

I take on board the point that a dictionary would be better in most circumstances, but I am leaning towards using a HashTable in this case for these reasons:

  1. HashTable is threadsafe, so that's one less thing to worry about
  2. I am only saving one data type in the hashtable and so I don't think type safety is an issue.  
  3. A dictionary seems to be nicer and do more so it probably has more overheads.
  4. I only want to write/overwrite and read an item specified by it's ID.  iterating through, ordering and other nice features are wholly unnecessary

I'll leave this question open for a little while in case I've got it wrong or missed something - and while I go off and give it a try!

Thank you all once again
Fernando SotoRetiredCommented:
käµfm³d 👽Commented:
I am only saving one data type in the hashtable and so I don't think type safety is an issue.
It's not now, but are you the one maintaining this thing...forever? Even if you are, will you remember that you only stored one kind of thing in this collection?

A dictionary seems to be nicer and do more so it probably has more overheads.
What do you think all that thread safe-code in HashTable is?   ;)
JohnAuthor Commented:
I appreciate your comment about maintaining the application.  

However in this case it has a very narrow use.  

I am writing a netflow collection agent.  
In netflow V9, the exporter (the device sending the netflow packets) sends two types of data.  

The data packets that are records of network traffic and templates which are information defining the data packets.  Templates need to be referenced every time a data record comes in.  I don't want to load them from recordset or disk or anything else slow, it needs to be in memory and access speed is crucial.  

Netflow V9 is not going to change any time soon and as such, there will be no need to modify the hashtables once they are working.  

The reason I am keen on performance is that I have seen many things such as working with recordsets noticably take time.  .Net code seems a lot slower than the old VB6 stuff I used to dabble in.  If I can pick a more primitive collection class, I am happy to trade away the niceties for speed.  

I could be getting many packets per second that need parsing and saving to somewhere.  I need this to be as responsive as possible.  My first attempt at parsing the data into recordsets was slow and osme packets were missed.  I changed it to byte arrays and performance went through the roof.  I realise that .net does lots of nice stuff and many of the things I'd normally use have lots of method and properties and the bloat slows them down, so I thought I'd take it back as far as possible - hence starting with arrays, however the performance was a trade-off against memory use.  I didn't want to use recordsets and things like recordset.select("TemplateID= &  TempID) as they are slow.  So I started looking around and found hash tables, has sets, lists and dictionaries.  This is how I got to here.

But to be brief, I don't think maintainability and type safety are important to this aspect of my project.
käµfm³d 👽Commented:
Roger that. It sounds like you've weighed the pros and cons of each, so you're in better shape than most. Good luck with the remainder of the project.
AndyAinscowFreelance programmer / ConsultantCommented:
Just an aside.  As the performance is critical is there any reason you didn't want to write in eg. C++ this section of the program.  A compiled non .net C++ app can run about 10x as fast as a .net app.

Also
Public Structure udtTemplateRecord
        Dim TemplateType As UShort 
        Dim Count As UShort
        Dim Count2 As UShort
        Dim Type() As UShort
        Dim Length() As UShort
    End Structure

Dim Array(65535) as udtTemplateRecord

Open in new window


That doesn't look like much memory at all when one is talking of multi GB PC's these days.  What you have done with the array is memory inefficient but may well offer better performance than using a collection class.
JohnAuthor Commented:
Hi Andy, thanks for the input

The array was my first go at it.  It was like lightening to look up a template as there was no effort to find the record I wanted (  Array(TemplateID)   ) but I will be collecting from many sources with several collection domains for each source.  Each will need it's own collection of templates in memory, and this array structure wastes a ton of RAM for each collection of templates so RAM usage will balloon quite easily.  If it was a single collection of templates, I'd have stuck with it, but I need it to scale up.  

This is why I started looking at hashtables.  

I had though of modifying the array method to dim as many records as I have and when another one comes, re-dim it 1 bigger, copy the first part of the array (up to the template I am adding) add the new one and then copy the rest of the records to keep it in order.  

Then when trying to get a record from the array, do a binary search until I found the one I want (rather than iterating through record by record)

This seemed do-able, but then I thought that Microsoft will have already done this.  That is the thought process that led me to the HashTable.  

I would love to code it in C++, but I know some VB and a little C#.  I did a little C back in the early 90's but I haven't a clue where to get started with C++
Fernando SotoRetiredCommented:
All gave good reasoning for helping to select a collection type.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
hashtable

From novice to tech pro — start learning today.