.NET List.Find Performance

Posted on 2013-12-31
Last Modified: 2014-01-04
I'm working on an ASP.NET C# application that will process most of the database reads and joins in memory using List.Find methods.

I wanted to see if normalizing the database would make a difference in performance in terms of how fast the List.Finds would operate.

So I created three types of class objects:
TypeBoth = ID, Type, Text  (Type = Type1 or Type2)
Type1 = ID, Text
Type2 = ID, Text

Then populated three Lists with randomly generated text fields
ListBoth  (alternating Type1 and Type2)   (number of objects = 2X)
ListType1       (number of objects = X)
ListType2       (number of objects = X)
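
For reference, the test data above can be sketched like this (a minimal sketch; class and field names are my shorthand, not necessarily the real code):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the three object types described above (names are illustrative).
class TypeBoth { public int ID; public string Type; public string Text; }
class Type1    { public int ID; public string Text; }
class Type2    { public int ID; public string Text; }

class Setup
{
    static string RandomText(Random rnd)
    {
        var chars = new char[8];
        for (int i = 0; i < chars.Length; i++) chars[i] = (char)('a' + rnd.Next(26));
        return new string(chars);
    }

    static void Main()
    {
        int x = 250;                      // X objects per single-type list
        var rnd = new Random(42);
        var listBoth  = new List<TypeBoth>();
        var listType1 = new List<Type1>();
        var listType2 = new List<Type2>();

        for (int i = 0; i < x; i++)
        {
            string t1 = RandomText(rnd), t2 = RandomText(rnd);
            // ListBoth alternates Type1 and Type2 entries, so it holds 2X objects
            listBoth.Add(new TypeBoth { ID = i, Type = "Type1", Text = t1 });
            listBoth.Add(new TypeBoth { ID = i, Type = "Type2", Text = t2 });
            listType1.Add(new Type1 { ID = i, Text = t1 });
            listType2.Add(new Type2 { ID = i, Text = t2 });
        }

        Console.WriteLine($"{listBoth.Count} {listType1.Count} {listType2.Count}");
        // prints "500 250 250"
    }
}
```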

Then I generated "index arrays" (because that's what the application will do) that contain randomly selected IDs from the Lists for Type1 and Type2, and randomly selected ID + Type pairs for TypeBoth.

For speed testing, I set up routines where the index arrays were used to Find the object in the List associated with each index array entry.  These routines have Find statements like these:

For TypeBoth
wrkTypeBoth = wrkList.Find(i => (i.ID == wrkIndex) && (i.Type == wrkType));  // <== note, two conditions

For Type1 and Type2
wrkType1 = wrkList.Find(i => i.ID == wrkIndex);  // <== note, one condition
wrkType2 = wrkList.Find(i => i.ID == wrkIndex);  // <== note, one condition

I then placed these routines in test loops.  The test loop for TypeBoth had twice as many objects in its List as for either Type1 or Type2, but the test loop for Type1+Type2 ran two index-read loops, i.e.:

    => find 200 indexes from 500 objects in ListBoth, two conditions per find

    => find 100 indexes from 250 objects in ListType1, one condition per find
    => find 100 indexes from 250 objects in ListType2, one condition per find

I counted all the Finds for both test loops to make sure I was doing the same number of Finds in either case.  I checked to make sure the Lists and index arrays had the expected number of records before running the speed test. I used Stopwatch to capture the elapsed times for both test loops.
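
The timing comparison boils down to something like this (a simplified sketch using System.Diagnostics.Stopwatch; loop counts, class members, and variable names are illustrative, not the exact test code):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

class TimingSketch
{
    class TypeBoth { public int ID; public string Type; }
    class Type1    { public int ID; }

    static void Main()
    {
        int x = 250;
        var listBoth  = new List<TypeBoth>();
        var listType1 = new List<Type1>();
        for (int i = 0; i < x; i++)
        {
            listBoth.Add(new TypeBoth { ID = i, Type = "Type1" });
            listBoth.Add(new TypeBoth { ID = i, Type = "Type2" });
            listType1.Add(new Type1 { ID = i });
        }

        var rnd = new Random(1);
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < 200; n++)              // 200 Finds, two conditions each
        {
            int wrkIndex = rnd.Next(x);
            string wrkType = rnd.Next(2) == 0 ? "Type1" : "Type2";
            var hit = listBoth.Find(i => i.ID == wrkIndex && i.Type == wrkType);
        }
        sw.Stop();
        Console.WriteLine($"TestLoopBoth: {sw.ElapsedTicks} ticks");

        sw.Restart();
        for (int n = 0; n < 200; n++)              // 200 Finds, one condition each
        {
            int wrkIndex = rnd.Next(x);
            var hit = listType1.Find(i => i.ID == wrkIndex);
        }
        sw.Stop();
        Console.WriteLine($"TestLoop12: {sw.ElapsedTicks} ticks");
    }
}
```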

Throughout a range of record counts, index sizes, and test cycles, the Type1+Type2 test loop (TestLoop12) always took approximately half the elapsed time of the TypeBoth test loop (TestLoopBoth).

This tells me that normalizing the data so that the Finds can run with only one condition instead of two conditions creates a significant performance benefit.

My question is two part:
A)   does this seem to agree with what other people know about the C# List.Find method?
B)   are there any flaws in my test protocol or conclusions?

Any help with this would be appreciated.

Question by:codequest

Accepted Solution

käµfm³d   👽 earned 500 total points
ID: 39749314
As described by the documentation:

This method performs a linear search; therefore, this method is an O(n) operation, where n is Count.

Any time you do linear searching, your search time will suffer the further into the list the item occurs. You would see a tremendous benefit if you sorted the list first, and then used a different search technique. BinarySearch is already implemented by the Framework; other search techniques most likely require you to create your own implementation. If you are familiar with "big-O" notation, then a binary search will have O(log(n)) search time. In other words:
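
As a sketch of the sort-then-BinarySearch approach (a minimal example, assuming the lookups go through a comparer on the ID field; class and member names are illustrative):

```csharp
using System;
using System.Collections.Generic;

class Item { public int ID; public string Text; }

// Comparer so Sort and BinarySearch both order items by ID.
class IdComparer : IComparer<Item>
{
    public int Compare(Item a, Item b) => a.ID.CompareTo(b.ID);
}

class BinarySearchDemo
{
    static void Main()
    {
        var list = new List<Item>
        {
            new Item { ID = 7, Text = "g" },
            new Item { ID = 2, Text = "b" },
            new Item { ID = 5, Text = "e" },
        };

        var cmp = new IdComparer();
        list.Sort(cmp);                                          // O(n log n), done once

        // Each lookup is then O(log n) instead of the O(n) scan Find performs.
        int idx = list.BinarySearch(new Item { ID = 5 }, cmp);
        Console.WriteLine(idx >= 0 ? list[idx].Text : "not found");  // prints "e"
    }
}
```

The sort cost is paid once up front; if the list is searched many times between modifications, the per-lookup savings dominate.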

[Chart: Items vs. Time]
The blue line represents a linear search, and the green line represents a binary search. As you can see, as the number of items increases (the X axis), the time to search (the Y axis) grows much more slowly with a binary search.

Author Comment

ID: 39750047
Thanks for the input.  I'll have to think about whether there is a basis for sorting.  The records are input by multiple people in no particular sequence, and gathered into a hierarchy with other record types, so there's no obvious sorting approach.

While other types of search might also help, at this point I'm really just trying to confirm whether the additional normalization helps with the "Find" method.  That's going to significantly influence my DB design and the complexity of the code versus a less normalized approach.
LVL 75

Assisted Solution

käµfm³d   👽 earned 500 total points
ID: 39750121
IMO, the reduction of one condition won't cause a significant decrease in search time. Think about the difference you observed, but imagine now that you had an infinite number of items. There is no significant overhead in the comparison of one field versus two fields. So with an infinite number of items you are basically taking the same amount of time to compare one field or two. The significance in search time comes from the type of search (linear) that you are performing.

For your testing, are you certain that both the single-condition list and the double-condition list were in the same exact order?

The data structure you use can also play a role in your search time. For example, if you were to use a Dictionary instead of a List, then you would get ridiculous speeds because a dictionary uses a hashing algorithm to determine where to place items in its internal storage. In terms of big-O notation, your searches would be O(1), or constant time. That means that no matter how many items you have in the Dictionary, searching takes the same amount of time. Since you are contemplating having only a single condition, that property you referenced might be a good candidate for the key of the Dictionary. The only caveat is that this field would need to be distinct in its value across all items you place in the Dictionary.
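
A minimal sketch of the Dictionary approach (assuming ID is unique across all items and serves as the key; names are illustrative):

```csharp
using System;
using System.Collections.Generic;

class Rec { public int ID; public string Text; }

class DictDemo
{
    static void Main()
    {
        var byId = new Dictionary<int, Rec>();
        for (int i = 0; i < 250; i++)
            byId[i] = new Rec { ID = i, Text = "row" + i };   // ID must be unique

        // Average O(1) lookup by key; no scan of the collection.
        if (byId.TryGetValue(123, out var rec))
            Console.WriteLine(rec.Text);                      // prints "row123"
    }
}
```

TryGetValue avoids the exception that the indexer (`byId[key]`) throws when the key is absent, which matters if some index entries might not match a record.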

Author Comment

ID: 39750259
Thanks for the input.  I reviewed the code, and tried some other variations.  It appears you are correct;  testing for one field vs two fields does not seem to make much difference.  

What makes a big difference is changing the size of the List that is being searched.   If I make the two separate Lists (Type1 and Type2) each twice as large as the one combined ListBoth, and run X Finds on ListBoth and X/2 Finds on each of ListType1 and ListType2, the Finds on ListType1 and ListType2 now take twice as long, even with ListBoth using two comparison fields.

So, my takeaway is that the value of more normalization is that the separate Lists (for Type1 and Type2) will be smaller than a combined list containing both Types, and thus faster to search.

I also ran the test with Dictionary;  yes, it is brutally fast (30-100X faster).   If I fully normalize, then I'll have just one search field per table / collection (i.e. List or Dictionary), so Dictionary is possible from that standpoint.  I'll have to look at more use cases.  Also, I'll post another question about pros and cons of List vs Dictionary when used as in-memory proxies for tables (i.e. typed DataTables).

Interesting stuff. Thanks!
