Solved

LINQ List searching performance

Posted on 2011-09-11
Last Modified: 2012-05-12
Hi Experts,
    I recently wrote a LINQ query to search a list of items. Each item contains 50 fields. I have two lists with the same structure and fields: List A contains 100 records and List B contains 10,000 records. I need to take each item in List A and search List B for its match.
   That's simple enough and I managed to do it. However, if I increase List B to 100,000 records and use the same List A, the job runs slower. Is there a more efficient way of doing this than what I did?

My LINQ query:
       For Each pmTransaction In ListA
                Dim tran As New Trans
                tran = pmTransaction
                ' Scan ListB for the first record that matches on all six fields
                Dim tranExist = (From tr As Trans In ListB Where _
                                 tr.internref = tran.internref And tr.ledgerno = tran.ledgerno And _
                                 tr.externref = tran.externref And _
                                 tr.descriptn = tran.descriptn And _
                                 tr.trantype = tran.trantype And tr.accountid = tran.accountid).FirstOrDefault
       Next

Question by:miketonny
20 Comments
 
LVL 17

Expert Comment

by:nepaluz
ID: 36520342
Your code looks incomplete; however, a couple of questions.
1. When you say the speed it does the job will be slower, do you mean that the speed you have noted IS slower or MAY be slower?
2. What is / how do you define Trans in your code?
3. Have you tried a database? (I say this because, by design, databases are meant to hold huge numbers of records and are optimised for speed.)
 
LVL 2

Author Comment

by:miketonny
ID: 36520348
Hi nepaluz,
   It is slower. Running 1 week's records took me 5 minutes, but when I tried 1 month's records it took nearly 2 hours. The records are spread evenly across the weeks, so I'm not sure why 1 month's records take longer than 20 minutes.
  "Trans" is a class for transactions. It contains 50 fields; I read the data from the database and store it in these two lists. Does that answer your question?
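A rough back-of-the-envelope estimate may explain the gap. It assumes both lists are loaded from the same date range (an assumption; the thread does not say how ListA is built). Going from one week to one month then multiplies the size of each list by about four, and since the loop can compare every ListA item against every ListB item, the work grows with the product of the two sizes:

```latex
T_{\text{month}} \approx T_{\text{week}} \times \frac{|A_{\text{month}}|}{|A_{\text{week}}|} \times \frac{|B_{\text{month}}|}{|B_{\text{week}}|}
\approx 5\ \text{min} \times 4 \times 4 = 80\ \text{min}
```

Under that assumption, the 20-minute expectation would hold only if one of the two lists stayed the same size. The estimate is of the same order as the nearly 2 hours reported.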
 
LVL 2

Author Comment

by:miketonny
ID: 36520369
To add some information: when I put the program on the server and test it, it consumes 100% of one CPU core (on a 4-core machine) while it's searching. So I guess if I increase the CPU speed it'll be faster? How about multicore?
 
LVL 17

Expert Comment

by:nepaluz
ID: 36520372
I think it would be better to run your queries on the database rather than on lists. However, if you choose to pursue this avenue, is that the complete loop (i.e. should I assume the next line is Next)? Also, what version of .NET are you running this on?
 
LVL 17

Expert Comment

by:nepaluz
ID: 36520376
You can utilise Parallel.ForEach to run this loop and improve both performance and CPU usage.
 
LVL 17

Expert Comment

by:nepaluz
ID: 36520391
Something like this may improve your performance (and use more of your cores)
Threading.Tasks.Parallel.ForEach(_otherCurrency, Sub(pmTransaction)
                                                     Dim tran As New Trans
                                                     tran = pmTransaction
                                                     Dim tranExist = (From tr As Trans In ListB Where _
                                                                      tr.internref = tran.internref And tr.ledgerno = tran.ledgerno And _
                                                                      tr.externref = tran.externref And _
                                                                      tr.descriptn = tran.descriptn And _
                                                                      tr.trantype = tran.trantype And tr.accountid = tran.accountid).FirstOrDefault
                                                  End Sub)

 
LVL 17

Expert Comment

by:nepaluz
ID: 36520401
Also, why do you not define ListA as a list of Trans, e.g.
Dim ListA As New List(Of Trans)

Also, I erred in the code above; it should be:
Threading.Tasks.Parallel.ForEach(ListA, Sub(pmTransaction)
                                                     Dim tran As New Trans
                                                     tran = pmTransaction
                                                     Dim tranExist = (From tr As Trans In ListB Where _
                                                                      tr.internref = tran.internref And tr.ledgerno = tran.ledgerno And _
                                                                      tr.externref = tran.externref And _
                                                                      tr.descriptn = tran.descriptn And _
                                                                      tr.trantype = tran.trantype And tr.accountid = tran.accountid).FirstOrDefault
                                                  End Sub)

 
LVL 2

Author Comment

by:miketonny
ID: 36520464
Sorry, that was my mistake; it's a complete For loop, I forgot to paste the Next at the end.
I did declare my List A as a list of Trans.

I'm using VS 2008, which has .NET 3.5.
I was actually reading about PLINQ on the internet, but a lot of sources said it's only for .NET 4.0. Is that right?
 
LVL 2

Author Comment

by:miketonny
ID: 36520468
I asked my colleague how he would handle this. He said a sorted list could be faster than LINQ (he doesn't do LINQ) because a sorted list uses binary search. Could that be a way of dealing with this?
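One way to act on the colleague's idea without sorting is to index ListB once in a Dictionary, so each ListA item needs a single hashed lookup instead of a scan. The following is only a sketch under assumptions: the Trans class here is a hypothetical stand-in with String fields (the real field types are not given in the thread), MakeKey and BuildIndex are made-up helper names, and the separator character is assumed never to appear in the data. It sticks to constructs available in .NET 3.5.

```vb
Imports System
Imports System.Collections.Generic

' Hypothetical stand-in for the thread's Trans class; String field types are an assumption.
Public Class Trans
    Public internref As String
    Public ledgerno As String
    Public externref As String
    Public descriptn As String
    Public trantype As String
    Public accountid As String
End Class

Module LookupSketch
    ' Unit-separator character; assumed never to occur inside the field values.
    Private ReadOnly Separator As String = ChrW(31).ToString()

    ' Joins the six compared fields into one composite key.
    Function MakeKey(ByVal t As Trans) As String
        Return String.Join(Separator, New String() {t.internref, t.ledgerno, t.externref, _
                                                    t.descriptn, t.trantype, t.accountid})
    End Function

    ' Indexes ListB once. The first record seen for each key is kept,
    ' which mirrors what FirstOrDefault returns in the original query.
    Function BuildIndex(ByVal listB As List(Of Trans)) As Dictionary(Of String, Trans)
        Dim index As New Dictionary(Of String, Trans)
        For Each tr As Trans In listB
            Dim key As String = MakeKey(tr)
            If Not index.ContainsKey(key) Then
                index.Add(key, tr)
            End If
        Next
        Return index
    End Function

    Sub Main()
        Dim listA As New List(Of Trans)
        Dim listB As New List(Of Trans)
        ' ... fill both lists from the database as before ...

        Dim index As Dictionary(Of String, Trans) = BuildIndex(listB)
        For Each tran As Trans In listA
            Dim tranExist As Trans = Nothing
            If index.TryGetValue(MakeKey(tran), tranExist) Then
                ' tranExist is the first matching record from ListB
            End If
        Next
    End Sub
End Module
```

A sorted list with binary search would also avoid the full scan, at a cost of roughly log2(n) comparisons per lookup. The dictionary trades some memory for lookups whose cost, on average, does not grow with the size of ListB.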
 
LVL 17

Expert Comment

by:nepaluz
ID: 36522200
Not sure what you are trying to achieve now. Sorted list to accomplish a search? I must have missed something  .....
Anyhow, to continue with my suggestion, if you have ListA declared as a List(Of Trans), then you can improve on memory by just doing:
Threading.Tasks.Parallel.ForEach(ListA, Sub(tran)
                                            Dim tranExist = (From tr As Trans In ListB Where _
                                                             tr.internref = tran.internref And tr.ledgerno = tran.ledgerno And _
                                                             tr.externref = tran.externref And _
                                                             tr.descriptn = tran.descriptn And _
                                                             tr.trantype = tran.trantype And tr.accountid = tran.accountid).FirstOrDefault
                                        End Sub)
GC.Collect()

Since you are dealing with hundreds of thousands of lines, re-declaring Dim tran As New Trans inside the loop results in a small but additional use of memory for each declaration, and with your hundreds of thousands of lines, it does add up!
I have also added a GC.Collect() at the end of the routine.
 
LVL 2

Author Comment

by:miketonny
ID: 36525835
Umm, under Threading I couldn't find Tasks. Does that mean I don't have that in my .NET 3.5?
But yeah, thanks for pointing out that it's better to declare that variable outside the loop; I didn't realize that.
 
LVL 17

Expert Comment

by:nepaluz
ID: 36526010
You are right about 3.5, but did you know that you can actually use .NET 4.0 in VS 2008? Just install the .NET 4.0 SDK (that's if you are not on a work machine!)
Best of luck with the rest then .......
 
LVL 2

Author Comment

by:miketonny
ID: 36526301
In that case, all the servers that are going to run the program will need .NET 4.0 too, then?
I'll dig a little into this, but PLINQ does seem to be a good way.
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 36526320
You did not answer the question (or I did not find it) "why lists?"!

The obvious improvement that can be made (but may not make a huge difference) is



For Each tran As Trans In ListA
                ' Scan ListB for the first record that matches on all six fields
                Dim tranExist = (From tr As Trans In ListB Where _
                                 tr.internref = tran.internref And tr.ledgerno = tran.ledgerno And _
                                 tr.externref = tran.externref And _
                                 tr.descriptn = tran.descriptn And _
                                 tr.trantype = tran.trantype And tr.accountid = tran.accountid).FirstOrDefault
Next

 
LVL 17

Accepted Solution

by:
nepaluz earned 300 total points
ID: 36526702
Actually, I think we were fixated on LINQ (and somehow it clouded our thinking here!). Try:
Dim tranExist = ListA.Intersect(ListB)

That should give you a list of all common occurrences in both lists. As the meerkat says, SIMPLESSSS!
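One caveat worth checking: Intersect compares elements with the default equality comparer, so for a class that does not override Equals and GetHashCode, two separately created objects with identical field values are not treated as a match. The sketch below passes a custom IEqualityComparer so that the six fields from the original query decide equality. The Trans class is a hypothetical stand-in with String fields, TransKeyComparer is a made-up name, and the lists are assumed to contain no Nothing entries.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the thread's Trans class; String field types are an assumption.
Public Class Trans
    Public internref As String
    Public ledgerno As String
    Public externref As String
    Public descriptn As String
    Public trantype As String
    Public accountid As String
End Class

' Treats two Trans objects as equal when the six compared fields match.
Public Class TransKeyComparer
    Implements IEqualityComparer(Of Trans)

    ' Composite key; the separator is assumed never to occur in the data.
    Private Shared Function KeyOf(ByVal t As Trans) As String
        Return String.Join(ChrW(31).ToString(), New String() {t.internref, t.ledgerno, _
                           t.externref, t.descriptn, t.trantype, t.accountid})
    End Function

    Public Function Equals1(ByVal x As Trans, ByVal y As Trans) As Boolean _
        Implements IEqualityComparer(Of Trans).Equals
        Return String.Equals(KeyOf(x), KeyOf(y))
    End Function

    Public Function GetHashCode1(ByVal obj As Trans) As Integer _
        Implements IEqualityComparer(Of Trans).GetHashCode
        Return KeyOf(obj).GetHashCode()
    End Function
End Class

Module IntersectSketch
    Sub Main()
        Dim listA As New List(Of Trans)
        Dim listB As New List(Of Trans)
        listA.Add(New Trans With {.internref = "A1", .ledgerno = "L1"})
        listB.Add(New Trans With {.internref = "A1", .ledgerno = "L1"})
        listB.Add(New Trans With {.internref = "B2", .ledgerno = "L1"})

        ' Without the comparer, the two separate "A1" objects would not count as equal.
        Dim common As List(Of Trans) = listA.Intersect(listB, New TransKeyComparer()).ToList()
        Console.WriteLine(common.Count)
    End Sub
End Module
```

Intersect returns the distinct elements of the first sequence that also appear in the second, so the result holds objects from ListA rather than the matching ListB records. If the ListB record itself is needed, a lookup keyed on the same fields may be a better fit.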
 
LVL 2

Author Comment

by:miketonny
ID: 36526922
@CodeCruiser, I'm used to using lists for this kind of thing. Is there a more efficient way of doing it? Something like a query through the database to do the same?

@nepaluz, that does look simple enough! I'll test a little to see how it goes.
 
LVL 83

Expert Comment

by:CodeCruiser
ID: 36532018
Yes this can be done on the DB. Are the lists being populated from DB?
 
LVL 2

Author Comment

by:miketonny
ID: 36532324
Yes, they're all from the same table in a FoxPro database.
So in VB, shall I just write a long query to do the same thing when I bring these in?
Would this hold up the database for too long?
 
LVL 83

Assisted Solution

by:CodeCruiser
CodeCruiser earned 200 total points
ID: 36532345
Oh, FoxPro. If FoxPro supports T-SQL like any other DB, then this should be straightforward. Otherwise, you can fill a DataTable and use the RowFilter to do this.
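The DataTable route could look roughly like the sketch below. It is illustrative only: the table is built in memory as a hypothetical stand-in for rows loaded from FoxPro, only two of the six columns are shown for brevity, the values are assumed to be strings, and Quote is a made-up helper that doubles single quotes so a value can sit inside a filter literal.

```vb
Imports System
Imports System.Data

Module RowFilterSketch
    ' Wraps a value in single quotes, doubling any quotes inside it.
    Function Quote(ByVal value As String) As String
        Return "'" & value.Replace("'", "''") & "'"
    End Function

    Sub Main()
        ' Hypothetical table standing in for the data loaded from the database.
        Dim table As New DataTable("trans")
        table.Columns.Add("internref", GetType(String))
        table.Columns.Add("ledgerno", GetType(String))
        table.Rows.Add("A1", "L1")
        table.Rows.Add("A2", "L1")

        ' Build the filter from the values of one ListA record.
        Dim view As New DataView(table)
        view.RowFilter = "internref = " & Quote("A1") & " AND ledgerno = " & Quote("L1")

        ' view now exposes only the rows that satisfy the filter.
        Console.WriteLine(view.Count)
    End Sub
End Module
```

The filter still has to be applied once per ListA record, so it is worth timing this against the list-based approaches before committing to it.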
 
LVL 2

Author Comment

by:miketonny
ID: 36719786
thank you both for the help on this problem, I learnt something new on this :)