Link to home
Start Free TrialLog in
Avatar of miketonny
miketonny

asked on

Multi threading processing

Hi Experts,
   I've recently come across the problem where after i generate a large list which contains tens of thousands records in the list, I want to insert the data into my database. What i did previously was using For each loop to process and insert the record into my database.
Normally for a small list it is fine that i can wait for 1 min or 2. but when comes to big lists I'll then have to wait much longer. I've done some testing on my machine, each record takes 3 secs to process on average.
so lets say i have 10000 records, that'll be nearly 8 hours which isn't reasonable as I might get even bigger lists.
   so i did some search then found multithreading might help? what about multi-core processing? does VB support that or can i gain control on that? or is there some other ways i could improve the speed of processing? any comments would be appreciated.
Avatar of sdern
sdern
Flag of United States of America image

You need to have multiple threads to take advantage of the multiple-cores. If you had a a few threads inserting data into your database it will increase the speed, but you'll run into an issue of synchronization. You can't have two threads inserting data at the same time (what if they try to insert into the same slot). So what you should do is google for some simple threading examples and synchronization steps. (Really you just want to create a block so only 1 thread can add at a time and then will unblock). If this is your first time using multiple threads I'd suggest playing around with creating a few threads that increment a shared integer and then implement synchronization until you get the hang of it. After that you will be able to create N number of insert threads easily and watch your performance increase :)
Avatar of miketonny
miketonny

ASKER

thanks, so in my case, shall i start with dividing this list into a few parts then creating thread accordingly? then insert them into database ?
Dont worry about dividing it. Lets say you have a list of items that need to be added to the database. You'll have 1 method that will take an item from your list and then add it to the database. What you'll need to do is add some locks (or whatever synchronization your language has).

So synchonization

list lock
get item that will insert into database
unlock the list
database lock
insert item
unlock database

Then you can create probably N*2 +1 threads for how many cores you have. You can read up more on how many threads you will want. Look for multi-thread examples to get an idea on what to do.
sorry i may not have clarified. Your 1 method will be the thread that you will use, make a general case with the synchronization. That way you can create N threads (which will basically be running N of those methods you created).
I'm using VB.net in visual studio 2008.
do you mind show me a simple example of locking a list and db?
in the new thread i'm going to create it will contain approx 3000 lines of codes which will call database connections at different points depending on the records in list, this is going to make locking dbs complicated is it? let me know if i'm wrong.
Read this here
http://www.developerfusion.com/article/5184/multithreading-in-vbnet/2/

It's not a lock that in specific to the data type. It's specific to the line of code. So if you have two threads going, they are running the same code sequentially. When one of them reaches the line of code that will call the lock method, it will look at the shared lock (usually just a int that will increment when unlocking and decrement when locking). If the value is 0, then that means that the thread will wait there until it becomes unlocked. This means that one thread will call the method, it will not be used. It will set the lock and then continue down the code, until it unlocks the lock.

Try first writing a simple program that has a thread that says the thread # and then hello. Run the two threads w/o locks. Then try and run it will locks. It will help clarify what you're doing.
thanks, that's helpful. one more question, if create 2 threads, executing the same function for both threads, say inside function i have codes like:
 Private Sub Load()
for each listitem as Item in ItemList
         Codes
next
End Sub

in ButtonClick event i then wrote:

        Dim Load1 As New Thread(AddressOf Load)
        autoLoad1.Start()

        Dim Load2 As New Thread(AddressOf Load)
        autoLoad2.Start()

are each of the threads going to process the list as a whole or are they going to be smart enough to divide the list themselves then process ?
You should have a shared variable that will keep track of where they are in the list. Which if you only increment during the locked part then you should be okay.
i created a simple project to do some testings on it:
    Public Event Done()
    Public Sub Fill()
        Dim objLock As Object = New Object()
        SyncLock objLock
            Dim i As Int32
            Dim current As Int32

            For i = 0 To 100
                ListBox1.Items.Add(current)
                current *= 2
                Thread.Sleep(3000)
            Next
            RaiseEvent Done()
        End SyncLock
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim thread0 = New Thread(AddressOf Me.Fill)
        thread0.Start()

        Dim thread1 = New Thread(AddressOf Me.Fill)
        thread1.Start()
    End Sub

Open in new window


but the synclock doesn't seem to working as it gives me this error:
"Cross-thread operation not valid: Control 'ListBox1' accessed from a thread other than the thread it was created on."
did i do something wrong in the code ?
I'm not sure why there is that sleep in the for-loop. I dont really know .net, but I think what you should be doing is thinking of each thread as it's own method (although they call the same method). So it looks like it doesn't like the two threads controlling the same data. I know that when implementing threads in c, you need to declare the variables in the global scope, outside of main so the two threads can change them.
um i figured i need to declare delegate to invoke the form's controls.
but speaking of my actual case. As i suspected, without splitting the list one record gets inserted 9 times. (i put 9 threads in my program). is there a way to cater that other than I splitting my list? and i dont really know after splitting how to assign a particular list to a thread.
Dim Load4 As New Thread(AddressOf Load)

addressof Load wont allow me to put parenthesis.
I can't help you with the syntax part, because I haven't programmed using .NET. But I can help you with the concept part of threads.

The way to think of threads, is that it's a function that you could call just once (ie. 1 thread) and it will perform the entire job just fine. Even with the synchronization. But using synchronization you make it so that each thread will do their own iteration of the thread (that's why you dont want to use a for each element loop). Use a while(!done) loop so they just keep looping until the list is done. If you need help with the theory I can help you, but I can't help you with the syntax and specifics of your program. :(
Before you go working too hard on the threading aspect, make sure that threading will actually solve the problem...

The application you described sounds as if inserting items into the database is the only thing it does... if that's true, then the effort on the client side shouldn't really be that big.  If you were doing calculations, then the client might be using some CPU power.  But if we are just talking raw inserts, then you need to check whether th bottleneck is the SQL server, the client CPU, or the network connection.

First thing to do is measure the CPU untilization on your application and see if you even come close to pegging the processor; if not, then look at your network utilization and the network latency between your client and the SQL server.

If you are pegging the network connection, then multiple threads may not help you at all.  In this case you need to look at getting a higher badnwidth link between the two points and / or looking at doing things like using more effecient SQL usage.  For example, I've seen applications process inserts like this where they are opening and closing the connection to SQL each time - that equals extra chatter back and forth to the server.  Also consider removing any extra formatting in your SQL statements - tab and return characters make for easy to read statements but are just wasted bytes when you send them to SQL.

If you aren't pegging the CPU or the network bandwidth but you have high latency to SQL, then you might consdier multi-threading.  The easier solution, if possible, is to move closer (in network topology) to your SQL server.  But if that can't be done, then having multiple conversations with the SQL server might be a benefit, but only if you know the SQL server isn't the bottleneck.

If you have access to the SQL server, check and see what it's CPU, network, and disk I/O look like when you have your inserts running.  If the SQL server is swampped and can't get the inserts done any faster, then having multiple threads won't gain you anything.

You should also check your SQL object.  If you are using a stored proc to insert, check it's query plan.  If you aren't, you should be - stored procs are faster than re-issuing an ad-hoc insert statement each time.  And if creating a stored proc isn't an option for you, then at least use a parameterized query statement so that SQL can effeciently re-use it's query plan.  Also check the indexes and FKs on the table that you are inserting to.  A table that is nothing but FKs to other tables means that SQL has to do a lot of checking of values; I'm not a fan of taking FKs off a table, but it might be something to play with if you determine your bottleneck is in SQL.  The other gottcha is indexes - especially clustered indexes.  If the physical order of the table has to be messed with for every insert, SQL could be spending a bunch of time it doesn't need to spend.  And for standard indexes, they have to be organized as well, so watch out for having too many indexes or indexes that are too wide.  It depends on your situation whether it is viable to do so or not, but in some cases it works well to remove the indexes before starting your inserts and then put them back in place afterward because sometimes it takes less time to get the data in and build the index than it does to try to get the data in with the index being maintained the whole time.

If you still decide to multi-thread this, be careful about having delegates from your threads calling back to your UI - you need to make sure that every place a thread interacts with the UI that it calls with Invoke() so that the UI thread can do the update to the UI.
Thx volox for the input, it's helpful. in my project i'm actually using a activeX control to do the insertion which another company provided us, so i cant really see the code inside the OCX. However, to be able to use the OCX to do the insert, i'll have to first convert my own data to meet the format/type requirement of the OCX, so each of the records takes some time to transform into the correct format, while there're 20 different format transforming functions that I have in the program.
What i want initially using the multithreading is to minimize the time that program uses to do the transformation rather than the insertion, as insertion part is out of my control. right now one transform could take up to 2sec of time, so if i have 5 threads working on different transformation at once, this will be much faster right? as all they queueing up now is the inserting part.

so under these circumstances, would you recommend me to use the multithreading?
network or sql server issue shouldn't be matter as once i compiled my program i'll run it from SQL server directly.

also, if possible, do you have a solution to the problem i encounted with inserting duplicating records with the multithreading?
ASKER CERTIFIED SOLUTION
Avatar of Volox
Volox
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
thx a lot for all the helps, that really helps me understanding the concepts.sry didn't get back to this earlier, i decided i'll do it without multithreading in my case, as their OCX not supporting. but i'm certainly gonna use it in my other programs.