Multi threading processing

Hi Experts,
   I've recently come across the problem where after i generate a large list which contains tens of thousands records in the list, I want to insert the data into my database. What i did previously was using For each loop to process and insert the record into my database.
Normally for a small list it is fine that i can wait for 1 min or 2. but when comes to big lists I'll then have to wait much longer. I've done some testing on my machine, each record takes 3 secs to process on average.
so lets say i have 10000 records, that'll be nearly 8 hours which isn't reasonable as I might get even bigger lists.
   so i did some search then found multithreading might help? what about multi-core processing? does VB support that or can i gain control on that? or is there some other ways i could improve the speed of processing? any comments would be appreciated.
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

You need to have multiple threads to take advantage of the multiple-cores. If you had a a few threads inserting data into your database it will increase the speed, but you'll run into an issue of synchronization. You can't have two threads inserting data at the same time (what if they try to insert into the same slot). So what you should do is google for some simple threading examples and synchronization steps. (Really you just want to create a block so only 1 thread can add at a time and then will unblock). If this is your first time using multiple threads I'd suggest playing around with creating a few threads that increment a shared integer and then implement synchronization until you get the hang of it. After that you will be able to create N number of insert threads easily and watch your performance increase :)
miketonnyAuthor Commented:
thanks, so in my case, shall i start with dividing this list into a few parts then creating thread accordingly? then insert them into database ?
Dont worry about dividing it. Lets say you have a list of items that need to be added to the database. You'll have 1 method that will take an item from your list and then add it to the database. What you'll need to do is add some locks (or whatever synchronization your language has).

So synchonization

list lock
get item that will insert into database
unlock the list
database lock
insert item
unlock database

Then you can create probably N*2 +1 threads for how many cores you have. You can read up more on how many threads you will want. Look for multi-thread examples to get an idea on what to do.
CompTIA Cloud+

The CompTIA Cloud+ Basic training course will teach you about cloud concepts and models, data storage, networking, and network infrastructure.

sorry i may not have clarified. Your 1 method will be the thread that you will use, make a general case with the synchronization. That way you can create N threads (which will basically be running N of those methods you created).
miketonnyAuthor Commented:
I'm using in visual studio 2008.
do you mind show me a simple example of locking a list and db?
in the new thread i'm going to create it will contain approx 3000 lines of codes which will call database connections at different points depending on the records in list, this is going to make locking dbs complicated is it? let me know if i'm wrong.
Read this here

It's not a lock that in specific to the data type. It's specific to the line of code. So if you have two threads going, they are running the same code sequentially. When one of them reaches the line of code that will call the lock method, it will look at the shared lock (usually just a int that will increment when unlocking and decrement when locking). If the value is 0, then that means that the thread will wait there until it becomes unlocked. This means that one thread will call the method, it will not be used. It will set the lock and then continue down the code, until it unlocks the lock.

Try first writing a simple program that has a thread that says the thread # and then hello. Run the two threads w/o locks. Then try and run it will locks. It will help clarify what you're doing.
miketonnyAuthor Commented:
thanks, that's helpful. one more question, if create 2 threads, executing the same function for both threads, say inside function i have codes like:
 Private Sub Load()
for each listitem as Item in ItemList
End Sub

in ButtonClick event i then wrote:

        Dim Load1 As New Thread(AddressOf Load)

        Dim Load2 As New Thread(AddressOf Load)

are each of the threads going to process the list as a whole or are they going to be smart enough to divide the list themselves then process ?
You should have a shared variable that will keep track of where they are in the list. Which if you only increment during the locked part then you should be okay.
miketonnyAuthor Commented:
i created a simple project to do some testings on it:
    Public Event Done()
    Public Sub Fill()
        Dim objLock As Object = New Object()
        SyncLock objLock
            Dim i As Int32
            Dim current As Int32

            For i = 0 To 100
                current *= 2
            RaiseEvent Done()
        End SyncLock
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        Dim thread0 = New Thread(AddressOf Me.Fill)

        Dim thread1 = New Thread(AddressOf Me.Fill)
    End Sub

Open in new window

but the synclock doesn't seem to working as it gives me this error:
"Cross-thread operation not valid: Control 'ListBox1' accessed from a thread other than the thread it was created on."
did i do something wrong in the code ?
I'm not sure why there is that sleep in the for-loop. I dont really know .net, but I think what you should be doing is thinking of each thread as it's own method (although they call the same method). So it looks like it doesn't like the two threads controlling the same data. I know that when implementing threads in c, you need to declare the variables in the global scope, outside of main so the two threads can change them.
miketonnyAuthor Commented:
um i figured i need to declare delegate to invoke the form's controls.
but speaking of my actual case. As i suspected, without splitting the list one record gets inserted 9 times. (i put 9 threads in my program). is there a way to cater that other than I splitting my list? and i dont really know after splitting how to assign a particular list to a thread.
Dim Load4 As New Thread(AddressOf Load)

addressof Load wont allow me to put parenthesis.
I can't help you with the syntax part, because I haven't programmed using .NET. But I can help you with the concept part of threads.

The way to think of threads, is that it's a function that you could call just once (ie. 1 thread) and it will perform the entire job just fine. Even with the synchronization. But using synchronization you make it so that each thread will do their own iteration of the thread (that's why you dont want to use a for each element loop). Use a while(!done) loop so they just keep looping until the list is done. If you need help with the theory I can help you, but I can't help you with the syntax and specifics of your program. :(
Before you go working too hard on the threading aspect, make sure that threading will actually solve the problem...

The application you described sounds as if inserting items into the database is the only thing it does... if that's true, then the effort on the client side shouldn't really be that big.  If you were doing calculations, then the client might be using some CPU power.  But if we are just talking raw inserts, then you need to check whether th bottleneck is the SQL server, the client CPU, or the network connection.

First thing to do is measure the CPU untilization on your application and see if you even come close to pegging the processor; if not, then look at your network utilization and the network latency between your client and the SQL server.

If you are pegging the network connection, then multiple threads may not help you at all.  In this case you need to look at getting a higher badnwidth link between the two points and / or looking at doing things like using more effecient SQL usage.  For example, I've seen applications process inserts like this where they are opening and closing the connection to SQL each time - that equals extra chatter back and forth to the server.  Also consider removing any extra formatting in your SQL statements - tab and return characters make for easy to read statements but are just wasted bytes when you send them to SQL.

If you aren't pegging the CPU or the network bandwidth but you have high latency to SQL, then you might consdier multi-threading.  The easier solution, if possible, is to move closer (in network topology) to your SQL server.  But if that can't be done, then having multiple conversations with the SQL server might be a benefit, but only if you know the SQL server isn't the bottleneck.

If you have access to the SQL server, check and see what it's CPU, network, and disk I/O look like when you have your inserts running.  If the SQL server is swampped and can't get the inserts done any faster, then having multiple threads won't gain you anything.

You should also check your SQL object.  If you are using a stored proc to insert, check it's query plan.  If you aren't, you should be - stored procs are faster than re-issuing an ad-hoc insert statement each time.  And if creating a stored proc isn't an option for you, then at least use a parameterized query statement so that SQL can effeciently re-use it's query plan.  Also check the indexes and FKs on the table that you are inserting to.  A table that is nothing but FKs to other tables means that SQL has to do a lot of checking of values; I'm not a fan of taking FKs off a table, but it might be something to play with if you determine your bottleneck is in SQL.  The other gottcha is indexes - especially clustered indexes.  If the physical order of the table has to be messed with for every insert, SQL could be spending a bunch of time it doesn't need to spend.  And for standard indexes, they have to be organized as well, so watch out for having too many indexes or indexes that are too wide.  It depends on your situation whether it is viable to do so or not, but in some cases it works well to remove the indexes before starting your inserts and then put them back in place afterward because sometimes it takes less time to get the data in and build the index than it does to try to get the data in with the index being maintained the whole time.

If you still decide to multi-thread this, be careful about having delegates from your threads calling back to your UI - you need to make sure that every place a thread interacts with the UI that it calls with Invoke() so that the UI thread can do the update to the UI.
miketonnyAuthor Commented:
Thx volox for the input, it's helpful. in my project i'm actually using a activeX control to do the insertion which another company provided us, so i cant really see the code inside the OCX. However, to be able to use the OCX to do the insert, i'll have to first convert my own data to meet the format/type requirement of the OCX, so each of the records takes some time to transform into the correct format, while there're 20 different format transforming functions that I have in the program.
What i want initially using the multithreading is to minimize the time that program uses to do the transformation rather than the insertion, as insertion part is out of my control. right now one transform could take up to 2sec of time, so if i have 5 threads working on different transformation at once, this will be much faster right? as all they queueing up now is the inserting part.

so under these circumstances, would you recommend me to use the multithreading?
network or sql server issue shouldn't be matter as once i compiled my program i'll run it from SQL server directly.

also, if possible, do you have a solution to the problem i encounted with inserting duplicating records with the multithreading?
Just because it runs on the SQL server doesn't mean that the disk I/O and inefficient table / index design doesn't have an impact.  But if you are using a third party OCX, then can I also assume that you don't have much control over the database design?

If the thing that you are passing rows off to is a thrid party OCX, then you have to be sure that it is capable of multithreading.  I have seen many a component that aren't designed to handle multithreaded input.  I won't bore you with the reasons why, but you should check with the vendor whether they support their component under such conditions, because if they don't all might seem well and then you could end up with some freaky strangeness once you go into production.  (Threading issues always manifest at the most inconvient times and usually in production and not in dev - usually because production has more processing power or has higher CPU utilization by other processes).  Your other option is to do your transformation multi-threaded and choke back down to one thread to pass off the data to the OCX.

So earlier in the thread when you were talking about accessing the database at multiple points, were you talking about doing that through the OCX or through standard ADO.NET?  Because frankly if your transformations require a bunch of data lookups through an OCX and the access to the OCX has to be synchronized, then you may not end up gaining much.

Your sample code above has a few issues...
1) as it sounds like you figured out, if you call to a UI element from a thread you have to Invoke() or BeginInvoke() in order to make the update to the UI happen within the UI thread  
2) The object you created to lock on 'objLock' is created within the method, that means that it is scoped within the run of the method and hence each thread creates their own objLock instance and then locks on their own instance.  For a lock object to work, it has to be in a memory space that is shared amongst the threads that are going to lock on it.  This means it will either need to be class-level instance - if you are locking within at class level (instance method) - or - will need to be static level if you need a lock that is global and shared across multiple instances of the class.  Understanding locking requires understanding the memory scoping and memory sharing concepts.
3) As you have it in your sample, if you simply move objLock to be static level, you will not gain yourself anything... what will happen is that the first thread to enter the method will process all the code within your lock and then when it is done the next thread will enter and process all the code.  So now, you will end up with each thread processing the same amount of work (duplicating one another), one thread at a time.

You will either need to split the list up before you lauch your threads so that they all have their own list to work through, or you will need to do as was suggested and have each thread check out it's work.  One approach might be to have one list that is the input and another that is the finsihed work.  You would do somthing like this...

Note that I've made a comment that you need to double check once you are inside your lock block.  The double check is to avoid a case where two threads do the check for the while loop and both think there is an item left and one thread gets the items and the other thread gets into the lock second and the collection is actually empty.  That's the kind of thinking you need to get accustomed to if you are going to write threadsafe code - if it isn't locked or somehow syncronized, then you can't depend on it to be the same from one moment to another.  Another excellent example is statements like 'var = var + 1' - those aren't threadsafe unless they are inside of a syncronized region - see Interlocked.Increment for a threadsafe alternative.


while (inputList.Count > 0)
lock (inputList)
  // Do a double check here
  workItem = inputList[0];

... process workItem

lock (outputList)

} // end loop

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
miketonnyAuthor Commented:
thx a lot for all the helps, that really helps me understanding the concepts.sry didn't get back to this earlier, i decided i'll do it without multithreading in my case, as their OCX not supporting. but i'm certainly gonna use it in my other programs.
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Visual Basic.NET

From novice to tech pro — start learning today.