Solved

Insert command inside transaction takes too long

Posted on 2006-05-26
Medium Priority
429 Views
Last Modified: 2008-01-09
Background:

I am developing a utility in C# .NET 2.0 that performs a large number of inserts into a SQL Server 2000 database. This utility started out using BulkCopy and has since been switched to SQLXMLBulkLoad to keep RAM usage down. Anyhow, this utility moves a LARGE amount of data to the specified SQL server very quickly. It can load small amounts of data from the XML file in seconds and even large data sets in mere minutes. The change to SQLXMLBulkLoad has required me to load the data into temporary tables and then perform inserts and updates to move the data where I need it, which works fine. Recently, I switched it to use the SqlTransaction object in the .NET 2.0 Framework. At first, I did not notice any problems with this change.

Problem:

After testing some small and medium-sized datasets, I noticed the update and insert commands were a little slower than without the transaction. I moved the bulkloads outside of the transaction (that process is handled before the transaction takes place and the temp tables are deleted after the transaction is committed), and that provided a good enough speed boost to make the transfer fairly fast again. Recently, I tried loading a data set that contained 200,000 rows whose fields would be inserted into 2 tables. To be more specific, 2 of the fields in the 200,000 rows go into one table (Table1) and 40 of the fields of the 200,000 rows go into another table (Table2). I bulkload the data into the temporary table consisting of all the fields and the entire 200,000 rows. I then start a transaction, turn off all triggers on the tables, and then perform the insert commands. The insert of the 2 fields into Table1 is very fast. Then I perform the insert into Table2 (using a field from Table1 through a join as well). This insert never ends. I've let it sit there for over 8 hours and it does nothing. I have run a few benchmarks and realized that even doing this with just a few thousand rows requires a large amount of time to complete. If I do all of this without a transaction, it runs fine and takes less than 5 minutes. If I do it all in Query Analyzer, it runs just as fast. Also, I am about 98% sure this is not a deadlocking issue, since it works on smaller data sets (less than 200 rows), and even on medium-sized data sets (a couple thousand rows) it eventually finishes after a while.

The option of not using a transaction is out of the question: I am doing large inserts of data, and if any error occurs, none of the data should be committed. I've even gone through and added "WITH (NOLOCK)" and "SET NOCOUNT ON" to all my select statements just to make sure it is as optimized on the SQL server as possible. One more bit of information: I am doing all the insert commands with the SqlCommand object, and the insert commands use "INSERT INTO" with a "SELECT" statement from the temporary table. Does anyone have any recommendations on how I should go about fixing this?

For an added bonus, after this process finishes, it will then perform an insert of 500,000 rows for another table... I haven't even gotten there yet!

Thanks in advance!
Question by:RyGiL
14 Comments
 
LVL 5

Assisted Solution

by:rmacfadyen
rmacfadyen earned 100 total points
ID: 16774369
This sure sounds like a dead locking issue. Have you used Enterprise Manager to look at what items are locked and what items are being waited on?

You might also want to consider batching the inserts. A single transaction that contains a large amount of data will slow down considerably as you process more and more data. Plus the rollback (if necessary) could take a lot of time too. Not to mention the growth of the LDF file.

You'll need to keep track somewhere of which "batch" you're on, where each batch begins and ends, and whether or not each batch was processed. If the load fails, you should then be able to restart with just the unprocessed batches.

You can experiment with batch sizes... I'd probably start with 1000 (maybe 500).
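Something along these lines, just as a sketch; the table/column names and the RowId range column are all made up, and in real code you'd persist the last completed batch somewhere durable so a failed run can resume:

// Sketch only: each batch gets its own transaction, so a failure
// loses at most the current batch. All names below are made up.
using System.Data.SqlClient;

static void LoadInBatches(string connectionString, int batchSize)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        int lastRowDone = 0;   // persist this somewhere durable in real code
        bool moreRows = true;

        while (moreRows)
        {
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                SqlCommand cmd = new SqlCommand(
                    "INSERT INTO [Table2] (field1, field2) " +
                    "SELECT field1, field2 FROM TempTable " +
                    "WHERE RowId > @from AND RowId <= @to", conn, tx);
                cmd.Parameters.AddWithValue("@from", lastRowDone);
                cmd.Parameters.AddWithValue("@to", lastRowDone + batchSize);

                int rows = cmd.ExecuteNonQuery();
                tx.Commit();   // committing per batch keeps the LDF growth in check

                lastRowDone += batchSize;
                moreRows = (rows > 0);   // assumes RowId values are reasonably dense
            }
        }
    }
}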

Regards,

Rob
 
LVL 11

Assisted Solution

by:anyoneis
anyoneis earned 100 total points
ID: 16774392
Are you managing the connection or letting the TableAdapter manage it? Hopefully the former!

David
 
LVL 28

Assisted Solution

by:imran_fast
imran_fast earned 100 total points
ID: 16774942
Increase the default MDF and LDF sizes for the tempdb database and for your destination database, and change the recovery model to Simple.
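For example, something like this (a sketch only; "MyDb" and the logical file names are placeholders for your own database, and the same can be done for tempdb):

// Sketch only: pre-size the data and log files (SIZE can only be increased)
// and switch to SIMPLE recovery. "MyDb", "MyDb_Data" and "MyDb_Log" are placeholders.
using System.Data.SqlClient;

static void PrepareDatabase(string connectionString)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        string[] statements = new string[] {
            "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Data, SIZE = 2048MB)",
            "ALTER DATABASE MyDb MODIFY FILE (NAME = MyDb_Log, SIZE = 10240MB)",
            "ALTER DATABASE MyDb SET RECOVERY SIMPLE"
        };
        foreach (string sql in statements)
        {
            new SqlCommand(sql, conn).ExecuteNonQuery();
        }
    }
}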
 

Author Comment

by:RyGiL
ID: 16775505
rmacfadyen:
I agree that it sounds like a deadlocking issue, but I have tried a few things that I believe prove it is not. The insert command is in this format: "INSERT INTO [Table1] (field1, field2, field3, ...) SELECT field1, field2, field3, ... FROM TempTable". I have changed the select statement to benchmark/test a few things by setting it to "SELECT TOP 100" and "SELECT TOP 1000", and those complete quickly. So this tells me that there is no deadlocking, since smaller sets of the data can be inserted.

As far as batch inserts go, I would like to handle it all in one big transaction for error catching purposes. I use the worst case scenario to explain why I want to do this. The worst case scenario is that the connection is broken from the SQL server for some reason and cannot be re-established. With batch inserts, this means that only a portion of the data was inserted (and committed) so the users would be stuck with erroneous data. If they ran the import again, it would result in all that data being duplicated. If I do it in one transaction and the connection breaks, nothing is written because nothing is committed.

anyoneis:
I am simply opening the connection (SqlConnection), beginning a transaction (SqlTransaction), and issuing insert commands (SqlCommand).
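For reference, the shape of the code is roughly this (simplified; the field names are made up and the real insert lists 40+ fields):

// Simplified shape of what the utility does; the real code has error handling,
// trigger disabling, and the full column lists.
using System.Data.SqlClient;

static void Transfer(string connectionString)
{
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            SqlCommand cmd = new SqlCommand(
                "INSERT INTO [Table1] (field1, field2) " +
                "SELECT field1, field2 FROM TempTable", conn, tx);
            cmd.CommandTimeout = 0;   // no timeout, since these statements can run long

            cmd.ExecuteNonQuery();    // the Table1 insert is fast

            cmd.CommandText =
                "INSERT INTO [Table2] (field1, field3 /* ...40 fields... */) " +
                "SELECT t.field1, t.field3 /* ... */ " +
                "FROM TempTable t INNER JOIN [Table1] t1 ON t1.field1 = t.field1";
            cmd.ExecuteNonQuery();    // this is the statement that never finishes

            tx.Commit();
        }
    }
}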

imran_fast:
I tried that at about midnight last night and am still having the same problem. I set the MDF to 2 GB and the LDF to 10 GB, and I also set the recovery model to Simple. It performed just like before.


----------------------------------
Am I asking too much of a transaction? It baffles me that it works perfectly without a transaction, and that even with a transaction, the time is spent in the actual commands rather than in the commit itself. As for batch inserts, as rmacfadyen mentioned, would it be possible to do each insert one row at a time but keep it all in the one transaction? I would imagine this would take even longer.



 

Author Comment

by:RyGiL
ID: 16775715
Here's an update on some more tests I have run. Apparently, the bigger problem here is the fact that I am inserting 40 columns for all 200,000 rows. I reduced the columns to just 17 and it ran fine. I even tried my data set with 500,000 rows and only 5 columns and it ran fast as well. Would it be faster to do a small insert of just a few columns and then turn around and update those rows with the rest of the columns? Why?
 
LVL 75

Accepted Solution

by:
Anthony Perkins earned 100 total points
ID: 16775819
Using transactions with large data sets is a problem, and it is a problem that gets exponentially worse as the number of rows increases. In other words, if working with, say, 10K rows takes 10 seconds to run, doubling the number of rows can take over a minute.

I recently came across the same problem with a data set of over 350K rows and I have not found a good solution yet. Incidentally, in our case there are not many columns, but one of them can be quite large (800 KB). What we will probably end up doing is creating a home-grown "transaction", either by:
1. Deleting all the added rows from all the involved tables if there is an error (a rough sketch follows below), or by
2. (The brute-force method) Doing a backup prior to starting and restoring if it fails.
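Here is what option 1 might look like, assuming you tag every inserted row with a load ID so a failed load can be identified later (all the names here are made up):

// Rough sketch of option 1: a "home-grown" rollback using a LoadId column.
// Table/column names are made up; conn is an already-open SqlConnection.
using System;
using System.Data.SqlClient;

static void LoadWithManualUndo(SqlConnection conn)
{
    Guid loadId = Guid.NewGuid();
    try
    {
        SqlCommand insert = new SqlCommand(
            "INSERT INTO [Table2] (LoadId, field1 /* ...other fields... */) " +
            "SELECT @loadId, field1 /* ... */ FROM TempTable", conn);
        insert.Parameters.AddWithValue("@loadId", loadId);
        insert.ExecuteNonQuery();
    }
    catch
    {
        // Undo whatever this load managed to insert, then rethrow.
        SqlCommand undo = new SqlCommand(
            "DELETE FROM [Table2] WHERE LoadId = @loadId", conn);
        undo.Parameters.AddWithValue("@loadId", loadId);
        undo.ExecuteNonQuery();
        throw;
    }
}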

Sorry to be the harbinger of bad news.
 

Author Comment

by:RyGiL
ID: 16787230
Welp, after extensive testing, I concluded that it is the number of columns being inserted that causes the slowdown, NOT the number of rows. Certainly, the number of rows does factor in, but not nearly as much as the number of columns being inserted.

To get around this, I basically do an insert of the first column plus a "temporary" tracking column I created to keep track of which rows to use in the updates. I then loop in C#, updating 5 columns or fewer per statement (using the "temporary" column as the reference point in the where clause) until all 40 columns are updated. This handles it whether the data set has 4 columns or 400 columns. It only takes about 20-30 seconds per set of 5 columns for the entire update on the SQL server on my local machine. I even tried the same thing over the internet (worst case, as this will usually only be used on an external SQL server on a LAN) and it only took about 5 minutes for all 40 columns.
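Roughly what that looks like (simplified; "RowKey" stands in for the temporary tracking column and the other names are placeholders):

// Simplified version of the workaround: one narrow insert, then update the
// remaining columns at most 5 at a time, matched on the RowKey tracking column.
using System.Collections.Generic;
using System.Data.SqlClient;

static void InsertThenUpdateInChunks(SqlConnection conn, SqlTransaction tx,
                                     IList<string> remainingColumns)
{
    // 1) Narrow insert: the first column plus the tracking column.
    new SqlCommand(
        "INSERT INTO [Table2] (field1, RowKey) " +
        "SELECT field1, RowKey FROM TempTable", conn, tx).ExecuteNonQuery();

    // 2) Update the rest of the columns, 5 or fewer per statement.
    for (int i = 0; i < remainingColumns.Count; i += 5)
    {
        List<string> sets = new List<string>();
        for (int j = i; j < remainingColumns.Count && j < i + 5; j++)
        {
            sets.Add("t2." + remainingColumns[j] + " = t." + remainingColumns[j]);
        }

        string sql = "UPDATE t2 SET " + string.Join(", ", sets.ToArray()) +
                     " FROM [Table2] t2 INNER JOIN TempTable t ON t.RowKey = t2.RowKey";
        new SqlCommand(sql, conn, tx).ExecuteNonQuery();
    }
}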
 
LVL 5

Expert Comment

by:rmacfadyen
ID: 16787257
How odd.

It might be worth doing some system monitoring to figure out where SQL is spending its time. I'm wondering if this is an artifact of your test environment, something to do with the actual MDF/LDF files (e.g. autogrow). If your production environment is considerably different (multiple MDFs, a different/better disk subsystem, more cache, etc.), it is worth trying to determine why you're getting the results you're getting with the original method. PerfMon should help pin this down... and perhaps a call to MS tech support.

Rob
 
LVL 5

Expert Comment

by:rmacfadyen
ID: 16792213
Fine with me.

Rob
 
LVL 11

Expert Comment

by:anyoneis
ID: 16794615
Fine with me.

David
 

Author Comment

by:RyGiL
ID: 16794857
One more thing to note for my solution... I finally found a distinct reason why it works on some machines and not others. I got to work today and the process did not work on our SQL server. I tested it on about 3 other machines and only one worked. After hours of constantly tweaking the insert and update commands and still getting the same result, I took a different approach and looked for differences between the machines. Come to find out, it works on my local machine and another machine on the network, both of which have 2GB of RAM. It also worked on a machine that had 6GB of RAM. The systems it did not work on had 1GB of RAM or less. The odd thing is, Task Manager's mem usage for sqlserver.exe never goes above a hundred megs of RAM. After playing with all the options on the SQL server (MDF size, LDF size) and not getting anywhere, I have decided to split the mass insert/update SQL command into reading a single row at a time and inserting it.

Yes, this process is painfully slow and quite sloppy, BUT it works. This new method doesn't care how much RAM is on the machine the SQL server is hosted on.
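For what it's worth, the fallback is shaped roughly like this (heavily simplified; the column names and types are placeholders, and the real code builds the parameter list from the column metadata):

// Heavily simplified row-at-a-time fallback. Assumes the source rows have
// already been pulled into a DataTable; column names/types are placeholders.
using System.Data;
using System.Data.SqlClient;

static void InsertRowByRow(SqlConnection conn, SqlTransaction tx, DataTable source)
{
    SqlCommand insert = new SqlCommand(
        "INSERT INTO [Table2] (field1, field2) VALUES (@field1, @field2)", conn, tx);
    insert.Parameters.Add("@field1", SqlDbType.VarChar, 255);
    insert.Parameters.Add("@field2", SqlDbType.VarChar, 255);

    foreach (DataRow row in source.Rows)
    {
        insert.Parameters["@field1"].Value = row["field1"];
        insert.Parameters["@field2"].Value = row["field2"];
        insert.ExecuteNonQuery();   // slow, but the server's memory usage stays flat
    }
}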

 
LVL 5

Expert Comment

by:Netminder
ID: 16840084
Closed, 500 points refunded.
Netminder
Site Admin
 

Author Comment

by:RyGiL
ID: 16841528
I didn't want the points refunded; I wanted them reduced and split among everyone.
 

Author Comment

by:RyGiL
ID: 16844370
Thanks!
