RyGiL asked:

Insert command inside transaction takes too long

Background:

I am developing a utility in C# (.NET 2.0) that performs a large number of inserts into a SQL Server 2000 database. The utility started out using BulkCopy and has since been switched to SQLXMLBulkLoad to keep RAM usage down. It moves a LARGE amount of data to the specified SQL server very quickly: it can load small amounts of data from the XML file in seconds and even large data sets in mere minutes. The change to SQLXMLBulkLoad required me to load the data into temporary tables and then perform inserts and updates to move that data where I need it, which works fine. Recently I switched it to use the SqlTransaction object in the .NET 2.0 Framework. At first I did not notice any problems with this change.
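For reference, here is a minimal sketch of the kind of bulk-load call described above, assuming the SQLXML COM interop library (SQLXMLBULKLOADLib) is referenced. The exact class name depends on the SQLXML version installed, and the connection string, schema file, and data file are placeholders:

    // Hypothetical sketch: load an XML file into a staging table via SQLXML Bulk Load.
    // Requires a reference to the SQLXML COM library (SQLXMLBULKLOADLib); the class name
    // (SQLXMLBulkLoad3Class / SQLXMLBulkLoad4Class) varies by SQLXML version.
    SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class bulkLoad = new SQLXMLBULKLOADLib.SQLXMLBulkLoad4Class();
    bulkLoad.ConnectionString =
        "Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=MyDb;Integrated Security=SSPI;";
    bulkLoad.ErrorLogFile = @"C:\temp\bulkload_errors.xml";
    // The XSD mapping schema ties XML elements to the columns of the staging table.
    bulkLoad.Execute(@"C:\temp\mapping.xsd", @"C:\temp\data.xml");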

Problem:

After testing some small and medium-sized data sets, I noticed the update and insert commands were a little slower than without the transaction. I moved the bulk loads outside of the transaction (that process is handled before the transaction takes place, and the temp tables are deleted after the transaction is committed), which provided enough of a speed boost to make the transfer fairly fast again. Recently, I tried loading a data set containing 200,000 rows whose fields are inserted into 2 tables. To be more specific, 2 of the fields in the 200,000 rows go into one table (Table1) and 40 of the fields go into another table (Table2). I bulk load the data into a temporary table consisting of all the fields and the entire 200,000 rows. I then start a transaction, turn off all triggers on the tables, and perform the insert commands. The insert of the 2 fields into Table1 is very fast. Then I perform the insert into Table2 (which also uses a field from Table1 through a join). This insert never ends. I've let it sit there for over 8 hours and it does nothing.

I have run a few benchmarks and realized that doing this with just a few thousand rows requires a large amount of time to complete. If I do all of this without a transaction, it runs fine and takes less than 5 minutes. If I do it all in Query Analyzer, it runs just as fast. Also, I am about 98% sure this is not a deadlocking issue, since it works on smaller data sets (less than 200 rows), and even on medium-sized data sets (a couple thousand rows) it eventually finishes after a while.

Not using a transaction is out of the question: I am doing large inserts of data, and if any error occurs I need none of the data to be committed. I've even gone through and added "WITH (NOLOCK)" and "SET NOCOUNT ON" to all my select statements just to make sure everything is as optimized on the SQL server side as possible. One more piece of information: I am issuing all the insert commands with the SqlCommand object, and the insert commands use "INSERT INTO" with a "SELECT" statement from the temporary table. Does anyone have any recommendations on how I should go about fixing this?
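For clarity, here is a minimal sketch of that pattern, assuming the usual System.Data.SqlClient types; the table, column, and join names are placeholders. The key point is that every SqlCommand must be enlisted in the transaction via its Transaction property (here through the constructor):

    using System.Data.SqlClient;

    // Sketch of the INSERT INTO ... SELECT pattern inside a SqlTransaction (.NET 2.0).
    // Table and column names below are placeholders for illustration only.
    static void MoveFromStaging(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();
            SqlTransaction tran = conn.BeginTransaction();
            try
            {
                SqlCommand cmd = new SqlCommand(
                    "INSERT INTO Table2 (ColA, ColB, ColC) " +
                    "SELECT t.ColA, t.ColB, t.ColC " +
                    "FROM TempTable t INNER JOIN Table1 p ON p.KeyField = t.KeyField",
                    conn, tran);
                cmd.CommandTimeout = 0;   // long-running set-based insert; disable the client timeout
                cmd.ExecuteNonQuery();
                tran.Commit();
            }
            catch
            {
                tran.Rollback();
                throw;
            }
        }
    }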

For an added bonus, after this process finishes, it will then perform an insert of 500,000 rows for another table... I haven't even gotten there yet!

Thanks in advance!
SOLUTION by rmacfadyen (members-only content not shown)
SOLUTION (members-only content not shown)
SOLUTION (members-only content not shown)
RyGiL (Asker):

rmacfadyen:
I agree that it sounds like a deadlocking issue, but I have tried a few things that I believe prove it is not. The insert command is in this format: "INSERT INTO [Table1] (field1, field2, field3, ...) SELECT field1, field2, field3, ... FROM TempTable". To benchmark a few things, I changed the select statement to "SELECT TOP 100" and "SELECT TOP 1000", and those completed. That tells me there is no deadlocking, since smaller sets of the same data can be inserted.

As far as batch inserts go, I would like to handle it all in one big transaction for error-handling purposes. The worst-case scenario explains why: the connection to the SQL server is broken for some reason and cannot be re-established. With batched inserts, only a portion of the data would have been inserted (and committed), so the users would be stuck with erroneous partial data, and running the import again would duplicate everything that had already been committed. If I do it in one transaction and the connection breaks, nothing is written because nothing is committed.

anyoneis:
I am simply opening the connection (SqlConnection), beginning a transaction (SqlTransaction), and issuing insert commands (SqlCommand).

imran_fast:
I tried that at about midnight last night and am still having the same problem. I set the MDF to 2 GB and the LDF to 10 GB, and I also set the recovery model to Simple. It performed just like before.


----------------------------------
Am I asking too much of a transaction? It baffles me that the same commands run perfectly without a transaction, and that even with a transaction the time is spent in the actual commands rather than in the commit itself. As for the batch inserts rmacfadyen mentioned, would it be possible to do each insert one row at a time but keep it all in the transaction? I would imagine that would take even longer.
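For what it's worth, a row-at-a-time version that still keeps everything in one transaction would look roughly like the sketch below (table, column, and parameter names are placeholders). Two connections are used because .NET 2.0 against SQL Server 2000 cannot run another command while a SqlDataReader is open on the same connection; only the writing connection needs the transaction. As noted, one round trip per row will almost certainly be slower than a single set-based insert:

    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical row-at-a-time transfer from the staging table, all inside one transaction.
    // All table, column, and parameter names are placeholders.
    static void CopyRowByRow(string connectionString)
    {
        using (SqlConnection readConn = new SqlConnection(connectionString))
        using (SqlConnection writeConn = new SqlConnection(connectionString))
        {
            readConn.Open();
            writeConn.Open();
            SqlTransaction tran = writeConn.BeginTransaction();
            try
            {
                SqlCommand insert = new SqlCommand(
                    "INSERT INTO Table2 (ColA, ColB) VALUES (@a, @b)", writeConn, tran);
                insert.Parameters.Add("@a", SqlDbType.VarChar, 50);
                insert.Parameters.Add("@b", SqlDbType.VarChar, 50);

                SqlCommand select = new SqlCommand("SELECT ColA, ColB FROM TempTable", readConn);
                using (SqlDataReader reader = select.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        insert.Parameters["@a"].Value = reader[0];
                        insert.Parameters["@b"].Value = reader[1];
                        insert.ExecuteNonQuery();   // one round trip per row: all-or-nothing, but slow
                    }
                }
                tran.Commit();
            }
            catch
            {
                tran.Rollback();
                throw;
            }
        }
    }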



RyGiL (Asker):

Here's an update on some more tests I have run. Apparently, the bigger problem here is the fact that I am inserting 40 columns across all 200,000 rows. I reduced the insert to just 17 columns and it ran fine. I even tried a data set with 500,000 rows and only 5 columns, and it ran fast as well. Would it be faster to do a small insert of just a few columns and then turn around and update those rows with the rest of the columns? Why?
ASKER CERTIFIED SOLUTION by Anthony Perkins, United States (members-only content not shown)
RyGiL (Asker):

Welp, after extensive testing, I concluded that it is the number of columns being inserted that causes the slowdown, NOT the number of rows. Certainly the number of rows factors in, but not nearly as much as the number of columns being inserted.

To get around this, I basically do an insert of the first column plus a "temporary" column I created to keep track of which rows to use in the updates (mentioned in the next sentence). I then loop in C#, updating 5 columns or fewer at a time (using the "temporary" column as the reference point in the WHERE clause) until all 40 columns are updated. This handles data sets with 4 columns or 400 columns. It only takes about 20-30 seconds per set of 5 columns for the entire update against the SQL servers on my local machine. I even tried the same thing over the internet (the worst case, as this will usually only be used against an external SQL server on a LAN) and it only took about 5 minutes for all 40 columns.
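A rough sketch of that workaround, assuming an open connection and transaction, a RowRef tracking column added to both the staging table and Table2, and a columns array holding the remaining column names (everything here is a placeholder, not the actual implementation):

    using System.Data.SqlClient;

    // Hypothetical sketch of the column-chunking workaround: insert the first column plus a
    // tracking column, then update the remaining columns five at a time. Names are placeholders.
    static void InsertThenUpdateInChunks(SqlConnection conn, SqlTransaction tran, string[] columns)
    {
        // Seed Table2 with the first column and the RowRef tracking value.
        SqlCommand seed = new SqlCommand(
            "INSERT INTO Table2 (Col1, RowRef) SELECT Col1, RowRef FROM TempTable", conn, tran);
        seed.CommandTimeout = 0;
        seed.ExecuteNonQuery();

        // Update the remaining columns in batches of at most five.
        const int chunkSize = 5;
        for (int i = 0; i < columns.Length; i += chunkSize)
        {
            System.Text.StringBuilder setList = new System.Text.StringBuilder();
            for (int j = i; j < columns.Length && j < i + chunkSize; j++)
            {
                if (setList.Length > 0) setList.Append(", ");
                setList.Append("t2." + columns[j] + " = tmp." + columns[j]);
            }

            SqlCommand update = new SqlCommand(
                "UPDATE t2 SET " + setList.ToString() +
                " FROM Table2 t2 INNER JOIN TempTable tmp ON tmp.RowRef = t2.RowRef",
                conn, tran);
            update.CommandTimeout = 0;
            update.ExecuteNonQuery();
        }
    }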
How odd.

It might be worth doing some system monitoring to figure out where SQL was spending its time. I'm wondering if this is an artifact of your test environment... something to do with the actual MDF/LDF files (e.g. autogrow). If your production environment is considerably different (multiple MDFs, a different/better disk subsystem, more cache, etc.), it would be worth trying to determine why you're getting the results you're getting with the original method. PerfMon should help pin this down... and perhaps a call to MS tech support.

Rob
Fine with me.

Rob
Fine with me.

David
RyGiL (Asker):

One more thing to note for my solution... I finally found a distinct reason why it works on some machines and not others. I got to work today and the process did not work on our SQL server. I tested it on about 3 other machines and only one worked. After hours of constant tweaking to the insert and update commands and still getting the same result, I took a different approach and looked for differences between the machines. It turns out it works on my local machine and another machine on the network that both have 2 GB of RAM, and it also worked on a machine with 6 GB of RAM. The systems it did not work on had 1 GB of RAM or less. The odd thing is, Task Manager's mem usage for sqlservr.exe never goes above about a hundred megs of RAM. After playing with all the options on the SQL server (MDF size, LDF size) and not getting anywhere, I have decided to split the mass insert/update SQL command into reading a single row at a time and inserting it.

Yes, this process is painfully slow and quite sloppy, BUT it works. This new method doesn't care how much RAM the machine hosting the SQL server has.

Closed, 500 points refunded.
Netminder
Site Admin
RyGiL (Asker):

I didn't want the points refunded; I wanted them reduced and split among everyone.
RyGiL (Asker):

Thanks!