Solved

Optimize large delete and insert operations with large rows

Posted on 2004-09-21
5
254 Views
Last Modified: 2010-08-05
I have a table with around 4.8 million records, and about 65 columns each record.  On a daily basis, I need to upload data from another source to this table.  Data in the original source can change up to 13 months back, so I bring only the latest 13 months from the source into a holding table (replacing the data in it every day).  I need to add/replace data in the "production" table with the new data from the holding table.  Because there are so many columns and any of them can change, there is no easy way to determine which rows have changed and only update those; so I am simply deleting all rows in the production table that fall within the date range existing in the holding table, and then inserting everything from the holding table into it.  Here's the problem.

Deleting the data from the production table takes a long time (over 20 minutes).  On average, about 760,000 records need to be deleted from this 4.8 million record table, which has several indexes.  Is there a way to delete records faster or without loggin?  Once deleted, I don't need to get them back.  A partial truncate would be ideal, but that does not exist.  I tried copying the remaining records to another table, then truncating it, then inserting from the other table and from the holding table to complete the set, but that was not any faster.  The delete looks something like this:

DECLARE @firstDate datetime, @lastDate datetime,

SET @firstDate = (SELECT min(the_date) FROM Holding_Table)
SET @lastDate = (SELECT max(the_date) FROM Holding_Table)

--delete records in Production table that exist in holding table
DELETE Production_Table
WHERE the_date BETWEEN @firstDate AND @lastDate --takes about 26 mins

--insert records from holding table into production table, which has additional fields
INSERT INTO Production_Table
SELECT *, null, null, null, null, null, null, null, dbo.getQuarter(the_date),null
FROM Holding_Table  --takes about 12 mins

Any suggestions?  I have indexes on the date fields for both tables (among others that are used for queries).

After this, I have update queries that update the null values inserted above, based on joins with other tables, which also are taking a long time.

Thanks!
0
Comment
Question by:cheluto2
5 Comments
 
LVL 5

Assisted Solution

by:hkamal
hkamal earned 25 total points
ID: 12113463
I recommend replacing the BETWEEN in your delete with >= AND <= as this performs faster.
Also, are the indexes on your table clustered? This could really impact deletes and inserts. Experiment with dropping the index and re-creating it after the operation to see if that's any quicker.
0
 
LVL 42

Expert Comment

by:EugeneZ
ID: 12114447
when did you last time rrefreshed indexes\ update statistics- i
Fresh indexes  can speed up you query
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 100 total points
ID: 12115585
You should very probably cluster the table on "the_date" column, especially if your queries usually specify a date or date range, which I suspect they do.  NOTE: If this is not currently the case, it might take some time to put this clustered index in place, especially since all the other indexes will need modified/rebuilt if you create a new clustered index.

Also, during the delete and re-insert process, you could try using either:
bulk-logged recovery mode

or, if you can get away with it (for example, you have a full backup before the process starts):
simple recovery mode.

Then, when the updates are finished, take a full backup and go back to full recovery mode.
0
 
LVL 1

Author Comment

by:cheluto2
ID: 12164383
Thanks for the responses, and sorry it took me so long to get back to you.

hkamal:  I changed the condition to use >= and <=, but I did not see a noticeable difference.  However, thanks for the tip.  The table has a clustered index on the date field already, and other indexes on other fields, too.

ScottPletcher:  I just added the change in recovery mode as you recommend.  I will know by tomorrow (after the process runs tonight) what effect it had on the process, but it sounds like it might be a good improvement, so thanks in advance.  We do a full backup every other day, and the data is not updated except for this one process that runs once a day, so even if we lose a day's worth of data, we can recover it easily.  I will keep you posted tomorrow.
0
 
LVL 1

Author Comment

by:cheluto2
ID: 12179828
The problem appears to be with the number of indexes on the large table, more than anything else.  I did apply Scott's suggestion of changing the recovery mode, and it seemed to speed it up a bit, so I am marking that as the accepted answer, and also gave part of the points to hkamal for his tip.  I will have to keep working on the indexes to get them to where they need to be, using the Index Tuning Wizard and common sense.
Thanks!
0

Featured Post

How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

Join & Write a Comment

I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
Introduction In my previous article (http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/A_9150-Loading-XML-Using-SSIS.html) I showed you how the XML Source component can be used to load XML files into a SQL Server database, us…
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to set up a backup for SQL Server using a Maintenance Plan and how to schedule the job into SQL Server Agent.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now