Solved | Private

Delete from table: Index strategy

Posted on 2014-02-06
Medium Priority
58 Views
Last Modified: 2016-02-10
Hello,

I have a query that looks like this:

DELETE FROM dbo.fact_EmailActivity
      WHERE wk_EmailActivity IN (SELECT wk_EmailActivity FROM stg_fact_emailActivity_process)
      

There are several problems: the table has more than 250M rows, 10 columns, 9 nonclustered indexes, and one clustered index (the PK).

Is it better for me to do a delete based on a join, or to use the subquery?

Also, should I drop all the indexes and recreate them?

One thing I didn't mention is that I execute this inside an SSIS package loop, so I do it 100,000 rows at a time, then get the next 100K.

It's taken over 23 hours, and it looks like it's deleting about 1M rows per hour. There are a few other steps in the package, but I'm positive this is the biggest bottleneck. I don't know why the table was designed with so many indexes; however, it seems that even if I drop the indexes, rebuilding them would take just as long...

Any help to even cut this time in half would be greatly appreciated.
Question by:Auerelio Vasquez
3 Comments
 
LVL 75

Assisted Solution

by:Aneesh Retnakaran
Aneesh Retnakaran earned 1000 total points
ID: 39839140
>Is it better for me to do a delete based on a join, or to use the subquery?
Both methods will have the same performance. You need to make sure that there is an index on dbo.fact_EmailActivity.wk_EmailActivity.
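For reference, a sketch of the join-based form of the same delete, using the table and column names from the question (and assuming the staging table lives in the dbo schema); either form should get essentially the same plan once that index exists:

    -- Join-based equivalent of the IN (subquery) delete (sketch).
    DELETE f
    FROM dbo.fact_EmailActivity AS f
    INNER JOIN dbo.stg_fact_emailActivity_process AS s
        ON s.wk_EmailActivity = f.wk_EmailActivity;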
> I execute this inside an SSIS package loop, so I do it 100,000 rows at a time, then get the next 100K

Deleting in batches is the best option, as it won't hold a lock on the table for the entire run. On a heavily used table, deleting millions of rows in a single statement will take a lock on the table and applications may time out. So I suggest you delete them in batches.

http://sequelserver.blogspot.ca/2013/02/update-millions-of-rows.html
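For example, the same batching could be done purely in T-SQL rather than in an SSIS loop; a minimal sketch, assuming the table names from the question and the 100,000-row batch size already in use:

    -- Delete in 100,000-row batches until no matching rows remain (sketch).
    DECLARE @rows INT = 1;

    WHILE @rows > 0
    BEGIN
        DELETE TOP (100000)
        FROM dbo.fact_EmailActivity
        WHERE wk_EmailActivity IN
              (SELECT wk_EmailActivity FROM dbo.stg_fact_emailActivity_process);

        SET @rows = @@ROWCOUNT;  -- number of rows the last DELETE removed
    END;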


If you have to delete the majority of records from that table, you can instead insert the rows you want to keep into a new table and drop the old one. You may need to plan this ahead so that the application won't insert anything into the original table while you do the copy.
 
LVL 69

Accepted Solution

by:
Scott Pletcher earned 1000 total points
ID: 39839290
Definitely create an index on dbo.fact_EmailActivity (wk_EmailActivity), but you must also look at the query plan and verify that it is using that index for the DELETE.

Also, cluster the stg_fact_emailActivity table on wk_EmailActivity, just in case. It may not help much, but it definitely won't hurt.
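A sketch of both suggestions, with hypothetical index names, assuming "stg_fact_emailActivity" refers to the stg_fact_emailActivity_process table from the query and that it currently has no clustered index:

    -- Nonclustered index to support the DELETE lookup (index name is hypothetical).
    CREATE NONCLUSTERED INDEX IX_fact_EmailActivity_wk_EmailActivity
        ON dbo.fact_EmailActivity (wk_EmailActivity);

    -- Clustered index on the staging table (assumes it is currently a heap).
    CREATE CLUSTERED INDEX CIX_stg_fact_emailActivity_process
        ON dbo.stg_fact_emailActivity_process (wk_EmailActivity);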


>> I don't know why the table was designed with so many indices. <<

Probably the wrong clustering column. By default, an id/identity column is usually used for clustering, and even some experts push that idea very strongly. But that often ends up producing vastly worse performance overall: massive extra I/O for standard table joins, and a large number of covering, nonclustered indexes that have to be built on the table.
 
LVL 1

Author Closing Comment

by:Auerelio Vasquez
ID: 39840043
Thanks for the help, guys. I went a slightly different way.

I did the following for the first historical delete; after that, it's a much smaller data set, so the package runs a lot smoother. Something that was taking more than 24 hours to run (at 1M rows per hour, it would have taken 100 hours to complete) now finishes in less than one hour. A rough sketch of the build-and-swap steps follows the plan below.

/*
1.  Create a new table that is structurally identical to the original: fact_EmailActivity_New.
2.  Create the clustered index.
3.  Insert the data we want to keep (where fk_SentDate > 8036) from fact_EmailActivity into the new table.
4.  Create the nonclustered indexes on the new table.
5.  After verifying the data in the new table, rename the original table to fact_EmailActivity_old.
6.  Rename the new table to fact_EmailActivity.
7.  Select the data we want to archive into fact_EmailActivity_arc
    (could also create a recordset in SSIS with the desired rows,
    where fk_SentDate < 8036).
8.  Move data from fact_EmailActivity_arc to EDWArchive (arc_Fact_EmailActivity).
9.  Verify data, then truncate fact_EmailActivity_OLD.
10. Drop fact_EmailActivity_OLD and fact_EmailActivity_arc.
*/
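A rough T-SQL sketch of steps 1-6 (the build-and-swap part), assuming the fk_SentDate cutoff of 8036 from the plan; the real column list and index definitions would have to be taken from the existing table:

    -- Steps 1-2: fact_EmailActivity_New is assumed to already exist, structurally
    -- identical to the original and with its clustered index created.

    -- Step 3: copy only the rows we want to keep.
    -- (If the key column is an IDENTITY, SET IDENTITY_INSERT ... ON is needed around this.)
    INSERT INTO dbo.fact_EmailActivity_New WITH (TABLOCK)
    SELECT *
    FROM dbo.fact_EmailActivity
    WHERE fk_SentDate > 8036;

    -- Step 4: recreate the nonclustered indexes on fact_EmailActivity_New here.

    -- Steps 5-6: swap the tables by renaming.
    EXEC sp_rename 'dbo.fact_EmailActivity', 'fact_EmailActivity_old';
    EXEC sp_rename 'dbo.fact_EmailActivity_New', 'fact_EmailActivity';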