Link to home
Start Free TrialLog in
Avatar of Auerelio Vasquez
Auerelio VasquezFlag for United States of America

asked on

Delete from table Index strategy

Hello,

I have a query that looks like this:

DELETE FROM dbo.fact_EmailActivity
      WHERE wk_EmailActivity IN (SELECT wk_EmailActivity FROM stg_fact_emailActivity_process)
      

There are several problems, the table has more than 250M rows, 10 columns, 9 non clustered index and one clustered index (PK)

Is it better for me to do a delete based on a join? or use the sub query?

aslo, should i drop all the index and recreate ?

One thing i didn't mention, is that i execute this inside of an ssis pacakge loop, so i do it 100,000 rows at a time, then get the next 100K

It's taken over 23 hours, and it looks like it's doing about 1M rows per hour.there are a few other steps in the package, however, i'm positive that this is the biggest bottle neck .I don't know why the table was designed with so many indices. However, it seems like even if i drop the indices, rebuilding them, would take equally as long....

Any help, greatly appreciated, to even cut this time in half....
SOLUTION
Avatar of Aneesh
Aneesh
Flag of Canada image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Auerelio Vasquez

ASKER

Thanks for the help guys. i went a slightly different way,

i did the following for the first historical delete. after that, it's a much smaller data set. so the package runs alot smoother, something that was taking more than 24 hours to run, (at 1m per hour, it would have taken 100 hours to complete this), it's completing in less than one hour.

/*1. create a new table that is structurally identical to the original. Fact_emailActivity_New
2. create the clustered index
3. insert the data from fact_emailActivity into the new table (that we want to keep where fk_SentDate > 8036)
4. create the non-clustered indexes on the new table
5. after verifying the data in the new table, re-name the original table to fact_EmailActivity_old
6. rename the new table to fact_EmailActivity
7. select the data we want to archive into fact_EmailActivity_arc
 (could also create a recordset in ssis, with the desired rows,
 where fk_SentDate < 8036)
8. move data from fact_EmailActivity_arc to EDWArchive (arc_Fact_EmailActivity)
9. verify data, then truncate fact_EmailActivity_OLD
10. drop fact_EmailActivity_OLD and fact_EmailActivity_arc
*/