• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 312
  • Last Modified:

Purging data is very slow

Dear all,

We are currently trying to purge data, but we find the process is very slow.

Our query is:
===================================================================
DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

DELETE tk
FROM task tk
WHERE user_activity_log_identifier IN
    (SELECT user_activity_log_identifier
     FROM user_activity_log
     WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate))
=====================================================================

The execution plan for this query is attached.

What I see is that it deletes rows whose key is IN a result set returned by a SELECT statement, which should be an expensive operation. How can we solve this?

Should we use more temp tables so that we use fewer joins? But more temp tables will put more load on tempdb, right? That is another problem.
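For reference, the same delete can be written with EXISTS; the SQL Server optimizer usually compiles IN and EXISTS subqueries like these into the same semi-join plan, so this is a sketch of an equivalent form rather than a guaranteed speed-up:

DELETE tk
FROM task tk
WHERE EXISTS (SELECT 1
              FROM user_activity_log ual
              WHERE ual.user_activity_log_identifier = tk.user_activity_log_identifier
                AND ual.action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate))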

DBA100.
execution-plan-before-change-1.jpg
execution-plan-before-change-2.jpg
execution-plan-before-change-3.jpg
execution-plan-before-change-4.jpg
Asked by: marrowyung

3 Solutions
 
AnujCommented:
Create a clustered index on user_activity_log; the index key should be your primary key, to avoid a heap lookup. This improves performance for all queries referring to the user_activity_log table.
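A minimal sketch of that, assuming user_activity_log_identifier is the primary-key column and the table is currently a heap (if a nonclustered primary key already exists, it would have to be dropped and recreated as clustered):

ALTER TABLE user_activity_log
ADD CONSTRAINT PK_user_activity_log
PRIMARY KEY CLUSTERED (user_activity_log_identifier)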

Another option is to create a covering index, so the query can be answered from the index alone:

CREATE INDEX IX_NCL_UserActivityLog_ActionTime
ON user_activity_log (action_time)
INCLUDE (user_activity_log_identifier)

OR

CREATE INDEX IX_NCL_UserActivityLog_ActionTime
ON user_activity_log (action_time, user_activity_log_identifier)
 
Deepak ChauhanSQL Server DBACommented:
Hi,

You don't need to create an index if your table has minimal SELECT activity.
You can use a temp table like this:

declare
@NoOfMonthRetention      INT,@BusinessDate DATETIME

SELECT  user_activity_log_identifier into ##t1 FROM  user_activity_log
WHERE  action_time      < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate
go
DELETE    FROM  task
WHERE user_activity_log_identifier in ( select user_activity_log_identifier from ##t1)

Or you can use a CTE (common table expression).
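A sketch of the CTE form, using the same table and column names as above:

DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME;

WITH old_logs AS
(
    SELECT user_activity_log_identifier
    FROM user_activity_log
    WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)
)
DELETE tk
FROM task tk
INNER JOIN old_logs ol
    ON tk.user_activity_log_identifier = ol.user_activity_log_identifier;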
 
rajeevnandanmishraCommented:
Hi,
If an index doesn't make you happy, then you can use the code below. It will definitely improve performance.
---------
DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

SELECT user_activity_log_identifier INTO #myTemp
FROM user_activity_log
WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)

SET ROWCOUNT 1000   -- each DELETE below affects at most 1000 rows

WHILE 1 = 1
BEGIN
      DELETE tk
      FROM task tk, #myTemp tp
      WHERE tk.user_activity_log_identifier = tp.user_activity_log_identifier

      IF @@ROWCOUNT = 0
            BREAK
END

SET ROWCOUNT 0      -- reset, so later statements in the session are not limited
--------
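As a side note, SET ROWCOUNT is deprecated for DELETE/INSERT/UPDATE in newer SQL Server versions; on SQL Server 2005 and later the same batching can be sketched with DELETE TOP (n), which limits the statement rather than the session:

WHILE 1 = 1
BEGIN
      DELETE TOP (1000) tk
      FROM task tk
      INNER JOIN #myTemp tp
            ON tk.user_activity_log_identifier = tp.user_activity_log_identifier

      -- stop once no matching rows remain
      IF @@ROWCOUNT = 0
            BREAK
END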
 
marrowyungAuthor Commented:
anujnb,

like this:

CREATE CLUSTERED INDEX [Newindex_user_activity_log] ON [dbo].[user_activity_log]
(
      [user_activity_log_identifier] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

exec sp_updatestats

Why should the index key be the primary key to avoid a heap lookup? What is a heap lookup? Is the clustered index the heap, and if we define the clustered index on the primary key, will it do a clustered index scan/lookup instead?

deepakChauhan,

you mean load the result set into a temp table, which lives in tempdb, and then run the delete against it?

What is a CTE?

rajeevnandanmishra,

This example uses more code than what deepakChauhan provided, so it will run slower, right? You both insert the values into a temp table.

So you both prefer running the IN statement against a table in tempdb, right?

But this will create contention in tempdb itself, right? And tempdb will keep growing if this purge is a daily operation.

DBA100.
 
rajeevnandanmishraCommented:
Hi,
My suggestion may look like a little more code, but it is actually going to run faster. A few points I would like to mention:

1. The filtered data from user_activity_log has to be materialized either way (via an index in the current DB, via the IN subquery in tempdb, via a CTE, or via a temp table in tempdb). So this part will not make much of a difference (unless only a very small amount of data needs to be purged).

2. Note that each DML statement runs as an individual transaction, so the transaction log keeps growing until the statement completes. By putting a limit on the number of affected rows, we reduce the transaction log usage and the locking period, which reduces the overall time of the operation.
 
marrowyungAuthor Commented:
anujnb,

After creating the clustered index:

CREATE CLUSTERED INDEX [_dta_index_user_activity_log_c_5_773577794__K1] ON [dbo].[user_activity_log]
(
      [user_activity_log_identifier] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

exec sp_updatestats

The result is attached.

My question is: why are there so many operators called Parallelism just after adding one more index? Their cost is 0, which is good.

Hash Match is a powerful operator, and even though it is costing 33%, it should be OK, right?

But the Clustered Index Delete increased from 5% to 10%; that is not good, right?

I have attached the client statistics screenshots from SSMS so that you all can see the before and after statistics.
execution-plan-after-change-1.jpg
execution-plan-after-change-2.jpg
execution-plan-after-change-3.jpg
execution-plan-after-change-4.jpg
client-statisitcs.jpg
 
marrowyungAuthor Commented:
rajeevnandanmishra,

"It must be noticed that, each DML run in an individual TRANSACTION. So, the size of transaction log will keep on increasing till it doesn't completes it. So, putting a limit on the affected rows, we are actually going to reduce the size of transaction log and locking period also. This will reduce the overall timing of your operation. "

that's why the row count is set to 1000, right?

I am still wondering why inserting into a temp table first would be faster.

deepakChauhan,

your script returns an error:

declare
@NoOfMonthRetention      INT,@BusinessDate DATETIME

SELECT  user_activity_log_identifier into ##t1 FROM  user_activity_log
WHERE  action_time      < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate
go
DELETE    FROM  task
WHERE user_activity_log_identifier in ( select user_activity_log_identifier from ##t1)

It said:

Msg 102, Level 15, State 1, Line 5
Incorrect syntax near '@BusinessDate'.
Msg 208, Level 16, State 0, Line 1
Invalid object name '##t1'.
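The Msg 102 error comes from the missing closing parenthesis in the DATEADD call, and the Msg 208 error then follows because the first batch failed before ##t1 was created. A corrected sketch of the same script:

DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

SELECT user_activity_log_identifier INTO ##t1
FROM user_activity_log
WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)
go
DELETE FROM task
WHERE user_activity_log_identifier IN (SELECT user_activity_log_identifier FROM ##t1)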
 
marrowyungAuthor Commented:
From my point of view, inserting into one more temp table first and then doing the same WHERE xxxx IN (select from temp table) does one extra step, so it should be slower.
 
rajeevnandanmishraCommented:
@marrowyung:
Yes, by setting the row count to 1000, we minimize the size of each transaction. This minimizes the space required in tempdb and the transaction log, so from a performance point of view it is definitely going to be faster.

Sorry, but I am missing the original question. What do you want to know in this part?
 
marrowyungAuthor Commented:
rajeevnandanmishra,

Does it just delete 1000 rows at a time, and no more? Does it make sure everything in the temp table and the task table gets deleted?

DBA100.
 
rajeevnandanmishraCommented:
@marrowyung:

I have assumed that you will put the given code in a stored procedure. In that case the WHILE loop will keep deleting data for as long as matching rows exist. Also, the temp table is removed automatically by SQL Server when the procedure finishes, though you can always drop it explicitly by putting
drop table #myTemp
at the end.
The number 1000 is just an example; depending on the row size, this value can be changed. When I use this approach, I try to keep each deletion batch under about 1 MB (which normally comes to around 1000 rows).
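A sketch of that stored-procedure wrapper, using the DELETE TOP form sketched earlier; the procedure name purge_old_tasks and the parameterized batch size are illustrative assumptions:

CREATE PROCEDURE purge_old_tasks
    @NoOfMonthRetention INT,
    @BusinessDate DATETIME,
    @BatchSize INT = 1000   -- tune per row size, per the note above
AS
BEGIN
    -- materialize the identifiers to purge
    SELECT user_activity_log_identifier
    INTO #myTemp
    FROM user_activity_log
    WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate);

    -- delete in small batches so each transaction stays short
    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize) tk
        FROM task tk
        INNER JOIN #myTemp tp
            ON tk.user_activity_log_identifier = tp.user_activity_log_identifier;

        IF @@ROWCOUNT = 0
            BREAK;
    END

    DROP TABLE #myTemp;   -- optional; dropped automatically when the procedure ends
END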
