Purging data is very slow

Dear all,

We are currently trying to purge data, but the process is very slow.

Our query is:
===================================================================
DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

DELETE tk
FROM task tk
WHERE user_activity_log_identifier IN
      (SELECT user_activity_log_identifier
       FROM user_activity_log
       WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate))

=====================================================================

The execution plan for this query is attached.

What I see is that it deletes rows whose key appears in a result set from a SELECT statement, which should be an expensive operation. How can we solve it?

Should we use more temp tables so that we need fewer joins? But more temp tables will put more load on tempdb, right? That is another problem.

DBA100.
execution-plan-before-change-1.jpg
execution-plan-before-change-2.jpg
execution-plan-before-change-3.jpg
execution-plan-before-change-4.jpg
marrowyung (Senior Technical Architecture, Data) Asked:

Anuj (SQL Server DBA) Commented:
Create a clustered index on user_activity_log; the index key should be your primary key, to avoid a heap lookup. This improves performance for all queries referring to the user_activity_log table.
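
A minimal sketch, assuming user_activity_log_identifier is the table's primary key column (the index name is illustrative):

CREATE CLUSTERED INDEX IX_CL_UserActivityLog_Identifier
ON user_activity_log (user_activity_log_identifier)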

Another option is to create a covering index, for example:
CREATE INDEX IX_NCL_UserActicityLog_ActionTime
ON user_activity_log (action_time)
Include(user_activity_log_identifier)

OR

CREATE INDEX IX_NCL_UserActicityLog_ActionTime
ON user_activity_log (action_time,user_activity_log_identifier)
Deepak Chauhan (SQL Server DBA) Commented:
Hi,

You don't need to create an index if your table has minimal SELECT activity.
You can use a temp table like this:

declare
@NoOfMonthRetention      INT,@BusinessDate DATETIME

SELECT  user_activity_log_identifier into ##t1 FROM  user_activity_log
WHERE  action_time      < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate
go
DELETE    FROM  task
WHERE user_activity_log_identifier in ( select user_activity_log_identifier from ##t1)

Or you can use a CTE.
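
For reference, a CTE-based version might look like this (just a sketch, using the same column names as above):

;WITH old_logs AS
(
    SELECT user_activity_log_identifier
    FROM user_activity_log
    WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)
)
DELETE tk
FROM task tk
INNER JOIN old_logs ol
    ON tk.user_activity_log_identifier = ol.user_activity_log_identifier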
rajeevnandanmishra Commented:
Hi,
If an index doesn't make you happy, then you can use the code below. It will definitely improve the performance.
---------
DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

-- Stage the identifiers to purge in a local temp table
SELECT user_activity_log_identifier
INTO #myTemp
FROM user_activity_log
WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)

-- Limit every DELETE that follows to 1000 rows per pass
SET ROWCOUNT 1000

WHILE 1 = 1
BEGIN
      DELETE tk
      FROM task tk, #myTemp tp
      WHERE tk.user_activity_log_identifier = tp.user_activity_log_identifier

      -- Stop once a pass deletes nothing
      IF @@ROWCOUNT = 0
            BREAK
END

--------
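
As a side note, Microsoft discourages SET ROWCOUNT with DELETE/INSERT/UPDATE in newer SQL Server versions; the same batching idea can be written with DELETE TOP, roughly like this (a sketch using the same temp table):

WHILE 1 = 1
BEGIN
      -- Delete at most 1000 matching rows per pass
      DELETE TOP (1000) tk
      FROM task tk
      INNER JOIN #myTemp tp
          ON tk.user_activity_log_identifier = tp.user_activity_log_identifier

      IF @@ROWCOUNT = 0
            BREAK
END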

marrowyung (Senior Technical Architecture, Data) Author Commented:
anujnb,

like this:

CREATE CLUSTERED INDEX [Newindex_user_activity_log] ON [dbo].[user_activity_log]
(
      [user_activity_log_identifier] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

exec sp_updatestats

Why should the index key be the primary key to avoid a heap lookup? What is a heap lookup? Is the clustered index the heap, and if we define the clustered index on the primary key, will it do a clustered index scan/seek instead?

deepakChauhan,

You mean load the result set into a temp table, which lives in tempdb, and then run the delete against it?

What is a CTE?

rajeevnandanmishra,

This example uses more code than what deepakChauhan provided, so it will run slower, right? You both insert the values into a temp table.

So you both prefer the IN against a temp table in tempdb, right?

But this will create contention in tempdb itself, right? And tempdb will grow if this operation runs daily.

DBA100.
rajeevnandanmishra Commented:
Hi,
My suggestion may look like a little more code, but it is actually going to run faster. A few points I would like to mention:

1. The filtered data from user_activity_log has to be produced somewhere (either via an index in the current database, via the IN subquery, via a CTE, or via a temp table in tempdb), so that part will not make much of a difference (unless only a very small amount of data needs to be purged).

2. It must be noticed that each DML statement runs as an individual transaction, so the transaction log keeps growing until the statement completes. By putting a limit on the number of affected rows, we reduce the transaction log size and the locking period, which reduces the overall time of the operation.
marrowyung (Senior Technical Architecture, Data) Author Commented:
by: anujnb,

After creating the clustered index:

CREATE CLUSTERED INDEX [_dta_index_user_activity_log_c_5_773577794__K1] ON [dbo].[user_activity_log]
(
      [user_activity_log_identifier] ASC
)WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [PRIMARY]

exec sp_updatestats

The result is attached.

Question: why are there so many Parallelism operators just after adding one more index? Their cost is 0, which is good.

The Hash Match is a powerful operator, and even though it costs 33% it should be OK, right?

But the Clustered Index Delete increased from 5% to 10%; that is not good, is it?

I have updated the client statistics picture from SSMS so that you all can see the before and after statistics.
execution-plan-after-change-1.jpg
execution-plan-after-change-2.jpg
execution-plan-after-change-3.jpg
execution-plan-after-change-4.jpg
client-statisitcs.jpg
marrowyung (Senior Technical Architecture, Data) Author Commented:
rajeevnandanmishra,

"It must be noticed that, each DML run in an individual TRANSACTION. So, the size of transaction log will keep on increasing till it doesn't completes it. So, putting a limit on the affected rows, we are actually going to reduce the size of transaction log and locking period also. This will reduce the overall timing of your operation. "

that's why the loop count is 1000, right ?


still wondering why first insert to temp table will be faster.

deepakChauhan,

your script returns an error:

declare
@NoOfMonthRetention      INT,@BusinessDate DATETIME

SELECT  user_activity_log_identifier into ##t1 FROM  user_activity_log
WHERE  action_time      < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate
go
DELETE    FROM  task
WHERE user_activity_log_identifier in ( select user_activity_log_identifier from ##t1)

It says:

Msg 102, Level 15, State 1, Line 5
Incorrect syntax near '@BusinessDate'.
Msg 208, Level 16, State 0, Line 1
Invalid object name '##t1'.
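
The first message points at a missing closing parenthesis on the DATEADD call; because that batch fails, ##t1 is never created, which explains the second message. A corrected sketch of the same script:

DECLARE @NoOfMonthRetention INT, @BusinessDate DATETIME

SELECT user_activity_log_identifier
INTO ##t1
FROM user_activity_log
WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)
GO
DELETE FROM task
WHERE user_activity_log_identifier IN (SELECT user_activity_log_identifier FROM ##t1)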
marrowyung (Senior Technical Architecture, Data) Author Commented:
From my point of view, inserting into one more temp table and then still doing the same WHERE ... IN (SELECT ... FROM the temp table) adds one more step, so it should be slower.
rajeevnandanmishra Commented:
@marrowyung:
Yes, by setting the rowcount to 1000, we are minimizing the total transaction size. This will minimize the space required in tempdb and in the transaction log, so from a performance point of view it is definitely going to be faster.

Sorry, but I am not following the original question. What do you want to know in this part?
marrowyung (Senior Technical Architecture, Data) Author Commented:
rajeevnandanmishra,

Does it just run 1000 times and no more? Does it make sure everything in the temp table, and the matching rows in the task table, get deleted?

DBA100.
rajeevnandanmishra Commented:
@marrowyung:

I have assumed that you will wrap the given code in a stored procedure. In that case the WHILE loop will keep deleting data for as long as matching rows exist. Also, the temp table will be removed automatically by SQL Server after the procedure finishes, though you can always drop it explicitly by putting
drop table #myTemp
at the end.
The number 1000 is just an example. Depending on the row size, this value can be changed. When I use this approach, I try to keep each deletion batch to no more than about 1 MB (which normally works out to around 1000 rows).
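
A rough sketch of the given code wrapped in a procedure (the procedure name is illustrative):

CREATE PROCEDURE dbo.usp_PurgeOldTasks   -- illustrative name
    @NoOfMonthRetention INT,
    @BusinessDate DATETIME
AS
BEGIN
    SET NOCOUNT ON

    -- Stage the identifiers that fall outside the retention window
    SELECT user_activity_log_identifier
    INTO #myTemp
    FROM user_activity_log
    WHERE action_time < DATEADD(MM, -@NoOfMonthRetention, @BusinessDate)

    -- Limit each pass of the loop to 1000 rows
    SET ROWCOUNT 1000

    WHILE 1 = 1
    BEGIN
        DELETE tk
        FROM task tk, #myTemp tp
        WHERE tk.user_activity_log_identifier = tp.user_activity_log_identifier

        IF @@ROWCOUNT = 0
            BREAK
    END

    -- Reset the limit and clean up
    SET ROWCOUNT 0
    DROP TABLE #myTemp
END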