SQL statement to delete records from a table based on date range.

niceoneishere
niceoneishere used Ask the Experts™
on
I my website is based on sql server 2005 and ASP.NET 2.0. (C#) I track all the website errors thru a table called Exceptions and this table has now more than 200000 records.

I trying to clean it up a little. basically remove all the records from the years 2007 till 2011.

how can i do this without impacting the site as well as the database.

my table is as follows

USE [MsiteDB]
GO
/****** Object:  Table [dbo].[ExceptionLog]  *****/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[ExceptionLog](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[ExceptionMessage] [varchar](max) NOT NULL,
	[ExceptionTime] [datetime] NOT NULL,
 CONSTRAINT [PK_ExceptionLog] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET ANSI_PADDING OFF

Open in new window


Thanks and appreciate it
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
>> basically remove all the records from the years 2007 till 2011. <<

Do you want to remove through 2011 or through 2010?


1) Convert the date range to an Id range, since id is the clustering key
2) delete in batches, to reduce locks

DECLARE @ending_id int
DECLARE @max_rows_to_delete_at_one_time int
DECLARE @rows_deleted int

SELECT @ending_id = MAX(id)
FROM dbo.ExceptionLog WITH (NOLOCK)
WHERE
    ExceptionTime < '20110101'  --<< remove all thru 2010; change to '20120101' to remove all of 2011 also
What's wrong with running DELETE statement on all records in the table where Year is between 2007 and 2011?
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
DECLARE @max_rows_to_delete_at_one_time int
DECLARE @rows_deleted int

SET @max_rows_to_delete_at_one_time = 5000
SET @rows_deleted = @max_rows_to_delete_at_one_time

WHILE @rows_deleted > 0
BEGIN
    DELETE TOP (@max_rows_to_delete_at_one_time)
    FROM dbo.ExceptionLog
    WHERE id < @ending_id
    SET @rows_deleted = @@ROWCOUNT
END --WHILE
How to Generate Services Revenue the Easiest Way

This Tuesday! Learn key insights about modern cyber protection services & gain practical strategies to skyrocket business:

- What it takes to build a cloud service portfolio
- How to determine which services will help your unique business grow
- Various use-cases and examples

DELETE FROM ExceptionLog WHERE YEAR(ExceptionTime) BETWEEN '2007' AND '2011'

Author

Commented:
I just worried that If i straight out do the delete command it might hang up the site. I was wondering If I can delete between the date exception time range and say 10000  records at a time

Thanks
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
RE-WORDING:

When I said:
>> 2) delete in batches, to reduce locks <<

I mean reduce locks that must be *held at the same time*.

If you just delete all rows in one statement, you'll likely lock the table.  Also, the transaction log for that db will have to grow to hold ALL the deleted data.

By doing it in batches, you can lessen the overall impact on that db and the system.

If the log on that db is not very large, expand the log the full amount needed yourself BEFORE running the DELETEs.  Automatic log expansion is a lot of overhead and causes more delays in the db than doing it yourself.

Author

Commented:
Scott, thanks for replying sir, where would i run the code you have posted I can use sql management studio express

thanks and sorry I am little new to this
Put Scott's  code inside a procedure and run it from Management Studio using EXEC statement
If you really want to make sure you don't impact the production data during the business hours then better create a job that will be executed daily at 3:00AM and do this:

DELETE TOP 10000 FROM ExceptionLog WHERE YEAR(ExceptionTime) BETWEEN '2007' AND '2011';

Before that makes sure you create an index on the ExceptionTime column otherwise the delete will scan the table every time executed:


CREATE INDEX IX_ExceptionLog_ExceptionTime ON ExceptionLog (ExceptionTime)

You could use top 1,000 instead of 10,000 but you'd have to make sure the fill up rate of that table is lower than the number you delete daily. You could also schedule the job to execute every 6 hr or so and then for sure you can make the number to delete only 1,000, which will practically have no impact on the table.
Actually in order to make the query use the index you will have to change it to:

DELETE TOP 10000 FROM ExceptionLog WHERE ExceptionTime BETWEEN '2007-01-01' AND '2011-12-31 23:59:59.999';

as BETWEEN includes the values given.

Also if you want to keep this job permanently so it will keep your table clean all the time then you should change the delete to remove dates that are older than a year:

DELETE TOP 1000 FROM ExceptionLog WHERE ExceptionTime < DATEADD(mm, -12, GETDATE());

Again the number has to be grater then the rate you fill the table daily if the job is scheduled daily.
Senior DBA
Most Valuable Expert 2018
Top Expert 2014
Commented:
>> Scott, thanks for replying sir, where would i run the code you have posted I can use sql management studio express <<

Yes, just copy BOTH of my code fragments, one right after the other, into Mgmt Studio Express and run it, like so:


DECLARE @ending_id int
DECLARE @max_rows_to_delete_at_one_time int
DECLARE @rows_deleted int

SELECT @ending_id = MAX(id)
FROM dbo.ExceptionLog WITH (NOLOCK)
WHERE
    ExceptionTime < '20120101'  --<< remove all thru 2011; change to '20110101' to remove only thru 2010

DECLARE @max_rows_to_delete_at_one_time int
DECLARE @rows_deleted int

SET @max_rows_to_delete_at_one_time = 2000  --!!ADJUST AS NEEDED!!
SET @rows_deleted = @max_rows_to_delete_at_one_time

WHILE @rows_deleted > 0
BEGIN
    DELETE TOP (@max_rows_to_delete_at_one_time)
    FROM dbo.ExceptionLog
    WHERE id < @ending_id
    SET @rows_deleted = @@ROWCOUNT
END --WHILE


You want to use the id to control the DELETEs, NOT the date, since the id is the clustering key.

You should still be able to use the table while the DELETEs above are running ... however, since it's express, I changed it to delete only 2,000 rows at a time to be safe.
I still insist on 2 things:

1. You need an index on ExceptionTime even if you use Scott's method.

2. The job solution is simpler and can be scheduled. If you still prefer Scott's method it can be put in a job as well.

A job will take care of the problem permanently.

Author

Commented:
thanks guys I really appreciate it one final question how would I make Scott solution to have both the 1 and 2 of what Zberteoc suggested.

Thanks
Assuming you have the Management Studio:

In order cu create a job, under the server node you expand the SQL agent node and then right click on the Job > New Job.

- General section: you give it a name
- Steps section: click New and give it a name, choose your database and then copy and paste what code you want to execute in the big empty panel. Click OK.
- Schedule section - define how often you need the job to execute, you can choose daily and ans specify a time. You have to give it a name as well before you save by clicking OK.
For SQL agent with SQL express you need this apparently:

http://standalonesqlagent.codeplex.com/
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
A) Express does not have a job scheduler.

B) You don't necessarily need this to automatically run multiple times.

C) You DON'T need a separate index on ExceptionTime -- yes, the table will then have to be scanned ONCE to complete the DELETEs, but that's WAYYYY less overhead than constantly maintaining that index for all rows as they are added.

D) Almost certainly the table would best be *clustered* on ExceptionTime, rather than id, but I won't get into that in any more detail (unless the Author would like me to).  ALL queries against that table would improve and continue to run well; you may not even need to purge the table then.

Author

Commented:
Thanks worked out perfectly, I just removed the 2  lines

DECLARE @max_rows_to_delete_at_one_time int
DECLARE @rows_deleted int

as it was declared twice.

and the query worked fast I reduced the table from 330000 records to 35000 in less than 15 mins.

Thanks
Scott PletcherSenior DBA
Most Valuable Expert 2018
Top Expert 2014

Commented:
I guarantee you, it wouldn't have been nearly as fast DELETEing all rows at one time! :-)

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial