dbaSQL asked:

A little help cleaning up this archive stored procedure

The procedure below is used to archive data from a TradeEvents database to a TradeEventsArchive database for three tables:  CustomerOrders, TradeEvents and Orders.  The insert and delete is completed on each table fully before moving to the next, and the actions are logged in an ArchiveLog table.

  

It's an old procedure and, as the data volumes increase, it does less and less to control transaction log growth.  I am looking to clean things up and optimize, and was hoping for some Expert suggestions to do so.


The requirements are the same --- insert then delete from each table in sequence, and record the actions in the ArchiveLog table.  I just need some help reducing the log usage and wonder if there is a better way to manage the transactions.


Any help is appreciated.


ALTER PROC usp_archiveData
   @days Int = 24,
   @batch int = 100000,
   @maxrunhrs int = 24
AS
BEGIN
   SET NOCOUNT ON

   DECLARE @today DATETIME = CONVERT(date, GETDATE()),
           @runTerminateTime DATETIME = DATEADD(HOUR, @maxrunhrs, GETDATE())

   IF @days IS NULL OR @days <= 0
      SET @days = 24
   IF @batch IS NULL OR @batch <= 0
      SET @batch = 10000

   DECLARE @DatesToMove TABLE (OrderDate SMALLDATETIME)
   INSERT INTO @DatesToMove
   SELECT OrderDate
   FROM (SELECT DISTINCT OrderDate FROM TradeEvents.dbo.Orders) o
   WHERE DATEDIFF(DAY, OrderDate, @today) > @days

   DECLARE @targetDate SMALLDATETIME
   DECLARE @minOrderId BIGINT, @maxOrderId BIGINT
   DECLARE @startTime DATETIME, @endTime DATETIME
   DECLARE @errMessage VARCHAR(MAX)
   DECLARE @minEventId BIGINT, @maxEventId BIGINT

   BEGIN TRY
      SET @targetDate = (SELECT TOP 1 OrderDate FROM @DatesToMove ORDER BY OrderDate)

      WHILE @targetDate IS NOT NULL
      BEGIN
         SET @startTime = GETDATE()

         SELECT @minOrderId = MIN(id), @maxOrderId = MAX(id)
         FROM TradeEvents.dbo.Orders WITH (INDEX(IX_Orders_DateAccountSystem))
         WHERE OrderDate = @targetDate

         -- insert
         SET IDENTITY_INSERT [TradeEventsArchive].[dbo].[CustomerOrders] ON;
         INSERT INTO [TradeEventsArchive].[dbo].[CustomerOrders] ([ID], [OrderDate], ........many columns.......)
         SELECT [ID], [OrderDate], ..........many columns...............
         FROM [TradeEvents].[dbo].[CustomerOrders]
         WHERE OrderDate = @targetDate
         SET IDENTITY_INSERT [TradeEventsArchive].[dbo].[CustomerOrders] OFF;

         -- delete
         DELETE FROM [TradeEvents].[dbo].[CustomerOrders] WHERE OrderDate = @targetDate

         DECLARE @batchMinId BIGINT, @batchMaxId BIGINT
         SET @batchMinId = @minOrderId

         WHILE @batchMinId <= @maxOrderId
         BEGIN
            SET @batchMaxId = @batchMinId + @batch
            IF @batchMaxId > @maxOrderId
               SET @batchMaxId = @maxOrderId

            SELECT @minEventId = MIN(Id), @maxEventId = MAX(Id)
            FROM TradeEvents.dbo.TradeEvents WITH (INDEX(IX_Unique_TradeEvents))
            WHERE OrderID BETWEEN @batchMinId AND @batchMaxId

            BEGIN TRANSACTION

               -- insert
               SET IDENTITY_INSERT [TradeEventsArchive].[dbo].[TradeEvents] ON;
               INSERT INTO [TradeEventsArchive].[dbo].[TradeEvents] ([Id], .......many columns........)
               SELECT [Id], ........many columns..................
               FROM [TradeEvents].[dbo].[TradeEvents]
               WHERE ID BETWEEN @minEventId AND @maxEventId
               SET IDENTITY_INSERT [TradeEventsArchive].[dbo].[TradeEvents] OFF;

               -- delete
               DELETE FROM [TradeEvents].[dbo].[TradeEvents] WHERE ID BETWEEN @minEventId AND @maxEventId

               -- insert
               INSERT INTO [TradeEventsArchive].[dbo].[Orders] ([Id] .........................)
               SELECT [Id] ..................
               FROM [TradeEvents].[dbo].[Orders]
               WHERE ID BETWEEN @batchMinId AND @batchMaxId

               -- delete
               DELETE FROM [TradeEvents].[dbo].[Orders] WHERE ID BETWEEN @batchMinId AND @batchMaxId

            COMMIT TRANSACTION

            SET @errMessage = CAST(GETDATE() AS VARCHAR(50)) + ' min/max:' + CAST(@batchMinId AS VARCHAR(20)) + '/' + CAST(@batchMaxId AS VARCHAR(20))
            RAISERROR(@errMessage, 0, 1) WITH NOWAIT

            SET @batchMinId = @batchMaxId + 1
         END

         DELETE @DatesToMove WHERE OrderDate = @targetDate
         SET @endTime = GETDATE()

         INSERT INTO dbo.ArchiveLog (OrderDate, MinOrderId, MaxOrderId, [BatchSize], StartTime, EndTime)
         VALUES (@targetDate, @minOrderId, @maxOrderId, @batch, @startTime, @endTime)

         -- to avoid running too long
         IF @runTerminateTime < GETDATE()
            BREAK

         SET @targetDate = (SELECT TOP (1) OrderDate FROM @DatesToMove ORDER BY OrderDate)
      END
   END TRY
   BEGIN CATCH
      SET @errMessage = CAST(ERROR_LINE() AS VARCHAR(50)) + ' error:' + ERROR_MESSAGE()
      ROLLBACK TRANSACTION

      INSERT INTO dbo.ArchiveLog (OrderDate, MinOrderId, MaxOrderId, [BatchSize], StartTime, EndTime, Notes)
      VALUES (@targetDate, 0, 0, @batch, @startTime, GETDATE(), @errMessage)
   END CATCH;

   SET NOCOUNT OFF
END
GO


Tags: Databases, Microsoft SQL Server, Microsoft SQL Server 2019

arnold

I would think copying/duplicating the data as it is inserted/updated would be a better approach; that way all you need to do is delete rows from the live table based on age, status, etc.

Often, when data volume increases, it helps to run the procedure more frequently so each run pulls in fewer data points.
i.e. get data that was entered more than 30 days ago, and run this process every 7 days; once caught up, each run will only deal with 7 days' worth of data that needs to be migrated.

I am not sure of the scope and scale, and did not fully review the SP.

Are you replicating the same structure from the TradeEvents DB to the TradeEventsArchive DB?

Which DB has the ArchiveLog, or do you have one in the Archive DB that records the additions and one in the TradeEvents DB that records the deletions?
dbaSQL

ASKER
Copying/duplicating the data as it is inserted is not an option.  The intent is solely to strengthen a routine that has been in place and functioning for a very long time.

>>not sure the scope and scale and did not fully review the SP..
It is just a matter of inserting data from the current TradeEvents tables into archive tables in TradeEventsArchive, on a recurring basis.  After the data is written to the archive tables, the same data is deleted from the current tables.

>>Which DB has the logArchive or you have one on the Archie that records the addition and the Events that records the deletion?
The ArchiveLog table is in the TradeEvents database, and it logs all actions -- insert and deletion for all three tables.
arnold

I see. For handling the delete, consider adding a trigger (before or on delete) to capture the data and write it into the ArchiveLog.
The trigger should be set to handle the data a row at a time if a mass delete is issued.
This will record deletes automatically.
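A rough sketch of what that could look like, created in the TradeEvents database. It assumes dbo.Orders has Id and OrderDate columns, reuses the existing ArchiveLog columns (BatchSize is borrowed here to hold the row count), and relies on the fact that a T-SQL trigger fires once per statement, so the deleted pseudo-table covers every removed row:

CREATE TRIGGER trg_Orders_LogDelete
ON dbo.Orders
AFTER DELETE
AS
BEGIN
   SET NOCOUNT ON;

   -- one ArchiveLog row per OrderDate touched by this DELETE statement
   INSERT INTO dbo.ArchiveLog (OrderDate, MinOrderId, MaxOrderId, [BatchSize], StartTime, EndTime, Notes)
   SELECT d.OrderDate,
          MIN(d.Id),
          MAX(d.Id),
          COUNT(*),          -- rows removed for this date
          GETDATE(),
          GETDATE(),
          'Deleted by trigger'
   FROM deleted AS d
   GROUP BY d.OrderDate;
END;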

How frequently is this SP run?
I see you run it and get data older than 24 days.

If you run it every 7 days, the amount of data to process is less.
Running it on a more frequent basis will further reduce the per-run data.

The oddity: you have a transaction, but you do not check for errors to roll back; the presumption seems to be that everything will go through, which makes it a largely symbolic transaction.
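For reference, the usual shape is something like this (an illustration only, not the full procedure): check in the CATCH block whether a transaction is still open before rolling back, and rethrow so the Agent job sees the failure.

BEGIN TRY
   BEGIN TRANSACTION;
   -- insert the batch into the archive tables, then delete the same rows ...
   COMMIT TRANSACTION;
END TRY
BEGIN CATCH
   IF XACT_STATE() <> 0      -- a transaction is still open (possibly uncommittable)
      ROLLBACK TRANSACTION;
   THROW;                    -- surface the original error to the caller
END CATCH;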



While you ask about this SP, your question also references transaction log size management, which can be achieved through more frequent backups of the transaction log. Instead of every 24 hours, you can run the transaction log backup every 12, 8, 6 or 4 hours, etc. This will help manage the log file size.
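For example, a SQL Agent job scheduled every few hours would simply run something like this each time (the path and file name here are placeholders):

BACKUP LOG TradeEvents
TO DISK = N'X:\SQLBackups\TradeEvents_log.trn'
WITH COMPRESSION, CHECKSUM;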
The suggestion of running the above SP on a more frequent basis could also reduce the need for transaction log growth.
Your current window is 24 days, so you wait until the orders have aged 24 days.

Is 24 days your minimum requirement?

Another option is to deal with X number of transactions at a time.
i.e. each time it runs, it can handle 100 orders as an example; once the 100th transaction is handled, the SP exits.

This breaks the workload into smaller subsets while running the SP more frequently.
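A sketch of that shape, combined with the existing batching idea: cap how much one run does, and archive + delete each batch in a single statement with OUTPUT ... INTO so the rows are only read once. The column list is cut down to the two columns we know about, the cap and batch size are arbitrary, and it assumes the archive Orders table has matching columns, accepts explicit Id values, and has no triggers, foreign keys or check constraints (OUTPUT ... INTO requires that):

DECLARE @batchesDone int = 0,
        @maxBatchesPerRun int = 100;

WHILE @batchesDone < @maxBatchesPerRun
BEGIN
   -- archive and delete one batch in one statement
   DELETE TOP (10000)
   FROM TradeEvents.dbo.Orders
   OUTPUT deleted.Id, deleted.OrderDate           -- ...plus the remaining columns
      INTO TradeEventsArchive.dbo.Orders (Id, OrderDate)
   WHERE OrderDate < DATEADD(DAY, -24, CONVERT(date, GETDATE()));

   IF @@ROWCOUNT = 0
      BREAK;                                      -- nothing older than the cutoff remains

   SET @batchesDone += 1;
END;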
dbaSQL

ASKER
>>I see. For handling the delete, consider adding a trigger (before or on delete) to capture the data and write it into the ArchiveLog.
The trigger should be set to handle the data a row at a time if a mass delete is issued.
This will record deletes automatically.

The trigger is very interesting.  Can the trigger be set to delete from the current table and write into the historical table?

>>How frequently is this SP run?
Typically once weekly.

>>The oddity: you have a transaction, but you do not check for errors to roll back; the presumption seems to be that everything will go through, which makes it a largely symbolic transaction.
Can you suggest a better approach?

>>While you ask about this SP, your question also references transaction log size management, which can be achieved through more frequent backups of the transaction log.
True.  I will check the log backup frequency.
arnold

Often, restrictions on cross-DB scripting could interfere.
One option to consider is to use a stored procedure on the server that gets the data arguments passed from the on-delete trigger and inserts them into the other DB.

Though my thought on the trigger dealt with the logging portion.

The event handler that copies data from this DB to the archive could be a similar trigger on insert or update to get the data in.
A trigger on insert/update would potentially add overhead, though.

Potentially, running this SP nightly, starting right after the currently scheduled date, could reduce the amount of data that needs to be copied and deleted.

Since you deal with deletions, how frequently do you run fragmentation-based reorganization/rebuild of the indexes on these tables?
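(For reference, that kind of maintenance is usually something along these lines, run in the TradeEvents database; the thresholds and whether you reorganize or rebuild are up to you:)

-- check fragmentation on the Orders indexes
SELECT i.name AS index_name,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Orders'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id = ips.index_id;

ALTER INDEX ALL ON dbo.Orders REORGANIZE;    -- lighter-weight, online
-- ALTER INDEX ALL ON dbo.Orders REBUILD;    -- heavier, resets fragmentation fully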

As to the transaction, test whether there were errors before you commit.

You have three tables to deal with; are you copying all the data, or can you trim some data out?
dbaSQL

ASKER
The archive runs once weekly, on the weekends.  Not worried about cross db scripting interference.

>>One option to consider is to use a stored procedure on the server that gets the data arguments passed from the on-delete trigger and inserts them into the other DB.
I am interested in this option, but I am not sure that I know how to code it.

>>Potentially, running this SP nightly, starting right after the currently scheduled date, could reduce the amount of data that needs to be copied and deleted.
The working window does not allow it to run during the week.  It has to run once weekly.

>>Since you deal with deletions, how frequently do you run fragmentation-based reorganization/rebuild of the indexes on these tables?
The procedure runs in a job that manages the indexes after each deletion.

>>You have three tables to deal with; are you copying all the data, or can you trim some data out?
It's only a trim.  We never copy all of the data.


I am just trying to improve the definition of this procedure in terms of log usage and transaction control.
ASKER CERTIFIED SOLUTION
arnold
