• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1166
  • Last Modified:

Most efficient way to purge child table records

I have a business process that inserts parent records into a queue table.  Multiple detail records are written to two child tables.  Once the item is processed by the business, the parent record is deleted and inserted into a log table.  The detail records remain in their child tables.  We run a nightly job that purges the log table and all related detail records.  The log table is very large and this process takes about 45 minutes to complete.  Previously we were purging all records older than 30 days.  I was using the smallest key in the parent table to delete all records in the child tables with a key less than the parent key.  This seemed to work well since it used an indexed column to determine the delete criteria and avoided any table joins.  

We now have a new requirment to purge log records based on a client's retension criteria.  Most records we still purge after 30 days, however, some records we now keep 60 or 90 days.  My previous method of using the smallest key no longer works since there will be gaps in the data.  I can't use a cascade delete since the primary key is shared between the two queue and log tables.  I could use something similar to the following, but given the size of the log table it seems very inefficient.

DELETE FROM TRANSACTION_LOG_PROPERTIES
          WHERE TRANSACTION_ID IN
              ( SELECT TRANSACTION_ID FROM TRANSACTION_LOG
              WHERE TRANS_COMPLETE_DATE < cutOfDate);

What is the best way to clean up my detail records?  Thanks.  Also, we are running an Oracle 10g database.
0
trudyhlittle
Asked:
trudyhlittle
1 Solution
 
PilouteCommented:
Hi,

If I understand your problem, you'd like to delete records from TRANSACTION_LOG_PROPERTIES when "parent" lines are deleted from TRANSACTION_LOG. Right ?

And you can not use anymore the criteria TRANSACTION_ID, with DELETE where TRANSACTION_LOG_PROPERTIES.TRANSACTION_ID <  min( TRANSACTION_LOG.TRANSACTION_ID) because now you have gaps in the TRANSACTION_ID list ? Right ?

If this is also right, then I suggest the following :

1) delete first the lines in TRANSACTION_LOG.

2) then
          DELETE FROM TRANSACTION_LOG_PROPERTIES sub
          WHERE not exists (select 0 from TRANSACTION_LOG main
              WHERE main.TRANSACTION_ID = sub.TRANSACTION_ID);

'not exists' will use the index of your main table (TRANSACTION_ID) instead of browsing it completely like your 'in' statement would do.

Cheers,
P
0
 
MikeOM_DBACommented:

Define the foreign key with the 'ON DELETE CASCADE' option as on this example from the fine OracleĀ® Database SQL Reference.
 
0
 
trudyhlittleAuthor Commented:
MikeOM_DBA, defining a ON DELETE CASCADE is not an option since records are inserted into the child tables at the time the queue recored is created, not when the log records are created.  I can't set up a foreign key relationship since records need to exist in the foreign key tables before they are created in the primary table I am purging

Piloute, thanks for you're suggestion.  I'll implement the change for tonight's purge on our test system and see how it performs.  Our test sytem doesn't contain nearly the volume of data our production environment does, but at least I should be able to get some metrics.
.  
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
trudyhlittleAuthor Commented:
I just thought of a problem with the suggested SQL.  In my previous processing I was getting the min(transaction_id) from both the queue and log tables and using whichever was the smallest to preform the delete on the child tables.  I can't use
DELETE FROM TRANSACTION_LOG_PROPERTIES sub
          WHERE not exists (select 0 from TRANSACTION_LOG main
              WHERE main.TRANSACTION_ID = sub.TRANSACTION_ID);
since there may be records still in the queue table that haven't been moved over to the log table yet for which I don't want to delete the child records.  I need to purge the child records where the record doesn't exist in either the transaction_queue or the transaction_log table.
0
 
PilouteCommented:
Hi,

OK. Then I have 2 suggestions :

1) either you test the exec speed of the "not exists" version, and if it is OK, you add  " and not exists (select from queue_table)" to your statement. It will still run faster than the initial statement you were thinking of.

2) or (and this will for sure be really faster) :

a) you create a one column 'temporary' table by doing a select union between the trasaction_ids of the two main tables. A select should be quick enough. So you'll have a single table containing everything that you don't want to get rid of.

a2) cascading and linking to the main tables is not a bad idea either. You could fill in data and delete into the 'temp' table with triggers. This would be useful if you have billions of lines to get with the select in point a) and you think it would be too long to run.

b) now you just run the query I have first suggested by checking the existence of the transaction_id on this new temp table. This will be (for sure) faster than doing it separately on both the main tables.

Cheers,
P
0
 
MilleniumaireCommented:
You can create deferred constraints which means the validation isn't performed until the transaction ends, usually at commit.
0
 
trudyhlittleAuthor Commented:
Thanks Piloute.  I used the 2nd recommendation, but reversed the data in the temporary table.  Instead of loading the table with keys to keep, I loaded them before the delete with keys I am going to delete then used a join on the temporary table to my child tables to delete all records that exist.

DELETE FROM TRANSACTION_LOG_PROPERTIES sub
          WHERE exists (select 0 from TEMP_TRANS_IDS_TO_DELETE main
              WHERE main.TRANSACTION_ID = sub.TRANSACTION_ID);
0

Featured Post

Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

Tackle projects and never again get stuck behind a technical roadblock.
Join Now