Link to home
Start Free TrialLog in
Avatar of sdruss
sdruss

asked on

Oracle PL/SQL for large scale delete

New to Stored Procedure development.  Need example Stored Procedure that deletes rows from table based on a timestamp input parameter.  Table is huge so procedure needs to be extremely efficient as performance is a big issue.  Should rowid be used in the query?
Avatar of Sean Stuber
Sean Stuber

something like this?

no, don't use rowid, just delete the table directly, preferably in one large chunk.
If your undo/rollback won't allow that, then delete in as big a chunk as you can

CREATE OR REPLACE PROCEDURE purge_table(p_retention IN PLS_INTEGER)
IS
    -- set this to as large a number as your rollback/undo will support
    v_delete_limit PLS_INTEGER := 100000;
BEGIN
    IF p_retention > 0
    THEN
        LOOP
            DELETE FROM your_table
             WHERE your_date < TRUNC(SYSDATE) - p_retention
               AND ROWNUM <= v_delete_limit;

            EXIT WHEN SQL%ROWCOUNT = 0;

            COMMIT;
        END LOOP;

        COMMIT;
    END IF;
END purge_table;
sdruss,

you can pass the timestamp as a string and write down the procedure as given below. In the below procedure timestamp format is "YYYYMMDDHH24MISS". Update the format as per your need.

create or replace procedure purge_table(pi_timestring IN varchar2) is
begin
    if pi_timestring is not null then
        delete from THE_TABLE
        where
            THE_DATE_FIELD = to_date(pi_timestring, 'YYYYMMDDHH24MISS');
       
        EXIT WHEN SQL%ROWCOUNT = 0;
            
    end if;
    commit;
end purge_table;
amit_n_panchal,

 you code is pretty much just a copy of my procedure with some tweaks but you introduced an error when you did

 you need to either keep the loop structure intact or remove all of it.

also, putting the commit outside the if/end if doesn't make sense,  if you're not going to perform the delete, why commit?

Avatar of sdruss

ASKER

sdstuber:

Thanks.  Trying to learn quickly.  So you procedure would allow keeping (i.e. rentention) of "p_retenition" number of records? Why is the looping mechanism important?  What is needed to delete oldest, say 10,000 records, or records prior to a certain date?  Thanks!
a transaction will create undo records for every row it deletes.

if your transaction will delete many rows, you may create more undo than your database can hold.

if that happens you can loop and delete in pieces.  if you can't delete a million at a time, delete 100K at a time, if that's too big
change the constant to 10K at a time.  Adjust it until it works.

The ideal is if the loop never really loops, but instead enters the loop, runs once and completes.

if you want to delete prior to a specific date, your can either calculate what the rention would be
or change the parameter to be a date and change the where clause

 WHERE your_date < p_delete_prior

if you want to delete the oldest 10,000 rows, that's a bit trickier because you'll need to determine which rows those are first
so it's a double IO hit on the table, once to find the rows and sort them, then again to do the actual delete.
You would use ROWID for this method, but I don't really recommend it
missed the first question

>>  "p_retenition" number of records

no,  deletes all data prior to a retention number of days, not rows.


so,  if p_retention = 10 then all data 11 days or older will be deleted
sdstuber,

yes, following row needs to be removed
EXIT WHEN SQL%ROWCOUNT = 0;

apart from above commit will not make any difference whether is it outside/inside of if block.
In your code,
1. looping is not required
2. commit will be executed, even if no rows updated.
 
by the way, sdruss needs to delete rows based on timestamp, that's what i've mentioned in the procedure.


1 - looping "might be" required.  We don't know how many rows will be deleted or if it will fit in undo/rollback limits

if the limit is large, which I recommend, then the loop is inconsequential, it'll loop once, delete and be done

if the limit is small, which I don't recommend, but "might be" necessary, then the loop will handle it

so, the loop works either way

2 - yes, that's not recommended practice - why commit if you're not doing anything?

yes, I know by timestamp, I made the guess, but should have been more explicit that the timestamp would be a data retention limit.   In other words, I tried to code to what I thought the underlying requirement might be rather than the actual words.
But, again, I agree I should have highlighted that fact.

I wasn't disagreeing with your approach,   I agree a non-loop delete is best, but, as I've noted in posts and within the code itself as a comment,  IF (IF IF IF IF) you can't, then the loop will solve it.  If you can, then the loop doesn't hurt.  because the second time through it'll simply do nothing and exit

So, we're in agreement,  I wasn't saying your approach was wrong, rather it was essentially the same code I already posted, except the small changes had accidentally introduced a few issues.






Avatar of sdruss

ASKER

sdstuber:

So if I executed this:   exec purge_table(10);

This would delete all data 11-days and older?  Also, I need more resolution probably down to the hour and minute, versus day.  So that is why I want to remove data base on timestamp.  How would I pursue this? Expert amit_n_panchal passes in a string, but then truncates the string.   So, again I need finer resolution.  Thx sdruss
ASKER CERTIFIED SOLUTION
Avatar of Sean Stuber
Sean Stuber

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sdruss

ASKER

sdstuber:

Thanks!  Next issue.  I am having problem executing my stored procedure passing in the date value.  No matter what format I get this error:  ORA-01847 (day of month must be between 1 and last day of month).  Sorry for all of the elementary questions - I am extremelly new to stored procs.  This is how I execute my call on the command line:

    exec PurgeIt( to_date('06-APR-11 06.02.31', 'DD-MON-YY HH.MI.SS') );

I have tried multiple combinatins of the above.  Any ideas?  Thanks sdruss

first,  I recommend using 4 digit years and YYYY format mask,
no need to reinvent the Y2k bug!  :)

second HH  should be HH24, or you'll need to specify an am/pm field

last,  what does purgeit  look like?

is your in parameter a DATE type like   http:#35338508   ?

or is it a string?  If it is a date, are you doing another conversion inside the procedure? if so,  don't!
Avatar of sdruss

ASKER

sdstuber:

Got Stored Procedure that accepts a string as IN parameter to work.  However, I have overloaded this procedure and also need a version that accepts a date value as input.  

I changed the date mask:  changed YY  to YYYY
                                           changed HH  to HH24

My PurgeIt is your sample PurgeTable ( http:#35338508  ).  I changed the version of the procedure that accepts a TIMESTAMP rather than a DATE.  My column is a TIMESTAMP, so should this make things easier comparing TIMESTAMP to TIMESTAMP?  Thx sdruss
if you're using my function, then don't use strings at all, there are no conversions
and you don't need to overload for dates,  timestamp and date will implicitly convert between each other with the only difference being the loss of fractional seconds
Avatar of sdruss

ASKER


sdstuber:

With the version of the procedure that accepts an integer, and delete all data prior to a retention number of days.  This may be might favorite.  How would you change to modify for retention number of hours versus days or minutes ( ID:35330116) ?  Thx sdruss
just divide the retention by the number of whatever units of measurement are in a day

DAYS  - 1
HOURS - 24
MINUTES 1440


CREATE OR REPLACE PROCEDURE purge_table(p_retention   IN PLS_INTEGER,
                                        p_units       IN VARCHAR2 DEFAULT 'DAYS'
                                       )
IS
    -- set this to as large a number as your rollback/undo will support
    v_delete_limit   PLS_INTEGER := 100000;
BEGIN
    IF p_retention > 0 AND UPPER(p_units) IN ('DAYS', 'HOURS', 'MINUTES')
    THEN
        LOOP
            DELETE FROM your_table
                  WHERE your_date <
                            TRUNC(SYSDATE)
                            - (p_retention
                               / DECODE(UPPER(p_units),  'DAYS', 1,  'HOURS', 24,  'MINUTES', 1440))
                        AND ROWNUM <= v_delete_limit;

            EXIT WHEN SQL%ROWCOUNT = 0;

            COMMIT;
        END LOOP;

        COMMIT;
    END IF;
END purge_table;

Open in new window

Avatar of sdruss

ASKER

sdstuber:

Thanks with the previous example is it necessary to truncate SYSDATE ( trunc(SYSDATE) ).  I should be comparing my table column which is a timestamp.  So why would it be necessary to truncate.  Thx.
it's not strictly necessary,  the point of trunc is people frequently think of purging based on whole days, not point-in-time


for example,  5 days ago from 00:00:00  today,  not 5 days ago from 14:23:57.834943 today

if you want the former, use trunc, if you want the latter, don't
Avatar of sdruss

ASKER

sdstuber:

My granularity will probably be in hours.  For example, 12 hours ago from now.  So it looks like I can drop the truncate - you agree?  Also, do I need some type of exception code?  Thx
granularity in hours, yes, you'll probably want to remove the trunc

exception code - your choice, I prefer my exceptions to be propagated up in most cases,
 but, if it would be more appropriate to capture errors and log them or do some other processing then do that.
Avatar of sdruss

ASKER

sdstuber:

I am testing my stored procedure, and don't believe I'm there yet.  Some clarification use this code:

         DELETE FROM your_table
                  WHERE your_date <
                            (SYSDATE)  - (p_retention  /
                                 DECODE(UPPER(p_units),  'DAYS', 1,  'HOURS', 24,  'MINUTES', 1440))
   Delete all dta except the last 6 hours:  PurgeIt(6, 'Hours');       ?
   Delete all data except the last minute:  PurgeIt(2, 'MINUTES');   ?
   Delete all data except the last 5 days:  PurgeIt(5, "Days')  ?

Did I misunderstand?  Thx.
Yes your explanations for those inputs looks correct
If that's not what you're seeing,  what is it doing or not doing?
And, with what data?
Avatar of sdruss

ASKER

sdstuber:

My testing does has not reflected my intentions.  Given the premise to "Delete all data except the last 2 minutes" -  my test failed.

  I populated my table with 2.5 hours of test data with timestamp intervals of 2-minutes.  I executed my stored procedure as follow and all of my data was deleted - thought the last (2) minutes would be preserved:

       exec PurgeIt( 2, 'MINUTES');
I don't know what to tell you,  it works for me.

I used the exact code I posted above in http:#35351375

except I change the TRUNC(SYSDATE)  to SYSDATE as discussed in the posts following that one

Here is my complete test
SQL> CREATE TABLE your_table (your_date TIMESTAMP);

Table created.

SQL> INSERT INTO your_table
  2      SELECT TRUNC(SYSDATE) + LEVEL / 1440
  3        FROM DUAL
  4      CONNECT BY TRUNC(SYSDATE) + LEVEL / 1440 <= SYSDATE;

1343 rows created.

SQL> COMMIT;

Commit complete.

SQL> SELECT COUNT(*) FROM your_table;

  COUNT(*)
----------
      1343

SQL> EXEC purge_table(2,'Minutes');

PL/SQL procedure successfully completed.

SQL> SELECT * FROM your_table;

YOUR_DATE
---------------------------------------------------------------------------
11-APR-11 10.22.00.000000 PM
11-APR-11 10.23.00.000000 PM

SQL>

Open in new window

Avatar of sdruss

ASKER

sdstuber:

Will test tomorrow.  This is representative of my data:

    10   1001   ABC   1001    20110412171128
    20   1002   XYZ    1001    20110412171328
    30   1003   ABC   1001    20110412171528
    40   1004   XYZ   1001    20110412171728
    50   1005   ABC   1001    20110412171928
    60   1006   XYZ   1001    20110412172128
    70   1007   ABC   1001    20110412123128
are you trying to preserve the last N-units from the last entry?

or are you trying to preserve the last N-units from "now"?

If the former then in the procedure before the loop

 select max(your date) into some_variable from your table

then use the variable where it currently uses sysdate

Remember to declare the variable

If you want the latter, the code works as shown above
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sdruss

ASKER

Please close and award all points to sdstuber!
glad I could help.

to close questions immediately, don't accept your own post (even if you assign 0), just accept the ones that helped
Avatar of sdruss

ASKER

sdstuber you are absolutely the best.  You are extremelly patient as this is appreciated.  Thanks again!