Link to home
Start Free TrialLog in
Avatar of spinbains
spinbains

asked on

For a purging process, I'm trying to create a procedure that deletes and commits in a loop. There are over 7 million rows in the table.

For a purging process, I'm trying to create a procedure that deletes and commits in a loop. There are over 7 million rows in the table. I've used some helpful tips in Experts Exchange but I'm getting the ORA-1001 invalid cursor.
 
I'm running oracle 9.2.0.5.

I want to delete all the rows in the table dstracking.webactions_emarch that have a dateinserted <='13-Nov-2006'.
 
Here is my procedure:
 
PROCEDURE            DELETE_COMMIT_LOOP
IS
  comm_after number :=500;
  rnum       number :=0;
  c1         varchar2(15);
  CURSOR cur1 IS select trunc(b.dateinserted)  
                    from dstracking.webactions_emarch b
                    where trunc(b.dateinserted) <= '13-Nov-2006';
   
BEGIN
  OPEN cur1;
  LOOP
    FETCH cur1 INTO c1;
    EXIT WHEN cur1%notfound;
    rnum :=rnum +1;
    DELETE FROM dstracking.webactions_emarch b
           WHERE EXISTS
                 (SELECT * FROM dstracking.webactions_emarch b
                           where trunc(b.dateinserted)=c1);
           IF rnum = comm_after then
           COMMIT;
           rnum :=0;
           END IF;
  END LOOP;
  CLOSE cur1;
  COMMIT;
dbms_output.put_line('Rows '||to_char(rnum)||' deleted');
  exception
  when NO_DATA_FOUND
  then
  dbms_output.put_line('ERROR at'||to_char(rnum));
  when others
  then dbms_output.put_line('ERROR at'||to_char(rnum));
END;
Avatar of sfs_dba
sfs_dba

You fetch a date into a varchar (c1         varchar2(15);)
Try to make it a date...
check the datatype for dateinserted, it is varchar2 or date type?  i ask because you are using the 'trunc' funtion on that column.  if it is a date type you must convert your c1 var to a date or convert your dateinserted value into a varchar2 prior to comparing or updating the values.


good luck,
daniels
PROCEDURE DELETE_COMMIT_LOOP
IS
  rnum       number :=0;
  c1         date;
  CURSOR cur1 IS select rownum, trunc(b.dateinserted)  
                    from dstracking.webactions_emarch b
                    where trunc(b.dateinserted) <= to_date('13-Nov-2006');
   
BEGIN
  OPEN cur1;
  LOOP
    FETCH cur1 INTO rnum, c1;
    EXIT WHEN cur1%notfound;
    rnum :=rnum +1;
    DELETE FROM dstracking.webactions_emarch b
           WHERE EXISTS
                 (SELECT * FROM dstracking.webactions_emarch b
                           where trunc(b.dateinserted)=c1);
           IF rnum mod 500 = 0 then
                 COMMIT;
           END IF;
  END LOOP;
  CLOSE cur1;
  COMMIT;
 
dbms_output.put_line('Rows '||to_char(rnum)||' deleted');
  exception
  when NO_DATA_FOUND
  then
  dbms_output.put_line('ERROR at'||to_char(rnum));
  when others
  then dbms_output.put_line('ERROR at'||to_char(rnum));
 
END;
ASKER CERTIFIED SOLUTION
Avatar of rbrooker
rbrooker
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
the last sentence should read :

this also reduces the procedure down to one tablescan ( to get the initial cursor ).  the original one has one tablescan to get the initial cursor and two tablescans for each record ( delete from blah where not exists blah will result in two tablescans ).
Might also be worth using bulk binding, as this is supposed to improve performance by reducing the number of context switches, eg

PROCEDURE delete_commit_loop IS
  rnum NUMBER := 0;
  type rTabType is table of urowid;
  rTab rTabType;

  CURSOR cur1 IS
    SELECT ROWID
    FROM dstracking.webactions_emarch b
    WHERE trunc(b.dateinserted) <= '13-Nov-2006';

BEGIN
  OPEN cur1;
  LOOP
    FETCH cur1 bulk collect INTO rTab limit 20000;
    EXIT WHEN not cur1.exists(1);

    forall i in rTab.first .. rTab.last
      DELETE FROM dstracking.webactions_emarch b
      WHERE ROWID = rTab(i);

    rnum := rnum + sql%rowcount;
    commit;
    rTab.delete;
  END LOOP;
  CLOSE cur1;
    dbms_output.put_line('Rows ' || to_char(rnum) || ' deleted');
EXCEPTION
    WHEN OTHERS THEN
        dbms_output.put_line('ERROR after ' || to_char(rnum));
END;
Avatar of Mark Geerlings
Your cursor prevents the use of an index (which foirces a full-table scan of this large table).  Are you sure that you want that?  Is the table indexed on: dateinserted?  If yes, your cursor should look like this instead:

CURSOR cur1 IS select trunc(b.dateinserted)  
                    from dstracking.webactions_emarch b
                    where b.dateinserted < '14-Nov-2006';

That will only work though, if the datatype of column "dateinserted" is "date", and if your NLS_DATE_FORMAT is: "DD-MON-YYYY".  If you are sure that this column is a "date" but aren't sure about your NLS_DATE_FORMAT, this would be safer:

CURSOR cur1 IS select trunc(b.dateinserted)  
                    from dstracking.webactions_emarch b
                    where b.dateinserted < to_date('14-Nov-2006','DD-MON-YYYY');

If this column is indexed (and is a "date" column), the delete may go much faster if the variable "c1" is declared as a "date" and the actual delete statement looks like this:
DELETE FROM dstracking.webactions_emarch b
           WHERE b.dateinserted < c1;
Avatar of spinbains

ASKER

Wow! You are all ROCK OUT LOUD! I'm going to test these suggestions and yes the cursor will be created on a date field. Rbrooker, you were starting to add something but it didn't come through. I'd like to know you other thoughts.
my thoughts were : delete by rowid, much faster than by usign a where not exists.
reasons :
with rowid, the delete statement does not do a tablescan, it accesses the record directly.  this reduces the procedure down to one tablescan ( outer cursor - cur1 ).
using "delete blah wher not exists ( select blah from blah )" uses 2 tablescans, one on the outer delete, and one for the inner subquery.  this results in two tablescan for each date returned by the outer query.  LOTS more work for the database.  

The outer cursor:
    CURSOR cur1 IS
        SELECT ROWID
        FROM dstracking.webactions_emarch b
        WHERE trunc(b.dateinserted) <= '13-Nov-2006';

can also use any indexes you may have on the webactions_emarch table by using hints :
    CURSOR cur1 IS
        SELECT /*+ INDEX( b, name_of_index ) */ ROWID
        FROM dstracking.webactions_emarch b
        WHERE trunc(b.dateinserted) <= '13-Nov-2006';

which will speed up the outer cursor IF the index specified is on the date_inserted column.

thats it, just some random ramblings :)
What is the volume of data after '13-Nov-2006'?

If it is lesser:
Instead of deleting the data, create another table as "select * from <original table> where <dt col> > '13-Nov-2006'
(remember!! you are creating another table with the required data).

Now you can verify that all data that is required is there in this new table.
- rename the old table for backup or drop the old table.
- rename the new table to the actual table name
- enable constraints
- Build indexes

This will be the fastest way to get this done.

Now, if this is a periodic activity that you have to do every month:
Then, consider range partitioning your table on the date column so that instead of deleting data, you can use:

ALTER TABLE <table name> TRUNCATE PARTITION <old partition name> for purging the unwanted data.

This will save a lot of time and undo/redo.
Those are good suggestions from sujith80!  Just be careful if you do use this apporach to make sure that you copy all: constraints, indexes, grants and triggers from the original table to the new table, then re-compile any invalid PL\SQL objects that reference this table.

Also, if you can use partitioning, that can greatly speed up maintenance like this in the future, since the "alter table ... drop partition..." command is very fast with very low overhead.  Your indexes then should be locally-partitoned matching the table partitions, or they will become invalid, and you will have to rebuild them after dropping a table partition.
Why would you want to execute this procedure in a loop? Why not just -
create or replace procedure delete_commit is
begin
delete from dstracking.webactions_emarch where dateinserted <='13-Nov-2006';
commit;
end;
/

To awking00:

I had thought of that too, but guessed that that was tried and failed with a "rollback segment too small" error.  If so, another option would be a simple list of SQL delete commands with a commit between each one, like this:

delete from dstracking.webactions_emarch where dateinserted < to_date('01-Jan-2005');
commit;
delete from dstracking.webactions_emarch where dateinserted < to_date('01-Feb-2005');
commit;
delete from dstracking.webactions_emarch where dateinserted < to_date('01-Mar-2005');
commit;
etc.

To spinbains:

If you want to use this approach you will have to determine the earliest date to use in the first delete statement.  I just guessed at Jan. of 2005, but if your table has data for 2004 or earlier, you will need to use and earlier date.  Also, depending on the number of records per month and the size of your rollback (or undo) segments, you may not be able to do a whole month in one step, or you may be able to do multiple months in one step.
I'm still testing the other scripts and will send results shortly.

To markgeer, awking00:

Yes I tried something similar. Since I'm on 9.2.0.5 I'm not using rollback segments but UNDO so the error reported insufficient UNDO space.
markgeer,
I didn't think of that and you are probably right. Perhaps the two ideas could be combined. The following procedure deletes from the earliest dateinserted value (converted to the 13th of the month) plus a month, commits, then adds a month and repeats until it reaches the target date.

create or replace procedure delete_commit_loop is
v_month   date;

begin

select add_months(min(trunc(dateinserted,'MON') + 12), 1)
into v_month
from dstracking.webactions_emarch;

while v_month <= to_date('13-Nov-06','dd-Mon-yy') loop
 delete from dstracking.webactions_emarch where dateinserted < v_month;
 commit;
 v_month := add_months(v_month,1);
end loop;

end;
/
each if these solutions requires many tablescans.  tablescans take time.  if you do a tablescan that gathers the rowids of all the records to be deleted, there is one tablescan and a whole heap of "acess by rowid" operations that are very very fast.

partitioning is a great idea.  if you partitioned by month, each month prior to the 13th of november would be a truncate / drop of a table partition.  for the month in question, it would be a delete where trunc( dateinserted ) is less than 13th december, and hopefully the number of records would be low enough so that your undo is sufficient to be able to do this in a single statement.

also with partitioning, you can create archive tables and exchange them with the table partition to allow you to export them and store them offline.  this would allow you to retrieve the data at a later date if required.
To all: To be more specific the dstracking.webactions_emarch table is 7million rows, there would be 2.8million rows deleted using the search criteria and to those of you that asked, yes there was an index on the date insertdate column. This particular corporate organization does not currently use PARTITION'ing in it's Oracle environment. Let's see I think the last issue I need to reply to is there are file system limitations/(tablespaces) also in this environment so creating the other table - which would also be great for reducing fragmentaion and highwater marks - is physical limitation for me as an Oracle DBA.

Also, I'm using Quest SQLNavigator to build, compile and test the procedure.

To drs66: I tried your suggestion the Procedure ran for 30 some minutes and then errored with "Error at2"

To rbrooker: I tried your suggestion utilizing the rowid and the 2.8million rows where deleted in less than 12 minutes!
rock on :)
To ishando: I tried your suggestion but rec'd a error message on the cur1.exists(1)  PLS-00225 subprogram or cursor 'CUR1' is out of scope.
oops - that should have been "EXIT WHEN not rTab.exists(1);"