Link to home
Start Free TrialLog in
Avatar of ImranRashid
ImranRashid

asked on

Removing hierarchical data

Hi All,

I have a table with hierarchical data in it but there is some duplication that I want to remove with a single SQL statement. If a single SQL statement cannot be used, then the most efficient PL/SQL block can also be used.

There are two column in the table as shown below.

Prev_id         Next_id
A                  B
B                  C
C                  D
A                  C
A                  D

Prev_id is the parent of Next_id. All is well till the third line where a simple connect by query can be used but the last two records are the indirect relationships which should not be there and I want to remove them. The table contains millions of rows and I want to use the most efficient method to remove the records. I have tried to use subqueries in DELETE statement but when I run it on the test data, it removes some rows which it should not. Any help will be greatly appreciated.
SOLUTION
Avatar of Sujith
Sujith
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of YANN0S
YANN0S

Hello,
I have one question to clarify the requirement: You want to delete all rows ('n', 'm') for which there is a path
'n', ..'k'..., 'm' (i.e. with more than one hop). Is that true?
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
tranz435, why plsql forall statement is much more efficient and fast than a regular DML?
The reason is regular delete one record at a time but if you use plsql forall... it will bulk collect all the records to be deleted and then delete it.  You can google around for the benefits of bulk collect in plsql.

I am not saying that your query is wrong... just that if you are deleting more records then the regular DML statements wont work efficiently.  
ImranRashid

drop all the indexes on all columns except for the primary key if you want your delete to run faster
Are you sure? I know (and I agree) that using bulk collect and bulk bind to perform DML within PL/SQL is much more efficient than using normal PL/SQL cursors and DML statements. But compared to plain SQL statements? I've never seen that. I believe that in most cases a single SQL statement is more efficient than any PL/SQL block.

Can you provide some links that suggest that it is better to use PL/SQL with bulk bind instead of SQL?
>tranz435. BTW, in your code it is better to use ROWID, instead of the primary key column.
YANN0S
Ya I agree that ROWID would be a better to use.  

>>Can you provide some links that suggest that it is better to use PL/SQL with bulk bind instead of SQL?
for your questions, there are many websites which can show you that PLSQL delete would be much better than regular delete statement.  One such website is :

http://www.dbasupport.com/oracle/ora9i/DeleteHistoricalData02.shtml

It shows the performance of using each statements like regular delete, using for loop, using forall and then using execute immediate.

Look for any problem in oracle there are multiple solutions and unless you implement your solution you wont know whether it is efficient or not because all environments will be different.  Hence I would leave the judgment upto the person who asked this question to decide which one he will go for....

I know that looking at the website link any one can write an execute_immediate statement and give it to ImranRashid and say that its the best statement for delete.
I agree that for any problem there are multiple solutions. Well, actually for any problem there is a single best solution, but although two problems may be similar they are never the same.

When it comes to deleting rows, for example, it depends on how many rows you want to delete, if you have indexes on the table, rollback segment configuration etc.
for example, if you are deleting "alot of the rows" you might be better off disabling indexes, doing
the delete and rebuilding them.


OR, creating a new table that just keeps the "right records" and dropping the old table

Anyway, the link you provided, describes some tests performed in a specific environment. I still remain cautious about its conlcusions, since it does not provide any explanation nor any reference for them.
For example one of the conclusion is:
"do not delete all the data in one shot ; break data into chunks and commit in between"
but the fastest method deletes all data in one chunk.

In addition it states:
"The best result was using EXECUTE_IMMEDIATE calls inside an anonymous PL/SQL block, during test case 5. This is not surprising, since Oracle suggests this method for mass data deletion"

I don't know where the author found the Oracle suggestion, what I found in oracle documentation is:

"Native dynamic SQL in PL/SQL performs comparably to the performance of static SQL, because the PL/SQL interpreter has built-in support for it"

http://download-uk.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg09dyn.htm

Anyway, I believe that if they executed the test cases in a reverse order, they would get tottaly different results.
thanks for your information YANN0S
Avatar of ImranRashid

ASKER

Thank you all for your interest in this question and I got some queries to experiment with.

As far as the question of "most optimized" delete method, I would go with YANN0S. Nothing can be faster than a single dml statement and a PL/SQL block would always be slower than SQL statements. The rule of thumb for any dml is to never commit before your transactions are actually finished. I don't know how the author of the article mentioned above came to the idea that committing in between is a good idea, in fact the opposite is true. Anyone who has studied Thomas Kyte's Expert One on One, the best book available for Oracle developers, would know that it is a very bad idea to commit in between the transactions. It's the same Thomas from Ask Tom section of Oracle's website.

I'm a bit rusty with queries right now because I'm coming back to use it after quite some time but I can still recall some "old" things <smile>