We help IT Professionals succeed at work.

Removing hierarchical data

ImranRashid
ImranRashid asked
on
1,480 Views
Last Modified: 2013-12-18
Hi All,

I have a table with hierarchical data in it but there is some duplication that I want to remove with a single SQL statement. If a single SQL statement cannot be used, then the most efficient PL/SQL block can also be used.

There are two column in the table as shown below.

Prev_id         Next_id
A                  B
B                  C
C                  D
A                  C
A                  D

Prev_id is the parent of Next_id. All is well till the third line where a simple connect by query can be used but the last two records are the indirect relationships which should not be there and I want to remove them. The table contains millions of rows and I want to use the most efficient method to remove the records. I have tried to use subqueries in DELETE statement but when I run it on the test data, it removes some rows which it should not. Any help will be greatly appreciated.
Comment
Watch Question

SujithData Architect
CERTIFIED EXPERT
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Commented:
Hello,
I have one question to clarify the requirement: You want to delete all rows ('n', 'm') for which there is a path
'n', ..'k'..., 'm' (i.e. with more than one hop). Is that true?
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Commented:
tranz435, why plsql forall statement is much more efficient and fast than a regular DML?

Commented:
The reason is regular delete one record at a time but if you use plsql forall... it will bulk collect all the records to be deleted and then delete it.  You can google around for the benefits of bulk collect in plsql.

I am not saying that your query is wrong... just that if you are deleting more records then the regular DML statements wont work efficiently.  

Commented:
ImranRashid

drop all the indexes on all columns except for the primary key if you want your delete to run faster

Commented:
Are you sure? I know (and I agree) that using bulk collect and bulk bind to perform DML within PL/SQL is much more efficient than using normal PL/SQL cursors and DML statements. But compared to plain SQL statements? I've never seen that. I believe that in most cases a single SQL statement is more efficient than any PL/SQL block.

Can you provide some links that suggest that it is better to use PL/SQL with bulk bind instead of SQL?

Commented:
>tranz435. BTW, in your code it is better to use ROWID, instead of the primary key column.

Commented:
YANN0S
Ya I agree that ROWID would be a better to use.  

>>Can you provide some links that suggest that it is better to use PL/SQL with bulk bind instead of SQL?
for your questions, there are many websites which can show you that PLSQL delete would be much better than regular delete statement.  One such website is :

http://www.dbasupport.com/oracle/ora9i/DeleteHistoricalData02.shtml

It shows the performance of using each statements like regular delete, using for loop, using forall and then using execute immediate.

Look for any problem in oracle there are multiple solutions and unless you implement your solution you wont know whether it is efficient or not because all environments will be different.  Hence I would leave the judgment upto the person who asked this question to decide which one he will go for....

I know that looking at the website link any one can write an execute_immediate statement and give it to ImranRashid and say that its the best statement for delete.

Commented:
I agree that for any problem there are multiple solutions. Well, actually for any problem there is a single best solution, but although two problems may be similar they are never the same.

When it comes to deleting rows, for example, it depends on how many rows you want to delete, if you have indexes on the table, rollback segment configuration etc.
for example, if you are deleting "alot of the rows" you might be better off disabling indexes, doing
the delete and rebuilding them.


OR, creating a new table that just keeps the "right records" and dropping the old table

Anyway, the link you provided, describes some tests performed in a specific environment. I still remain cautious about its conlcusions, since it does not provide any explanation nor any reference for them.
For example one of the conclusion is:
"do not delete all the data in one shot ; break data into chunks and commit in between"
but the fastest method deletes all data in one chunk.

In addition it states:
"The best result was using EXECUTE_IMMEDIATE calls inside an anonymous PL/SQL block, during test case 5. This is not surprising, since Oracle suggests this method for mass data deletion"

I don't know where the author found the Oracle suggestion, what I found in oracle documentation is:

"Native dynamic SQL in PL/SQL performs comparably to the performance of static SQL, because the PL/SQL interpreter has built-in support for it"

http://download-uk.oracle.com/docs/cd/B10501_01/appdev.920/a96590/adg09dyn.htm

Anyway, I believe that if they executed the test cases in a reverse order, they would get tottaly different results.

Commented:
thanks for your information YANN0S

Author

Commented:
Thank you all for your interest in this question and I got some queries to experiment with.

As far as the question of "most optimized" delete method, I would go with YANN0S. Nothing can be faster than a single dml statement and a PL/SQL block would always be slower than SQL statements. The rule of thumb for any dml is to never commit before your transactions are actually finished. I don't know how the author of the article mentioned above came to the idea that committing in between is a good idea, in fact the opposite is true. Anyone who has studied Thomas Kyte's Expert One on One, the best book available for Oracle developers, would know that it is a very bad idea to commit in between the transactions. It's the same Thomas from Ask Tom section of Oracle's website.

I'm a bit rusty with queries right now because I'm coming back to use it after quite some time but I can still recall some "old" things <smile>
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.