How to return the first n rows and stop processing query (oracle)

The following query is being used in conjunction with a document management system that is organized with hierarchical folders which contains documents.  Each document contains metadata, of which has a 'revision' attribute.  The goal of the query is to find documents that do not currently have the revision attribute appended to the end of the filename ('_R<<revision_number>>')...

Example.  if I have the following two documents..

Document Name | Revision Number
document1234_R01 | 01
documentabcd          | 8A

Then I want the second document, 'documentabcd' to be returned.  The following is what I came up with, and it works on a small scale, but when I start the query at a higher level, it takes entirely way to long to run and I end up canceling the query.  I would prefer to only receive the first 100 or 1000 rows since I will be renaming the files with a different tool and could run batches of 1000, for example.

select distinct D.DATAID, 
A.VERNUM, 
d.name, 
A.VALSTR 
AS REVISION
from dtree d
left join llattrdata a on a.id=d.dataid
where D.SUBTYPE=144
and A.ATTRID=7
/**get version number for the current version**/
and A.VERNUM=(select max(A2.VERNUM) from llattrdata a2 where A2.ID=D.DATAID and A2.ATTRID=7) 
/**if the contents to the right of last underscore does not match current revision**/
and substr(d.name, instr(d.name, '_', -1)+2) != A.VALSTR
start with D.DATAId=10224461
connect by prior D.DATAID=D.PARENTID

Open in new window


Any ideas to improve this query's performance and/or to have it stop processing after it hits 1000 rows?  


Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE      11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
bmsandeAsked:
Who is Participating?
 
Mark GeerlingsDatabase AdministratorCommented:
"Any ideas to improve this query's performance ..."

Two big obvious ones:
1. Avoid outer (or: left) joins - these can have a serious performance penalty in Oracle
2. Avoid using the "distinct" keyword.  Whenever you use that keyword in an Oracle query (unless the columns you select are from a single table and exactly match an index) you force Oracle to read all of the rows, then do a sort operation to decide which rows are duplicates, before it can return any to you.

I don't have time right now to think through how you might be able to avoid those two things in your query.

If you want to return a fixed number of rows that is smaller than the number that matches your "where" clause conditions, Oracle supports using "rownum" to limit the rows returned.  However, this has a serious limitation: you cannot use this in combination with an "order by" or "group by" clause, at least not at the same level in the query.  You can get around this though by using a nested (or multi-level) query, also known as an "in-line view" something like this:

select * from
  (select [the columns you want]
  from [your tables]
  where [your join and filter conditions are true]
  [optional "group by" or "order by" clauses
    or your "start with" and "connect by" clauses])
where rownum < [your limit];

Note the closing parenthesis at the end of the line before the last "where".

Be aware though that Oracle will still take the time to read and sort all of the rows that match the "where" clause conditions, then it will return only the number of rows up to what you specify with [your limit].
0
 
awking00Commented:
I assume the VALSTR attribute represents the Revision Number. Is it always two characters?
0
 
bmsandeAuthor Commented:
Yes, valstr is the revision number--not always 2 digits, could be more
0
Free Tool: IP Lookup

Get more info about an IP address or domain name, such as organization, abuse contacts and geolocation.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

 
Wasim Akram ShaikCommented:
how about considering the usage of  first_rows hint.. i think its still valid with CBO, not sure though.. you can give it a try..

https://docs.oracle.com/cd/E11882_01/server.112/e41084/sql_elements006.htm#SQLRF50302

select /*+ first_rows(n) */

this will give you first n rows of the query output and will stop fetching the result after first_rows are fetched..
0
 
bmsandeAuthor Commented:
Thanks for the pointers, I will see if I can implement those ideas.
0
 
bmsandeAuthor Commented:
Ok, I think I found the main culprit for the duplicate rows and the need for 'distinct'...  I moved 'A.ATTRID=7' from the where clause to the join.  This alone improved the query to where it would run within a few seconds.  I can now use rownum to limit the results and speed up the results even more.  Thanks for the advice guys.  

select D.DATAID, 
A.VERNUM, 
d.name, 
A.VALSTR AS REVISION
from dtree d
left join llattrdata a on a.id=d.dataid and A.ATTRID=7
where D.SUBTYPE=144
/**get version number for the current version**/
and A.VERNUM=(select max(A2.VERNUM) from llattrdata a2 where A2.ID=D.DATAID and A2.ATTRID=7) 
/**if the contents to the right of last underscore does not match current revision**/
and substr(d.name, instr(d.name, '_', -1)+2) != A.VALSTR
--and rownum < 101
start with D.DATAId=4730562
connect by prior D.DATAID=D.PARENTID

Open in new window

0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.