The following query is being used in conjunction with a document management system that is organized with hierarchical folders which contains documents. Each document contains metadata, of which has a 'revision' attribute. The goal of the query is to find documents that do not currently have the revision attribute appended to the end of the filename ('_R<<revision_number>>').
Example. if I have the following two documents..
Document Name | Revision Number
document1234_R01 | 01
documentabcd | 8A
Then I want the second document, 'documentabcd' to be returned. The following is what I came up with, and it works on a small scale, but when I start the query at a higher level, it takes entirely way to long to run and I end up canceling the query. I would prefer to only receive the first 100 or 1000 rows since I will be renaming the files with a different tool and could run batches of 1000, for example.
select distinct D.DATAID,
from dtree d
left join llattrdata a on a.id=d.dataid
/**get version number for the current version**/
and A.VERNUM=(select max(A2.VERNUM) from llattrdata a2 where A2.ID=D.DATAID and A2.ATTRID=7)
/**if the contents to the right of last underscore does not match current revision**/
and substr(d.name, instr(d.name, '_', -1)+2) != A.VALSTR
start with D.DATAId=10224461
connect by prior D.DATAID=D.PARENTID
Any ideas to improve this query's performance and/or to have it stop processing after it hits 1000 rows?
Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production
PL/SQL Release 18.104.22.168.0 - Production
CORE 22.214.171.124.0 Production
TNS for Linux: Version 126.96.36.199.0 - Production
NLSRTL Version 188.8.131.52.0 - Production