Link to home
Start Free TrialLog in
Avatar of Eddie Shipman
Eddie ShipmanFlag for United States of America

asked on

Oracle query tuning help required

I have a document archive that is somewhat of a tuple table where we store keywords that pertain to the document in one table and the path to the document in another. This setup was created almost 15 yrs ago and I'd rather not make any monumental changes to it.

The lookups are extremely slow as you can see by the query and the table definitions below. I have added a numeric fd_cli_rid value on the filedirectory table which should make lookups much faster. So, if I switch to go through the filedirectory table to get the records required, how would I rewrite this query to return all rows?

The FILEDIRECTORY table is primarily used for versioning of files. It can poossiby point to a different path than the FILENAME keyword in FILEKEYPAIR.
select /*+INDEX(filekeypair filekeypair2)*/ FD_RID, FKP_KEYWORD, FKP_VALUE, FD_FILENAME, FD_LASTMODIFIED from filekeypair,( 
select distinct p.*, rownum rnum from (
select /*+INDEX(filedirectory filedirattrib_fd_rid)*/ FD_LASTMODIFIED,FD_RID,FD_FILENAME
from filedirectory,(select * from filekeypair where FKP_KEYWORD='CLI_RID') z 
where z.FKP_FD_RID(+)=FD_RID
      and FD_RID in (select /*+INDEX(B FILEKEYPAIR3)*/ b.FKP_FD_RID from filekeypair b
                     where (upper(b.FKP_KEYWORD) like upper('CLI_RID')||'%'
                        and upper(b.FKP_VALUE) like upper('1026697')||'%')
                       and (upper(b.FKP_KEYWORD) like upper('DISTRIBUTION')||'%'
                        and upper(b.FKP_VALUE) not like upper('INTERNAL')||'%'))
order by z.FKP_VALUE ASC ,FD_LASTMODIFIED  ASC
) p where rownum<=100000
) where rnum>=1 and FKP_FD_RID=FD_RID order by rnum

Open in new window

Definitions:
CREATE TABLE FILEKEYPAIR
(
  FKP_FD_RID   NUMBER,
  FKP_KEYWORD  VARCHAR2(100 BYTE),
  FKP_VALUE    VARCHAR2(110 BYTE),
  FKP_INSUSER  VARCHAR2(100 BYTE),
  FKP_INSDATE  DATE,
  FKP_DELUSER  VARCHAR2(100 BYTE),
  FKP_DELDATE  DATE
)

CREATE TABLE FILEDIRECTORY
(
  FD_RID           NUMBER(9),
  FD_FILENAME      VARCHAR2(1000 BYTE),
  FD_LASTMODIFIED  DATE,
  FD_EVE_RID       NUMBER(9),
  FD_DELETED       DATE,
  FD_OLD           CHAR(1 BYTE),
  FD_CLI_RID       NUMBER
)

Open in new window

This would be a typical record retrieved using the above query:
FD_RID     FKP_KEYWORD  FKP_VALUE                    FD_FILENAME                 FD_LASTMODIFIED
---------- ------------ ---------------------------- --------------------------- -------------------
5741874    CLI_RID      1026697                      \NEWCCDESIGMEMO20150103.PDF 1/5/2015 7:19:16 AM
5741874    DESIG        CC                           \NEWCCDESIGMEMO20150103.PDF 1/5/2015 7:19:16 AM
5741874    PRINT_DATE   20150105                     \NEWCCDESIGMEMO20150103.PDF 1/5/2015 7:19:16 AM
5741874    FILENAME     \NewCCDesigMemo20150103.pdf  \NEWCCDESIGMEMO20150103.PDF 1/5/2015 7:19:16 AM
5741874    DOC_NAME     New CC Desig Memo            \NEWCCDESIGMEMO20150103.PDF 1/5/2015 7:19:16 AM

Open in new window

It eventually comes out displayed like this in the UI:
DOC_NAME           FILENAME                    PRINT_DATE  DESIG   CLI_RID
------------------ --------------------------- ----------- ------- --------
New CC Desig Memo  \NewCCDesigMemo20150103.pdf 20150105    CC      1026697

Open in new window

Avatar of Mark Geerlings
Mark Geerlings
Flag of United States of America image

Where to start?  This query contains example of many of the things to avoid in SQL if you want good performance from Oracle.
1. The use of operators, like: "upper(b.FKP_KEYWORD)" on database columns in "where" clauses should always be avoided, since these prevent Oracle from using indexes on that column (unless you have function-based indexes that exactly match this syntax).
2. The nested queries (select ... from (select...(from select...))) can create performance problems, depending on the numbers of records returned by the inner queries and on the amount of memory in your server, since these intermediate result sets need to be buffered somewhere and evaluated without the benefit of indexes.
3. Outer joins, for example: "where z.FKP_FD_RID(+)=FD_RID", can add significant performance penalties.
4. Filters on rownum *ONLY* reduce the number of rows returned by the query, They do *NOT* reduce the number of rows that Oracle has to process internally before returning the requested number of rows to the application.

Basically, it looks to me like whoever designed this query (or this system?) was concerned about flexibility but either had no concern for, or was unaware of, the impact on performance when Oracle queries are written like this.
Avatar of Eddie Shipman

ASKER

Ok, do you have any suggestions on how to re-write it other than just dumping on the original author?
First: is the data in columns: FKP_KEYWORD and FKP_VALUE in table: filekeypair actually stored in mixed-case, or lower-case values, or inconsistently, or is it actually stored in all upper-case values already?  If this data is actually all upper-case, then removing the "upper" conversions on these may help greatly.  If this data is not stored consistently, then you may need to create function-based indexes on these columns that use the "upper" operator.  If you don't have permission to add indexes, and the data is stored inconsistently, then it is difficult to query it efficiently.

Second: we have no idea how many rows are returned by your nested queries, and we have no idea how much RAM your server has and what your SGA size is.  Without some more information from you on these topics, we don't know if we should change these or not.

Third: is the outer join actually needed?  That is, if you remove the outer join do you get the same number of records returned?

Fourth: can you provide filters to help reduce the rows returned to the number (or close to the number) that is currently being limited by use of rownum?
1. It is mixed-case, FKP_KEYWORD is supposed to be all uppercase but people inserting the records sometimes write keywords that aren't.
2. It depends on the user's archive that is queried. One user has over 14,000 document, which equals over 90,000 rows, most users are a lot less than that.
3. I don't really know, I didn't write the query and the PL/SQL developer we have here doesn't know who wrote it, either.
4. I'm not sure why it is written that way so I can't really tell you. I do know that we filter out all reacords where
FKP_KEYWORD='DISTRIBUTION' AND FKP_VALUE='INTERNAL'

Open in new window

SOLUTION
Avatar of Abhimanyu Suri
Abhimanyu Suri
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This query has been in use for a number of years. It is called from a .Net web service. It is built dynamically from input parameters that allow the user to filter the documents to be returned.
The rnum parameter allows them to query by pages. The CLI_RID value filters them to their own documents and when we have other filters, like DOC_NAME, the where clause gets dynamically created just like the CLI_RID where clause.
Table definitions:
// filekypair is forced view of filekeypairattib
CREATE TABLE CIC3.FILEKEYPAIRATTIB
(
  FKP_FD_RID   NUMBER,
  FKP_KEYWORD  VARCHAR2(100 BYTE),
  FKP_VALUE    VARCHAR2(110 BYTE),
  FKP_INSUSER  VARCHAR2(100 BYTE),
  FKP_INSDATE  DATE,
  FKP_DELUSER  VARCHAR2(100 BYTE),
  FKP_DELDATE  DATE
)
// filedirectory is forced view of filedirattrib
CREATE TABLE CIC3.FILEDIRATTRIB
(
  FD_RID           NUMBER(9),
  FD_FILENAME      VARCHAR2(1000 BYTE),
  FD_LASTMODIFIED  DATE,
  FD_EVE_RID       NUMBER(9),
  FD_DELETED       DATE,
  FD_OLD           CHAR(1 BYTE)
)

Open in new window

Simplified query without the INTERNAL filter:
select /*+INDEX(filekeypair filekeypair2)*/ FD_RID, FKP_KEYWORD, FKP_VALUE, FD_FILENAME, FD_LASTMODIFIED from filekeypair,( 
select distinct p.*, rownum rnum from (
select /*+INDEX(filedirectory filedirattrib_fd_rid)*/ FD_LASTMODIFIED,FD_RID,FD_FILENAME
from filedirectory,(select * from filekeypair where FKP_KEYWORD='CLI_RID') z 
where z.FKP_FD_RID(+)=FD_RID
      and FD_RID in (select /*+INDEX(B FILEKEYPAIR3)*/ b.FKP_FD_RID from filekeypair b
                     where (upper(b.FKP_KEYWORD) like upper('CLI_RID')||'%'
                        and upper(b.FKP_VALUE) like upper('1026697')||'%')
                       )
order by z.FKP_VALUE ASC ,FD_LASTMODIFIED  ASC
) p where rownum<=100000
) where rnum>=1 and FKP_FD_RID=FD_RID order by rnum desc

Open in new window


Plan:
SELECT STATEMENT  ALL_ROWSCost: 18  Bytes: 3,378  Cardinality: 6  												
	17 SORT ORDER BY  Cost: 18  Bytes: 3,378  Cardinality: 6  											
		16 TABLE ACCESS BY INDEX ROWID TABLE CIC3.FILEKEYPAIRATTIB Cost: 4  Bytes: 198  Cardinality: 6  										
			15 NESTED LOOPS  Cost: 17  Bytes: 3,378  Cardinality: 6  									
				13 VIEW CIC3. Cost: 13  Bytes: 530  Cardinality: 1  								
					12 HASH UNIQUE  Cost: 13  Bytes: 77  Cardinality: 1  							
						11 COUNT STOPKEY  						
							10 VIEW CIC3. Cost: 12  Bytes: 77  Cardinality: 1  					
								9 SORT ORDER BY STOPKEY  Cost: 12  Bytes: 143  Cardinality: 1  				
									8 NESTED LOOPS OUTER  Cost: 11  Bytes: 143  Cardinality: 1  			
										5 NESTED LOOPS  Cost: 7  Bytes: 110  Cardinality: 1  		
											2 SORT UNIQUE  Cost: 4  Bytes: 33  Cardinality: 1  	
												1 INDEX RANGE SCAN INDEX CIC3.FILEKEYPAIR3 Cost: 4  Bytes: 33  Cardinality: 1  
											4 TABLE ACCESS BY INDEX ROWID TABLE CIC3.FILEDIRATTRIB Cost: 2  Bytes: 77  Cardinality: 1  	
												3 INDEX UNIQUE SCAN INDEX (UNIQUE) CIC3.FILEDIRATTRIB_FD_RID Cost: 1  Cardinality: 1  
										7 TABLE ACCESS BY INDEX ROWID TABLE CIC3.FILEKEYPAIRATTIB Cost: 4  Bytes: 33  Cardinality: 1  		
											6 INDEX RANGE SCAN INDEX CIC3.FILEKEYPAIR2 Cost: 3  Cardinality: 1  	
				14 INDEX RANGE SCAN INDEX CIC3.FILEKEYPAIR2 Cost: 3  Cardinality: 6  								

Open in new window

INDEXES:
CREATE INDEX CIC3.FILEKEYPAIR2 ON CIC3.FILEKEYPAIRATTIB
(FKP_FD_RID, FKP_KEYWORD, FKP_INSDATE)

CREATE UNIQUE INDEX CIC3.FILEDIRATTRIB_FD_RID ON CIC3.FILEDIRATTRIB
(FD_RID)

CREATE INDEX CIC3.FILEKEYPAIR3 ON CIC3.FILEKEYPAIRATTIB
(UPPER("FKP_VALUE"), UPPER("FKP_KEYWORD"), FKP_FD_RID)

Open in new window

You know, I think I'm going to rethink this. On a test DB, I restructured filedirattrib to have a FD_CLI_RID column and populated it with the FKP_VALUE for the FKP_KEYWORD 'CLI_RID' for records with the same FKP_FD_RID as FD_RID. and it made the new query much faster when going through filedirattrib.

This would be the plan:
Plan
SELECT STATEMENT  ALL_ROWSCost: 248  Bytes: 40,817  Cardinality: 343  				
	5 TABLE ACCESS BY INDEX ROWID TABLE CIC3.FILEKEYPAIRATTIB Cost: 3  Bytes: 216  Cardinality: 6  			
		4 NESTED LOOPS  Cost: 248  Bytes: 40,817  Cardinality: 343  		
			2 TABLE ACCESS BY INDEX ROWID TABLE CIC3.FILEDIRATTRIB Cost: 63  Bytes: 5,146  Cardinality: 62  	
				1 INDEX RANGE SCAN INDEX CIC3.FILEDIRATTRIB_CLI_RID Cost: 3  Cardinality: 62  
			3 INDEX RANGE SCAN INDEX CIC3.FILEKEYPAIR2 Cost: 2  Cardinality: 6  	

Open in new window

and the query:
select d.FD_RID, k.FKP_KEYWORD, k.FKP_VALUE, d.FD_FILENAME, d.FD_LASTMODIFIED from filekeypair k 
inner join FILEDIRATTRIB d on k.fkp_fd_rid = d.FD_RID
where d.fd_cli_rid = 1026697 

Open in new window

The only thing is, I am not sure how to build my filters. On to another question...
"This query has been in use for a number of years."
OK, but that doesn't mean the application was well-designed for the current use, or for the current data volumes, or for running on an Oracle database.

"It is called from a .Net web service."
That may explain the "SQL Server-like" syntax that I see in this query.

"It is built dynamically from input parameters that allow the user to filter the documents to be returned."
Ouch!  Oracle is optimized for static SQL statements that use bind variables, rather than "hard-coded" literal values in the "where" clauses.  With static SQL statements, Oracle can parse them only once, then re-use them over and over with different bind variable values, each time.  With dynamically-generated SQL statement that include "hard-coded" literal values in the "where" clauses, Oracle has to parse each one every time, to answer questions like:
1. Are the table names valid?
2. Are the column names valid?
3. Does the logged-in user have permission to query the tables?
4. Are there indexes available on these tables on columns that match the columns referenced in the "where" clauses, to help Oracle avoid reading all records in the tables?
5. Does the database have up-to-date statistics on the tables and indexes that indicate: how many rows they contain; how many blocks are these rows scattered across; how many distinct values do the columns and indexes contain; are the rows physically ordered similar to an index or not, etc.?
These parsing actions take time.  Applications that force Oracle to "hard parse" every SQL statement will be slower than applications that allow Oracle to use its strengths.

"The rnum parameter allows them to query by pages."
OK, that may make sense to an application developer, but that is not what Oracle is optimized for.