<

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x

Subquery Factoring WITH Oracle

Published on
30,503 Points
23,103 Views
4 Endorsements
Last Modified:
Approved
Community Pick
WITH Clause OR Subquery Factoring WITH Oracle

by Ivo Stoykov


Digging information from a database sometimes requires creating complicated queries. Often a simple select * from . . . is nested into another one and both into another one and so forth until the result becomes a very complicated query containing scalar queries (subqueries in the select clause) and in-line views (subqueries in the from clause). Such queries are difficult to read, hard to maintain, and a nightmare to optimize.

Fortunately there is a WITH clause defined in SQL-99 and implemented in Oracle9 release 2 and later. In Oracle database the WITH clause is used for materializing subqueries to avoid recomputing them multiple times without using temporary tables. The WITH clause allows factor out a sub-query, name it, and then reference it by name multiple times within the original complex query.

Additionally this technique lets the optimizer choose how to deal with the sub-query results -- whether to create a temporary table or inline it as a view.


The Syntax

WITH
 alias_name         -- Alias to use in the main query
AS
 (insert a query here)
SELECT...            -- Beginning of the query main body
Multiple aliases can be defined in the WITH clause:
WITH
 alias_name1
AS
 (query1)
 aleas_name2
AS
 (query2)
...
SELECT...

Open in new window

Let's look inside by taking a simple query:

SQL> col dummy format a10;
SQL> select dummy from dual;
 
DUMMY
----------
X
SQL>

Open in new window

Same re-written as WITH clause will be:

SQL> with d as (select * from dual)
  2  select * from d;
 
DUMMY
----------
X
SQL>

Open in new window

Even if it seems nonsense at first glance, using WITH clause in complicated queries could be very worthy.

Let’s consider usage of WITH clause as creating a temporary table called “d” and then selecting from it. (Oracle actually does not create table but merge SQL before execution.)  In other words WITH clause allows us to name a predefined Select statement in the context of a bigger Select and referenced in later by the given name.

Consider next few samples (Please scroll when necessary):

SQL> with temp_t1 as ( select dummy c1 from dual 
  2  select * from temp_t1 a,temp_t1 b
  3  /
 
C1 C1
-- --
X  X

SQL> with temp_t1 as (select dummy c1 from dual)
  2    ,temp_t2 as (select dummy c1 from dual)
  3  select * from temp_t1 a,temp_t2 b
  4  /
 
C1 C1
-- --
X  X

SQL> with temp_t1 as ( select dummy c1 from dual )
  2      ,temp_t2 as ( select dummy c1 from dual a,temp_t1 b where b.c1 = a.dummy )
  3  select * from temp_t2 a
  4  /
 
C1
--
X

SQL> select * from temp_t2 a;
 
select * from temp_t2 a
 
ORA-00942: table or view does not exist

Open in new window

So nevertheless we can select the recordset as a table it exists actually only during the execution.


What Can We Do WITH

Samples above shows that we can do following WITH:
1.      Reference a named query is allowed any number of times.
2.      Any number of named queries are allowed.
3.      Named queries can reference other named queries that came before them and even correlate to previous named queries.
4.      Named queries has a local scope to the SELECT in which they are defined.


Benefits

The obvious is the ability to create reusable construction inside a SELECT, name it and reuse it whenever necessary.


Referencing the Same Sub-query Multiple Times

The following query joins two tables and computes the aggregate SUM(SAL) more than once. The bold text represents the parts of the query that are repeated.

SELECT dname, SUM(sal) AS dept_total FROM emp, dept
 WHERE emp.deptno = dept.deptno
 GROUP BY dname HAVING
       SUM(sal) > ( SELECT SUM(sal) * 1/3 FROM emp, dept
                     WHERE emp.deptno = dept.deptno)
 ORDER BY SUM(sal) DESC;

Open in new window


You can improve the query by doing the subquery once, and referencing it at the appropriate points in the main query. The bold text represents the common parts of the subquery, and the places where the subquery is referenced.

WITH summary AS
(
 SELECT dname, SUM(sal) AS dept_total FROM emp, dept
 WHERE emp.deptno = dept.deptno
 GROUP BY dname
)
SELECT dname, dept_total FROM summary 
 WHERE dept_total > (SELECT SUM(dept_total) * 1/3 FROM summary)
 ORDER BY dept_total DESC;

Open in new window



What Stats Say

When a question comes to optimization using WITH clause, the answer is – depends. This is because many queries will be treated by Oracle as an in-line view. Sometimes recordset will be “materialized” into “on-the-fly” temporary table for use in later parts of the only this SQL statement. (This means that following execution of the same cursor will result in a new temp table each time.)

Let’s see an example. First we’ll create a test table with more than 1 million rows:

CREATE TABLE TestTbl
NOLOGGING
AS 
SELECT A1.OWNER       AS usr
      ,A1.OBJECT_TYPE AS PROD
      ,A1.OBJECT_ID   AS ITEMS
  FROM ALL_OBJECTS A1, EMP A2, DEPT A3;

SQL> select count(*) as recs from testtbl;  
 
      RECS
----------
   3140928

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'TESTTBL');
 
PL/SQL procedure successfully completed
 
SQL>

Open in new window


Next we’ll use a standard approach to show up some reports.

set serveroutput on;
set autotrace on;
set timing on;

SQL> SELECT prod
  2        ,total_items
  3    FROM  (
  4           SELECT prod
  5           ,      NVL(SUM(items),0) AS total_items
  6           FROM   testtbl
  7           GROUP  BY
  8                  prod
  9          ) ilv
 10    WHERE total_items > (SELECT SUM(items)/3 AS one_third_items
 11*                        FROM   testtbl)
SQL> /

PROD                TOTAL_ITEMS            
------------------- ---------------------- 
SYNONYM             50068277560            
JAVA CLASS          43610562520            


Elapsed: 00:00:03.07

Execution Plan
----------------------------------------------------------
Plan hash value: 2449639557

-------------------------------------------------------------------------------
| Id  | Operation           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |         |     2 |    28 |  3154   (8)| 00:00:38 |
|*  1 |  FILTER             |         |       |       |            |          |
|   2 |   HASH GROUP BY     |         |     2 |    28 |  3154   (8)| 00:00:38 |
|   3 |    TABLE ACCESS FULL| TESTTBL |  3140K|    41M|  2967   (2)| 00:00:36 |
|   4 |   SORT AGGREGATE    |         |     1 |     5 |            |          |
|   5 |    TABLE ACCESS FULL| TESTTBL |  3140K|    14M|  2967   (2)| 00:00:36 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(NVL(SUM("ITEMS"),0)> (SELECT SUM("ITEMS")/3 FROM
              "TESTTBL" "TESTTBL"))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      21319  consistent gets
      21213  physical reads
          0  redo size
        536  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

SQL>

Open in new window


As visible, resources are moderate and execution time is about 3 sec. The problem is that to get the result, table is accessed twice. To avoid this we could refactor the query using WITH clause. As was said above there will be a single access due to materialization of the recordset.

SQL> WITH ITEMS_REP AS (
  2       SELECT PROD, NVL(SUM(ITEMS),0) AS TOTAL_ITMES
  3         FROM TESTTBL
  4        GROUP BY PROD)
  5    SELECT PROD, TOTAL_ITMES
  6      FROM ITEMS_REP
  7     WHERE TOTAL_ITMES > (SELECT SUM(TOTAL_ITMES)/3 AS ONE_THIRD_ITEMS
  8                            FROM   ITEMS_REP)
  9/
SQL>

PROD                TOTAL_ITMES
------------------- -----------
SYNONYM             50068277560            
JAVA CLASS          43610562520            

Elapsed: 00:00:02.15

Execution Plan
----------------------------------------------------------
Plan hash value: 2967744285

-------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                          |    22 |   528 |  3158   (8)| 00:00:38 |
|   1 |  TEMP TABLE TRANSFORMATION |                          |       |       |            |          |
|   2 |   LOAD AS SELECT           |                          |       |       |            |          |
|   3 |    HASH GROUP BY           |                          |    22 |   308 |  3154   (8)| 00:00:38 |
|   4 |     TABLE ACCESS FULL      | TESTTBL                  |  3140K|    41M|  2967   (2)| 00:00:36 |
|*  5 |   VIEW                     |                          |    22 |   528 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6633_E00F1 |    22 |   308 |     2   (0)| 00:00:01 |
|   7 |    SORT AGGREGATE          |                          |     1 |    13 |            |          |
|   8 |     VIEW                   |                          |    22 |   286 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6633_E00F1 |    22 |   308 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("TOTAL_ITMES"> (SELECT SUM("TOTAL_ITMES")/3 FROM  (SELECT /*+ CACHE_TEMP_TABLE
              ("T1") */ "C0" "PROD","C1" "TOTAL_ITMES" FROM "SYS"."SYS_TEMP_0FD9D6633_E00F1" "T1")
              "ITEMS_REP"))


Statistics
----------------------------------------------------------
          2  recursive calls
          9  db block gets
      10672  consistent gets
      10613  physical reads
        644  redo size
        536  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed

Open in new window


Now the query goes faster. Time is reduced by 1 sec. We have hard-parse in addition to the cost of setting up a temp table. From statistics we can see that the physical reads are significantly reduced.

SQL> EXPLAIN PLAN SET STATEMENT_ID = 'SQ_FCT' for
  2    WITH ITEMS_REP AS (
  3      SELECT PROD, NVL(SUM(ITEMS),0) AS TOTAL_ITMES
  4        FROM TESTTBL
  5       GROUP BY PROD)
  6    SELECT PROD, TOTAL_ITMES
  7      FROM ITEMS_REP
  8     WHERE TOTAL_ITMES > (SELECT SUM(TOTAL_ITMES)/3 AS ONE_THIRD_ITEMS
  9                            FROM   ITEMS_REP);
SQL>
SQL> @?RDBMS\ADMIN\utlxplp.sql

PLAN_TABLE_OUTPUT                                                                                                                                                                                                                                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
Plan hash value: 395517267                                                                                                                                                                                                                                                                                   
                                                                                                                                                                                                                                                                                                             
-------------------------------------------------------------------------------------------------------                                                                                                                                                                                                      
| Id  | Operation                  | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                      
-------------------------------------------------------------------------------------------------------                                                                                                                                                                                                      
|   0 | SELECT STATEMENT           |                          |    22 |   528 |  3158   (8)| 00:00:38 |                                                                                                                                                                                                      
|   1 |  TEMP TABLE TRANSFORMATION |                          |       |       |            |          |                                                                                                                                                                                                      
|   2 |   LOAD AS SELECT           |                          |       |       |            |          |                                                                                                                                                                                                      
|   3 |    HASH GROUP BY           |                          |    22 |   308 |  3154   (8)| 00:00:38 |                                                                                                                                                                                                      
|   4 |     TABLE ACCESS FULL      | TESTTBL                  |  3140K|    41M|  2967   (2)| 00:00:36 |                                                                                                                                                                                                      
|*  5 |   VIEW                     |                          |    22 |   528 |     2   (0)| 00:00:01 |                                                                                                                                                                                                      
|   6 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6637_E00F1 |    22 |   308 |     2   (0)| 00:00:01 |                                                                                                                                                                                                      
|   7 |    SORT AGGREGATE          |                          |     1 |    13 |            |          |                                                                                                                                                                                                      
|   8 |     VIEW                   |                          |    22 |   286 |     2   (0)| 00:00:01 |                                                                                                                                                                                                      
|   9 |      TABLE ACCESS FULL     | SYS_TEMP_0FD9D6637_E00F1 |    22 |   308 |     2   (0)| 00:00:01 |                                                                                                                                                                                                      
-------------------------------------------------------------------------------------------------------

Open in new window


The execution plan above shows the new TEMP TABLE TRANSFORMATION that Oracle has introduced (Step 2). There is a global temporary table Oracle created and loaded from the first scan of TESTTBL. This temporary dataset is then used to answer the overall question of which product items are more than one-third (Step 7).

Let’s repeat that Oracle will not always materialise subqueries in this way. Sometimes it will either merge the subquery into the main query or treat it as a simple in-line view.  In cases when CBO chooses not to materialise subquery, but we do want to, Oracle supports the MATERIALIZE hint.

Without hint we see:

SQL> set autotrace traceonly explain
SQL>
SQL> WITH INLINE_VIEW AS (
  2       SELECT PROD
  3       ,      SUM(ITEMS) AS TOTAL_ITEMS
  4       FROM   testtbl
  5    GROUP BY PROD
  6       )
  7    SELECT *
  8    FROM   inline_view;

Execution Plan
----------------------------------------------------------
Plan hash value: 1165785812

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |    22 |   308 |  3154   (8)| 00:00:38 |
|   1 |  HASH GROUP BY     |         |    22 |   308 |  3154   (8)| 00:00:38 |
|   2 |   TABLE ACCESS FULL| TESTTBL |  3140K|    41M|  2967   (2)| 00:00:36 |
------------------------------------------------------------------------------

SQL>

Open in new window


Plan shows us that Oracle has chosen not to materialise this subquery because it is used once and temporary table would be a waste of resources.
Adding the MATERIALIZE hint as follows shows following:

SQL> WITH INLINE_VIEW AS (
  2         SELECT /*+ materialize */
  3                PROD, SUM(ITEMS) AS TOTAL_ITEMS
  4         FROM   testtbl
  5      GROUP BY PROD
  6         )
  7      SELECT *
  8      FROM   inline_view;

Execution Plan
----------------------------------------------------------
Plan hash value: 844976587

-------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                          |    22 |   528 |  3156   (8)| 00:00:38 |
|   1 |  TEMP TABLE TRANSFORMATION |                          |       |       |            |          |
|   2 |   LOAD AS SELECT           |                          |       |       |            |          |
|   3 |    HASH GROUP BY           |                          |    22 |   308 |  3154   (8)| 00:00:38 |
|   4 |     TABLE ACCESS FULL      | TESTTBL                  |  3140K|    41M|  2967   (2)| 00:00:36 | 
|   5 |   VIEW                     |                          |    22 |   528 |     2   (0)| 00:00:01 |
|   6 |    TABLE ACCESS FULL       | SYS_TEMP_0FD9D6639_E00F1 |    22 |   308 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

SQL>

Open in new window


The temporary table is present in the plan.

On large datasets, however, the savings from such optimisation can be quite. On tiny datasets, the time involved in the temporary table setup can take longer than the original query itself so is not a particularly useful mechanism.


Conclusion

As WITH clause is refactoring of a SELECT statement, it could be used anywhere where SELECT is acceptable, including in DML operations as INSERT, UPDATE and DELETE.

INSERT INTO table_name
WITH subquery_name AS (SELECT . . .)
SELECT ... FROM   subquery_name;

UPDATE table_name
SET    column_name = ( WITH subquery_name AS (SELECT ...)
                       SELECT ... FROM   subquery_name );

DELETE FROM table_name
WHERE  column_name IN ( WITH subquery_name AS (SELECT ...)
                        SELECT ... FROM   subquery_name );

Open in new window



Restrictions on Subquery Factoring:

•      WITH clause cannot be nested. That is, WITH cannot exists in another WITH. However, the name defined in WITH clause can be used in the subquery of any subsequent subquery.
•      In a query with set operators, the set operator subquery cannot contain the WITH clause, but the FROM subquery can.


Difference between 11g and previous versions

Named query must be used in the SQL statement prior to Oracle 11g. It must appear in another named query (like above) or in the main SELECT statement (like the first example). Otherwise an error is raised.

If we fire following script against version less than 11g

SELECT * FROM v$version;
WITH T1 AS (SELECT * FROM DUAL),
     T2 AS (SELECT * FROM DUAL)
select * from t1;

Open in new window


Result will be an error:

BANNER                                                           
---------------------------------------------------------------- 
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production       
PL/SQL Release 9.2.0.7.0 - Production                            
CORE	9.2.0.7.0	Production                                          
TNS for 32-bit Windows: Version 9.2.0.7.0 - Production           
NLSRTL Version 9.2.0.7.0 - Production                            


Error starting at line 2 in command:
WITH T1 AS (SELECT * FROM DUAL),
     T2 AS (SELECT * FROM DUAL)
select * from t1
Error at Command Line:4 Column:14
Error report:
SQL Error: ORA-32035: unreferenced query name defined in WITH clause
32035. 00000 -  "unreferenced query name defined in WITH clause"
*Cause:    There is at least one WITH clause query name that is not
           referenced in any place.
*Action:   remove the unreferenced query name and retry

Open in new window


In Oracle 11g this is no more a problem

SELECT * FROM v$version;
WITH T1 AS (SELECT * FROM DUAL),
     T2 AS (SELECT * FROM DUAL)
select * from t1;

Open in new window


Result is

BANNER                                                                           
--------------------------------------------------------------------------- 
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production           
PL/SQL Release 11.1.0.7.0 - Production                                           
CORE	11.1.0.7.0	Production                                                         
TNS for 32-bit Windows: Version 11.1.0.7.0 - Production                          
NLSRTL Version 11.1.0.7.0 - Production                                           

DUMMY 
----- 
X     

Open in new window



Additional reading

A good discussion about WITH clause at AskTom: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:4423923392083
4
Comment
Author:Ivo Stoykov
1 Comment
LVL 15

Expert Comment

by:Devinder Singh Virdi
Good Article, I agree everybody is using nested inline view instead of WITH. (Even I do sometime).

Suggestion: Can you include Flushing of Shared_Pool and buffer_cache in order to reflect more acurate result for Consistent and Physical IO also?
0

Featured Post

Exploring SharePoint 2016

Explore SharePoint 2016, the web-based, collaborative platform that integrates with Microsoft Office to provide intranets, secure document management, and collaboration so you can develop your online and offline capabilities.

Join & Write a Comment

This video shows syntax for various backup options while discussing how the different basic backup types work.  It explains how to take full backups, incremental level 0 backups, incremental level 1 backups in both differential and cumulative mode a…
This video shows how to recover a database from a user managed backup

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month