WITH Clause OR Subquery Factoring WITH Oracle
by Ivo Stoykov
Digging information from a database sometimes requires creating complicated queries. Often a simple select * from . . . is nested into another one and both into another one and so forth until the result becomes a very complicated query containing scalar queries (subqueries in the select clause) and in-line views (subqueries in the from clause). Such queries are difficult to read, hard to maintain, and a nightmare to optimize.
Fortunately there is a WITH clause defined in SQL-99 and implemented in Oracle9 release 2 and later. In Oracle database the WITH clause is used for materializing subqueries to avoid recomputing them multiple times without using temporary tables. The WITH clause allows factor out a sub-query, name it, and then reference it by name multiple times within the original complex query.
Additionally this technique lets the optimizer choose how to deal with the sub-query results -- whether to create a temporary table or inline it as a view.
The Syntax
WITH
alias_name -- Alias to use in the main query
AS
(insert a query here)
SELECT... -- Beginning of the query main body
Multiple aliases can be defined in the WITH clause:
WITH
alias_name1
AS
(query1)
aleas_name2
AS
(query2)
...
SELECT...
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
Select allOpen in new window
Let's look inside by taking a simple query:
Same re-written as WITH clause will be:
Even if it seems nonsense at first glance, using WITH clause in complicated queries could be very worthy.
Let’s consider usage of WITH clause as creating a temporary table called “d” and then selecting from it. (Oracle actually does not create table but merge SQL before execution.) In other words WITH clause allows us to name a predefined Select statement in the context of a bigger Select and referenced in later by the given name.
Consider next few samples (Please scroll when necessary):
SQL> with temp_t1 as ( select dummy c1 from dual
2 select * from temp_t1 a,temp_t1 b
3 /
C1 C1
-- --
X X
SQL> with temp_t1 as (select dummy c1 from dual)
2 ,temp_t2 as (select dummy c1 from dual)
3 select * from temp_t1 a,temp_t2 b
4 /
C1 C1
-- --
X X
SQL> with temp_t1 as ( select dummy c1 from dual )
2 ,temp_t2 as ( select dummy c1 from dual a,temp_t1 b where b.c1 = a.dummy )
3 select * from temp_t2 a
4 /
C1
--
X
SQL> select * from temp_t2 a;
select * from temp_t2 a
ORA-00942: table or view does not exist
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
Select allOpen in new window
So nevertheless we can select the recordset as a table it exists actually only during the execution.
What Can We Do WITH
Samples above shows that we can do following WITH:
1. Reference a named query is allowed any number of times.
2. Any number of named queries are allowed.
3. Named queries can reference other named queries that came before them and even correlate to previous named queries.
4. Named queries has a local scope to the SELECT in which they are defined.
Benefits
The obvious is the ability to create reusable construction inside a SELECT, name it and reuse it whenever necessary.
Referencing the Same Sub-query Multiple Times
The following query joins two tables and computes the aggregate SUM(SAL) more than once. The bold text represents the parts of the query that are repeated.
SELECT dname, SUM(sal) AS dept_total FROM emp, dept
WHERE emp.deptno = dept.deptno
GROUP BY dname HAVING
SUM(sal) > ( SELECT SUM(sal) * 1/3 FROM emp, dept
WHERE emp.deptno = dept.deptno)
ORDER BY SUM(sal) DESC;
1:
2:
3:
4:
5:
6:
Select allOpen in new window
You can improve the query by doing the subquery once, and referencing it at the appropriate points in the main query. The bold text represents the common parts of the subquery, and the places where the subquery is referenced.
WITH summary AS
(
SELECT dname, SUM(sal) AS dept_total FROM emp, dept
WHERE emp.deptno = dept.deptno
GROUP BY dname
)
SELECT dname, dept_total FROM summary
WHERE dept_total > (SELECT SUM(dept_total) * 1/3 FROM summary)
ORDER BY dept_total DESC;
1:
2:
3:
4:
5:
6:
7:
8:
9:
Select allOpen in new window
What Stats Say
When a question comes to optimization using WITH clause, the answer is – depends. This is because many queries will be treated by Oracle as an in-line view. Sometimes recordset will be “materialized” into “on-the-fly” temporary table for use in later parts of the only this SQL statement. (This means that following execution of the same cursor will result in a new temp table each time.)
Let’s see an example. First we’ll create a test table with more than 1 million rows:
CREATE TABLE TestTbl
NOLOGGING
AS
SELECT A1.OWNER AS usr
,A1.OBJECT_TYPE AS PROD
,A1.OBJECT_ID AS ITEMS
FROM ALL_OBJECTS A1, EMP A2, DEPT A3;
SQL> select count(*) as recs from testtbl;
RECS
----------
3140928
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(USER,'TESTTBL');
PL/SQL procedure successfully completed
SQL>
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
Select allOpen in new window
Next we’ll use a standard approach to show up some reports.
set serveroutput on;
set autotrace on;
set timing on;
SQL> SELECT prod
2 ,total_items
3 FROM (
4 SELECT prod
5 , NVL(SUM(items),0) AS total_items
6 FROM testtbl
7 GROUP BY
8 prod
9 ) ilv
10 WHERE total_items > (SELECT SUM(items)/3 AS one_third_items
11* FROM testtbl)
SQL> /
PROD TOTAL_ITEMS
------------------- ----------------------
SYNONYM 50068277560
JAVA CLASS 43610562520
Elapsed: 00:00:03.07
Execution Plan
----------------------------------------------------------
Plan hash value: 2449639557
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 28 | 3154 (8)| 00:00:38 |
|* 1 | FILTER | | | | | |
| 2 | HASH GROUP BY | | 2 | 28 | 3154 (8)| 00:00:38 |
| 3 | TABLE ACCESS FULL| TESTTBL | 3140K| 41M| 2967 (2)| 00:00:36 |
| 4 | SORT AGGREGATE | | 1 | 5 | | |
| 5 | TABLE ACCESS FULL| TESTTBL | 3140K| 14M| 2967 (2)| 00:00:36 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(NVL(SUM("ITEMS"),0)> (SELECT SUM("ITEMS")/3 FROM
"TESTTBL" "TESTTBL"))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
21319 consistent gets
21213 physical reads
0 redo size
536 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
SQL>
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
Select allOpen in new window
As visible, resources are moderate and execution time is about 3 sec. The problem is that to get the result, table is accessed twice. To avoid this we could refactor the query using WITH clause. As was said above there will be a single access due to materialization of the recordset.
SQL> WITH ITEMS_REP AS (
2 SELECT PROD, NVL(SUM(ITEMS),0) AS TOTAL_ITMES
3 FROM TESTTBL
4 GROUP BY PROD)
5 SELECT PROD, TOTAL_ITMES
6 FROM ITEMS_REP
7 WHERE TOTAL_ITMES > (SELECT SUM(TOTAL_ITMES)/3 AS ONE_THIRD_ITEMS
8 FROM ITEMS_REP)
9/
SQL>
PROD TOTAL_ITMES
------------------- -----------
SYNONYM 50068277560
JAVA CLASS 43610562520
Elapsed: 00:00:02.15
Execution Plan
----------------------------------------------------------
Plan hash value: 2967744285
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 528 | 3158 (8)| 00:00:38 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | HASH GROUP BY | | 22 | 308 | 3154 (8)| 00:00:38 |
| 4 | TABLE ACCESS FULL | TESTTBL | 3140K| 41M| 2967 (2)| 00:00:36 |
|* 5 | VIEW | | 22 | 528 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6633_E00F1 | 22 | 308 | 2 (0)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 13 | | |
| 8 | VIEW | | 22 | 286 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6633_E00F1 | 22 | 308 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("TOTAL_ITMES"> (SELECT SUM("TOTAL_ITMES")/3 FROM (SELECT /*+ CACHE_TEMP_TABLE
("T1") */ "C0" "PROD","C1" "TOTAL_ITMES" FROM "SYS"."SYS_TEMP_0FD9D6633_E00F1" "T1")
"ITEMS_REP"))
Statistics
----------------------------------------------------------
2 recursive calls
9 db block gets
10672 consistent gets
10613 physical reads
644 redo size
536 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2 rows processed
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
Select allOpen in new window
Now the query goes faster. Time is reduced by 1 sec. We have hard-parse in addition to the cost of setting up a temp table. From statistics we can see that the physical reads are significantly reduced.
SQL> EXPLAIN PLAN SET STATEMENT_ID = 'SQ_FCT' for
2 WITH ITEMS_REP AS (
3 SELECT PROD, NVL(SUM(ITEMS),0) AS TOTAL_ITMES
4 FROM TESTTBL
5 GROUP BY PROD)
6 SELECT PROD, TOTAL_ITMES
7 FROM ITEMS_REP
8 WHERE TOTAL_ITMES > (SELECT SUM(TOTAL_ITMES)/3 AS ONE_THIRD_ITEMS
9 FROM ITEMS_REP);
SQL>
SQL> @?RDBMS\ADMIN\utlxplp.sql
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 395517267
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 528 | 3158 (8)| 00:00:38 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | HASH GROUP BY | | 22 | 308 | 3154 (8)| 00:00:38 |
| 4 | TABLE ACCESS FULL | TESTTBL | 3140K| 41M| 2967 (2)| 00:00:36 |
|* 5 | VIEW | | 22 | 528 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6637_E00F1 | 22 | 308 | 2 (0)| 00:00:01 |
| 7 | SORT AGGREGATE | | 1 | 13 | | |
| 8 | VIEW | | 22 | 286 | 2 (0)| 00:00:01 |
| 9 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6637_E00F1 | 22 | 308 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
Select allOpen in new window
The execution plan above shows the new TEMP TABLE TRANSFORMATION that Oracle has introduced (Step 2). There is a global temporary table Oracle created and loaded from the first scan of TESTTBL. This temporary dataset is then used to answer the overall question of which product items are more than one-third (Step 7).
Let’s repeat that Oracle will not always materialise subqueries in this way. Sometimes it will either merge the subquery into the main query or treat it as a simple in-line view. In cases when CBO chooses not to materialise subquery, but we do want to, Oracle supports the MATERIALIZE hint.
Without hint we see:
SQL> set autotrace traceonly explain
SQL>
SQL> WITH INLINE_VIEW AS (
2 SELECT PROD
3 , SUM(ITEMS) AS TOTAL_ITEMS
4 FROM testtbl
5 GROUP BY PROD
6 )
7 SELECT *
8 FROM inline_view;
Execution Plan
----------------------------------------------------------
Plan hash value: 1165785812
------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 308 | 3154 (8)| 00:00:38 |
| 1 | HASH GROUP BY | | 22 | 308 | 3154 (8)| 00:00:38 |
| 2 | TABLE ACCESS FULL| TESTTBL | 3140K| 41M| 2967 (2)| 00:00:36 |
------------------------------------------------------------------------------
SQL>
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
Select allOpen in new window
Plan shows us that Oracle has chosen not to materialise this subquery because it is used once and temporary table would be a waste of resources.
Adding the MATERIALIZE hint as follows shows following:
SQL> WITH INLINE_VIEW AS (
2 SELECT /*+ materialize */
3 PROD, SUM(ITEMS) AS TOTAL_ITEMS
4 FROM testtbl
5 GROUP BY PROD
6 )
7 SELECT *
8 FROM inline_view;
Execution Plan
----------------------------------------------------------
Plan hash value: 844976587
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 22 | 528 | 3156 (8)| 00:00:38 |
| 1 | TEMP TABLE TRANSFORMATION | | | | | |
| 2 | LOAD AS SELECT | | | | | |
| 3 | HASH GROUP BY | | 22 | 308 | 3154 (8)| 00:00:38 |
| 4 | TABLE ACCESS FULL | TESTTBL | 3140K| 41M| 2967 (2)| 00:00:36 |
| 5 | VIEW | | 22 | 528 | 2 (0)| 00:00:01 |
| 6 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6639_E00F1 | 22 | 308 | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------
SQL>
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
Select allOpen in new window
The temporary table is present in the plan.
On large datasets, however, the savings from such optimisation can be quite. On tiny datasets, the time involved in the temporary table setup can take longer than the original query itself so is not a particularly useful mechanism.
Conclusion
As WITH clause is refactoring of a SELECT statement, it could be used anywhere where SELECT is acceptable, including in DML operations as INSERT, UPDATE and DELETE.
INSERT INTO table_name
WITH subquery_name AS (SELECT . . .)
SELECT ... FROM subquery_name;
UPDATE table_name
SET column_name = ( WITH subquery_name AS (SELECT ...)
SELECT ... FROM subquery_name );
DELETE FROM table_name
WHERE column_name IN ( WITH subquery_name AS (SELECT ...)
SELECT ... FROM subquery_name );
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
Select allOpen in new window
Restrictions on Subquery Factoring:
• WITH clause cannot be nested. That is, WITH cannot exists in another WITH. However, the name defined in WITH clause can be used in the subquery of any subsequent subquery.
• In a query with set operators, the set operator subquery cannot contain the WITH clause, but the FROM subquery can.
Difference between 11g and previous versions
Named query must be used in the SQL statement prior to Oracle 11g. It must appear in another named query (like above) or in the main SELECT statement (like the first example). Otherwise an error is raised.
If we fire following script against version less than 11g
SELECT * FROM v$version;
WITH T1 AS (SELECT * FROM DUAL),
T2 AS (SELECT * FROM DUAL)
select * from t1;
1:
2:
3:
4:
Select allOpen in new window
Result will be an error:
BANNER
----------------------------------------------------------------
Oracle9i Enterprise Edition Release 9.2.0.7.0 - Production
PL/SQL Release 9.2.0.7.0 - Production
CORE 9.2.0.7.0 Production
TNS for 32-bit Windows: Version 9.2.0.7.0 - Production
NLSRTL Version 9.2.0.7.0 - Production
Error starting at line 2 in command:
WITH T1 AS (SELECT * FROM DUAL),
T2 AS (SELECT * FROM DUAL)
select * from t1
Error at Command Line:4 Column:14
Error report:
SQL Error: ORA-32035: unreferenced query name defined in WITH clause
32035. 00000 - "unreferenced query name defined in WITH clause"
*Cause: There is at least one WITH clause query name that is not
referenced in any place.
*Action: remove the unreferenced query name and retry
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
Select allOpen in new window
In Oracle 11g this is no more a problem
SELECT * FROM v$version;
WITH T1 AS (SELECT * FROM DUAL),
T2 AS (SELECT * FROM DUAL)
select * from t1;
1:
2:
3:
4:
Select allOpen in new window
Result is
BANNER
---------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - Production
PL/SQL Release 11.1.0.7.0 - Production
CORE 11.1.0.7.0 Production
TNS for 32-bit Windows: Version 11.1.0.7.0 - Production
NLSRTL Version 11.1.0.7.0 - Production
DUMMY
-----
X
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
Select allOpen in new window
by: virdi_ds on 2010-04-28 at 13:12:59ID: 13926
Suggestion: Can you include Flushing of Shared_Pool and buffer_cache in order to reflect more acurate result for Consistent and Physical IO also?