Subqueries at a glance

Mike EghtebasDatabase and Application Developer
Published:
Updated:

Introduction: When a new description on subject like subqueries is read, with various types and each with unique features, often readers try to make a mental picture of the process. Such a mental picture as a point of reference becomes very helpful when one attempts to apply the techniques to solve a problem. This is why the old saying "One Picture is Worth a Thousand Words" holds true. 


I have prepared the content in this article with visual presentation followed by descriptions and samples to make the path much easier for those interested in having a good understanding of the world of subqueries and retention of techniques to apply them later. Your comment below and/or indicating if this article was helpful is very much appreciated and could be used to improve my future work.


You may download Northwind.bak to restore in your local SQL Server from https://northwinddatabase.codeplex.com to practice or test most of the examples included in this article.  Also, consider printing figure 1 below to keep it handy for reference as you are reading the article.


0x.pngIn T-SQL we can use the results returned from one query in another. The embedded or referencedsubquery (inner query) returns required data to enable the outer query to perform more complex operations compared to a simple select clause. Below is the list of different subqueries discussed later in detail:


1) Self-contained subqueries (embedded).

2) Correlated subqueries (embedded).

3) Derived Table Expressions (embedded).

4) Common Table Expression, CTE (referenced).

5) Views and Inline Table-Valued Functions (referenced).


To reference a Table Expression (or an encapsulated named query ) that is executed prior to its use at referenced points in an outer query and/ or in an intermediate table expression. The major advantage of using a named table expression is to facilitate recursive operation and to be referenced more than once in a query operation.


 

Figure01.png Figure 1: Subqueries at a glance.



* Inline table-valued FUNCTIONS return a table data type, the other types of functions not related to the discussion here are scalar FUNCTIONS and Built-in functions. Scalar functions return a single data value and Built-in functions come with SQL Server to facilitate various operations.

01x.pngA self-contained subquery embedded in an outer query is a stand-alone query with no dependencies to its outer query. You can highlight such subquery to test run it independently for QC purpose.

02x.pngA scalar subquery, by definition, returns a single value to be used in its outer query either as predicate in the outer query's WHERE clause or included directly in its outer query's SELECT clause as an aliased column. 


-- Problem statement: Show product names for ContactName ='Peter Wilson'.

SELECT ProductID
     , ProductName 
FROM dbo.Products
WHERE SupplierID = (SELECT SupplierID                -- scalar subquery in WHERE clause, example 1.1
                    FROM dbo.Suppliers
	            Where ContactName ='Peter Wilson');

-- Problem statement: Display all products with first contact name from Suppliers table. Not very
-- practicle, but it shows how to include a self-containe independent subquery in a SELECT clause.

SELECT ProductID
     , ProductName
     , (SELECT Top(1) ContactName                    -- scalar subquery in SELECT clause, example 1.2
        FROM dbo.Suppliers) AS FirstContactName 
FROM dbo.Products;


If a self-contained scalar subquery returns an empty set, its outer query sees it as a null mark. And, if such a subquery by mistake coded returning multiple values will produce the following error upon running the entire query (Outer + Subquery): 


Msg 512, Level 16, State 1, Line 23

Subquery returned more than 1 value...


Note: The new and acceptable terminolgy is null or null mark. The term null value, often used, is not correct because by definition null is neither a value nor can have a value.

03x.pngThe multiple value self-contained query is used where a multi-valued result is expected, for example when using the IN predicate. As an example, the following query uses a multi-valued subquery to return products supplied by suppliers from UK. The inner query returns supplierid for suppliers from UK. You can use ... WHERE supplierid NOT IN (SELECT supplierid... 0 to return all others but UK if required.


-- Problem statement: Show product names not associated with ContactName ='Peter Wilson'.

SELECT ProductID
     , ProductName
FROM dbo.Products 
WHERE SupplierID IN (SELECT SupplierID               -- Multi-value subquery in WHERE clause, example 1.3
                     FROM dbo.Suppliers
                     WHERE CompanyName <> 'Peter Wilson') 

-- There is no practical application for multi-value subquery in SELECT clause.

04x.pngCorrelated subqueries, unlike self-contained subqueries discussed above, this subquery references a column in the outer query. Thus we cannot select a correlated subquery to test run it. Both outer and inner query must run at once.

05x.pngSame table correlation: Both outer and inner query refer to different instances of the same table.


-- Problem statement:  Write a query to return products with the maximum unit price in each category.

SELECT ProductName
     , UnitPrice
	 , CategoryID
FROM dbo.Products AS t1                              -- Same table correlation subquery, example 2.1
WHERE unitprice =
(SELECT MAX(unitprice)
FROM dbo.Products AS t2
WHERE t2.categoryid = t1.categoryid);

To produce the expected results, the same table (dbo.Products) is aliased as t1 and t2 to produce two instance of the same table to be used independent of each other in the outer and inner query .

06x.pngDifferent table correlation: The outer query (Customers) and the inner query (Orders) refer to instances of two different tables.


-- Problem statement: Write a query to return CompanyName for OrderDate '1997-01-16'.

SELECT CompanyName                                   
FROM dbo.Customers AS C                              -- Different tables correlation subquery, example 2.2
WHERE EXISTS    -- exist here returns true or false 
(SELECT * FROM dbo.Orders AS O
 WHERE O.CustomerID = C.CustomerID 
       AND O.OrderDate = '1997-01-16');

-- ... WHERE NOT EXISTS(SELECT * FROM …    <-- use this when required

08x.pngTable Expressions are named queries. You a) write an inner query that returns a relational result set, b) name it, and c) query it from an outer query. T-SQL supports four forms of table expressions:

a) Derived tables and b) Common table expressions (CTEs) are created and used in code (non-db object).

c) Views and d) Inline table-valued functions are created in code or in SSMS stored as db objects.

09x.pngNon-database-objects are created and used in code, visible only in the scope of the statement that defines them. When out of scope, the table expression is gone and is not usable. See the first example under section 3 below.

10x.pngThe derived table is a subquery in a FROM clause. The FROM clause must have an aliased name. It is called a derived table because the fields in the outer SELECT clause query come from this named derived table. There are Inline and External derived table subqueries:

11x.pngIn an inline single derived table, it is mandatory to give an alias to the table in the FROM clause.

In an external single derived table, it is mandatory to list the field name inside () after the FROM table alias name. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html


Product  ProdType                                    -- Inline single derived table subquery, example 3.1
A	      x
B	      x
B	      xx
C	      x
D	      x
D	      xx
E	      x 
-- Problem statement: Starting with above data, produce the following (ignore single Prod. Types):
ProdType  ProdType
B	      x
B	      xx
D	      x
D	      xx

Select Product, ProdType
from (
  select *, count(*) over (partition by Product) cnt from #t
) d where cnt > 1;

-- Note: This example will be used later in discussing CTEs.

07ax.pngThe self imposed limitation with derived tables is that we don't use multiple table expressions in a FROM clause whereas with CTEs discussed later we can. Of course, technically speaking, we could nest multiple derived tables as shown below but nesting them in this manner complicates the coding and makes it prone to syntax errors thus rarely used. 


-- Multiple derived tables syntax 
-- (for reasons stated this is possible but not used):

SELECT f1, f2, ...                       
FROM (SELECT ...  
      FROM (SELECT ...  
            FROM Ta 
            WHERE ...) AS Tb
      WHERE ...) AS Tc
WHERE ...) As Td (f1, f2, ...);

12x.pngCTE (common table expression):

CTE is like derived table expressions. It's a named table expression that is visible only to the statement that follows it. Like a query against a derived table, CTE involves three main parts:

a) Name the inner query,

b) Name the columns of the inner query (right outside the FROM clause),

c) Name the outer query.


As opposed to derived tables, CTEs are not nested. CTE_1 and CTE_2 etc. are separated by a comma. Now, CTE_2 can reference CTE_1 because it leads VTE_2. And the outer query can reference both CTE_1 and CTE_2.

;WITH <CTE_a>                    -- Single CTE syntax
AS
(
<inner query>
)
<outer query>
;                 -- Semicolon at the end terminates CTE_a operation.                 

;WITH CTE_1 AS (                 -- Multiple CTEs syntax
SELECT ...
FROM Table1),     -- Comma here allows a second CTE start next.
CTE_2 AS (
SELECT ...
FROM CTE_1)
<outer query with references to CTE_1 and/or CTE_2>
;                 -- Semicolon at the end terminates CTEs operation. 

13x.pngRecall the example 3.1 (from inline single derived table section) above where it was returning Products with two maximum UnitPrice in each category? Here we essentially producing the same result but using with a CTE. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html


Product  ProdType                                    -- Inline single derived table subquery, example 4.1
A	      x
B	      x
B	      xx
C	      x
D	      x
D	      xx
E	      x 
-- Problem statement: Starting with above data, produce the following (ProdType =x):
   (we can do this easily with a simple WHERE clause but the following is to demo CTE use) 
ProdType  ProdType
B	      xx
D	      xx

;With d
as
(
Select row_number()  over(partition by Product order by ProdType) As rn
, Product
, ProdType
From #t)
Select d.Product, d.ProdType
From d
Where d.rn =2;

14x.pngTo have multiple CTEs, end the first with a comma and start the next CET. There is no need to repeat another WITH keyword because the first one will be is not terminated yet. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html


Product  ProdType                                    -- Inline single derived table subquery, example 4.2
A	      x
B	      x
B	      xx
C	      x
D	      x
D	      xx
E	      x 
-- Problem statement: Starting with above data, produce the following (ignore single Prod. Types):
ProdType  ProdType
B	      x
B	      xx
D	      x
D	      xx

;With d
as
(
Select row_number()  over(partition by Product order by ProdType) As rn
, Product
From #t),
e 
as
(
Select #t.Product, ProdType, d.rn 
From #t inner join d on d.Product = #t.Product
Where d.rn >=2)
Select e.Product, e.ProdType
From e;

15x.pngRecursive CTE: “A common table expression (CTE) provides the significant advantage of being able to reference itself, thereby creating a recursive CTE. A recursive CTE is one in which an initial CTE is repeatedly executed to return subsets of data until the complete result set is obtained.

A query is referred to as a recursive query when it references a recursive CTE. Returning hierarchical data is a common use of recursive queries, for example: Displaying employees in an organizational chart, or data in a bill of materials scenario in which a parent product has one or more components and those components may, in turn, have subcomponents or may be components of other parents.

A recursive CTE can greatly simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement.” This description is from https://technet.microsoft.com/en-us/library/ms186243%28v=sql.105%29.aspx

 

In earlier versions prior to SQL Server 2005, a recursive query use of temp tables, cursors, and additional coding was required. Here, in the example below an anchor subquery (starting subquery which fires only once) is followed by CTE body subquery with a UNION ALL operator before it. It is possible to have multiple UNION ALL + body subquery combination following the first one all separated by a comma. At least one of the body subquery is required to reference the CTE name defined in the anchor query.

 

In this example, employee 7 is returned by the anchor query before executing the body subquery repeatedly joining previous rows.

-- Problem statement: Write a query to return the management chain from EmploeeID = 7. 

;WITH BossCTE                                        -- Recursive CTE, example 4.3
AS             
(
SELECT EmployeeID
     , ReportsTo
     , firstname
     , lastname
     , 0 AS distance
FROM dbo.Employees 
WHERE EmployeeID=7
UNION ALL
SELECT M.EmployeeID
     , M.ReportsTo
     , M.firstname
     , M.lastname
     , S.distance + 1 AS distance
FROM BossCTE AS S JOIN dbo.Employees AS M
ON S.ReportsTo = M.EmployeeID
) 
SELECT EmployeeID
     , ReportsTo
     , firstname
     , lastname
     , distance
FROM BossCTE
Order By distance DESC;

CTEs and derived tables are created and used in code, visible only in the scope of the statement that defines them. After that statement terminates, the table expression is gone thus is not, as we discussed in the previous sections, derived tables and CTEs are table expressions that are visible only in the scope of the statement that defines them. Hence, derived tables and CTEs are not reusable.

What we have discussed so far is for use in the code being executed after which when it runs out of scope, it is of no use. In the following section we will see how they could be saved as database object for reusability. 

19x.pngDatabase-object(Views and Inline Table-Valued Functions): For reusability, you need to store the definition of the table expression as an object in the database, and for this you can use either views or inline table-valued functions. Because these are objects in the database, you can control access by using permissions. The main difference between views and inline table-valued functions is that the former doesn't accept input parameters and the latter does. As an example, suppose you need to persist the definition of the query with the row number computation from the examples in the previous sections. To achieve this, you create the following view.


17x.pngA view is some saved TSQL to create a virtual table based on the saved definition. The input column names used in views must exist in the other tables in the database. The view, saved object, in the database is reusable whenever needed provided the users have necessary permission to use them. The result set return by views are relational*  (see below on use of Order By clause with relational subqueries). Additionally, unlike functions, views do not accept parameters.

 

--Problem statement: Write a view to show products with the same ListPrice in  more than three products:

create VIEW dbo.ProductsSameListPriceOverTwoCounts
AS                                                   -- View, example 5.1
select ProductID
     , ProductName
     , UnitPrice
from (
  select *, cnt = count(*) over (partition by unitprice)
  from dbo.Products
) d where cnt > 3;
 
-- Test:
select * from ProductsSameListPriceOverTwoCounts
--Returns:
ProductID  ProductName                   UnitPrice

67     Laughing Lumberjack Lager          14.00
42     Singaporean Hokkien Fried Mee      14.00
34     Sasquatch Ale                      14.00
25     NuNuCa Nuß-Nougat-Creme            14.00
1      Chai                               18.00
35     Steeleye Stout                     18.00
39     Chartreuse verte                   18.00
76     Lakkalikööri                       18.00

* The term relation often is mixed up with term relationship which has to do with joining primary keys and foreign keys. However in standard SQL “… a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain”. SQL Server, Oracle, MySQL etc. are different flavors of the standard SQL These languages use terms rows and columns in place tuples and domains. As part of a relation defined by standard SQL, relations cannot have Order By clause to sort rows. The exception to use Order By applies in a subquery, the use of ORDER BY with a TOP or OFFSET-FETCH to produce certain results not ordering the data it returns. 


 Inline table-valued functions are like views but accept parameters and you may include Order By clause in its definition. Unlike the previous example where constant 3 was hard coded, with inline table-valued function we can include an input parameters like @RepeadedListPriceCoun to enable user to use different values.


-- Problem statement: Write an Inline table-valued function to show products with the same ListPrice 
-- if equal or over RepeadedListPriceCoun provided:

Create Function dbo.fnProductsSameListPriceOver(
@RepeadedListPriceCount As Int) Returns Table
AS                                                   -- Inline table-valued function , example 5.2
Return
select ProductID
     , ProductName
     , UnitPrice
From (
  Select *, cnt = Count(*) over (Partition Yy unitprice) 
  From dbo.Products
) d Where cnt >= @RepeadedListPriceCount;

-- Test:
Select * From fnProductsSameListPriceOver(4);

-- Returns 25 records:
ProductID  ProductName             UnitPrice
47	Zaanse koeken	                9.50
45	Rogede sild	                9.50
21	Sir Rodney's Scones	       10.00
3	Aniseed Syrup	               10.00
74	Longlife Tofu	               10.00
.
.
56	Gnocchi di nonna Alice	       38.00
12	Queso Manchego La Pastora      38.00
27	Schoggi Schokolade	       43.90
63	Vegie-spread	               43.90

 



Summary: The readers of this article fall in three categories of pro, intermediate, or beginner. If you are a pro, any correction or improvement from will be greately appreciated. For intermediate-level professionals, I hope this presentation is helping to form a complete mental picture and is making easier your work with subqueries by recognizing 5 distinct mode of there applications 1) Self-contained, 2) Correlated, 3) Derived table, 4) CTEs, and 5) View & inline table-valued Functions. My recommendation for the beginner SQL enthusiasts is to read the SQL related questions present at EE (answered or not, without going through solution provided by others, if already answered) to come up with your own solution. This is a good mental gym for your SQL brain muscles. In a matter of a few months you will see how easily you are able to suggest a good solution for a questions which presently maybe a bit mind boggling.


Thank you,


Mike Eghtebas, BSME, PE

 





piechartY.pngSkill Highlights:

 

  • Extensive data analysis experience, dynamic reporting and developing automation tools to streamline activities and boost productivity.
  • Expertise designing user friendly interfaces, experienced in T-SQL, ASP.Net, C#, VB.Net, ADO.Net, VBA, Excel Macros, and VBA coding.
  • Others: Git, MySQL, Java, PHP
     






5
5,224 Views

Comments (1)

Commented:
Thank you Mike, this really clears a lot of key concepts :)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.