Introduction: When a new description on subject like subqueries is read, with various types and each with unique features, often readers try to make a mental picture of the process. Such a mental picture as a point of reference becomes very helpful when one attempts to apply the techniques to solve a problem. This is why the old saying "One Picture is Worth a Thousand Words" holds true.
I have prepared the content in this article with visual presentation followed by descriptions and samples to make the path much easier for those interested in having a good understanding of the world of subqueries and retention of techniques to apply them later. Your comment below and/or indicating if this article was helpful is very much appreciated and could be used to improve my future work.
You may download Northwind.bak to restore in your local SQL Server from https://northwinddatabase.codeplex.com to practice or test most of the examples included in this article. Also, consider printing figure 1 below to keep it handy for reference as you are reading the article.
In T-SQL we can use the results returned from one query in another. The embedded or referencedsubquery (inner query) returns required data to enable the outer query to perform more complex operations compared to a simple select clause. Below is the list of different subqueries discussed later in detail:
1) Self-contained subqueries (embedded).
2) Correlated subqueries (embedded).
3) Derived Table Expressions (embedded).
4) Common Table Expression, CTE (referenced).
5) Views and Inline Table-Valued Functions (referenced).
To reference a Table Expression (or an encapsulated named query ) that is executed prior to its use at referenced points in an outer query and/ or in an intermediate table expression. The major advantage of using a named table expression is to facilitate recursive operation and to be referenced more than once in a query operation.
* Inline table-valued FUNCTIONS return a table data type, the other types of functions not related to the discussion here are scalar FUNCTIONS and Built-in functions. Scalar functions return a single data value and Built-in functions come with SQL Server to facilitate various operations.
A scalar subquery, by definition, returns a single value to be used in its outer query either as predicate in the outer query's WHERE clause or included directly in its outer query's SELECT clause as an aliased column.
-- Problem statement: Show product names for ContactName ='Peter Wilson'. SELECT ProductID , ProductName FROM dbo.Products WHERE SupplierID = (SELECT SupplierID -- scalar subquery in WHERE clause, example 1.1 FROM dbo.Suppliers Where ContactName ='Peter Wilson'); -- Problem statement: Display all products with first contact name from Suppliers table. Not very -- practicle, but it shows how to include a self-containe independent subquery in a SELECT clause. SELECT ProductID , ProductName , (SELECT Top(1) ContactName -- scalar subquery in SELECT clause, example 1.2 FROM dbo.Suppliers) AS FirstContactName FROM dbo.Products;
If a self-contained scalar subquery returns an empty set, its outer query sees it as a null mark. And, if such a subquery by mistake coded returning multiple values will produce the following error upon running the entire query (Outer + Subquery):
Msg 512, Level 16, State 1, Line 23
Subquery returned more than 1 value...
Note: The new and acceptable terminolgy is null or null mark. The term null value, often used, is not correct because by definition null is neither a value nor can have a value.
The multiple value self-contained query is used where a multi-valued result is expected, for example when using the IN predicate. As an example, the following query uses a multi-valued subquery to return products supplied by suppliers from UK. The inner query returns supplierid for suppliers from UK. You can use ... WHERE supplierid NOT IN (SELECT supplierid... 0 to return all others but UK if required.
-- Problem statement: Show product names not associated with ContactName ='Peter Wilson'. SELECT ProductID , ProductName FROM dbo.Products WHERE SupplierID IN (SELECT SupplierID -- Multi-value subquery in WHERE clause, example 1.3 FROM dbo.Suppliers WHERE CompanyName <> 'Peter Wilson') -- There is no practical application for multi-value subquery in SELECT clause.
Correlated subqueries, unlike self-contained subqueries discussed above, this subquery references a column in the outer query. Thus we cannot select a correlated subquery to test run it. Both outer and inner query must run at once.
-- Problem statement: Write a query to return products with the maximum unit price in each category. SELECT ProductName , UnitPrice , CategoryID FROM dbo.Products AS t1 -- Same table correlation subquery, example 2.1 WHERE unitprice = (SELECT MAX(unitprice) FROM dbo.Products AS t2 WHERE t2.categoryid = t1.categoryid);
To produce the expected results, the same table (dbo.Products) is aliased as t1 and t2 to produce two instance of the same table to be used independent of each other in the outer and inner query .
-- Problem statement: Write a query to return CompanyName for OrderDate '1997-01-16'. SELECT CompanyName FROM dbo.Customers AS C -- Different tables correlation subquery, example 2.2 WHERE EXISTS -- exist here returns true or false (SELECT * FROM dbo.Orders AS O WHERE O.CustomerID = C.CustomerID AND O.OrderDate = '1997-01-16'); -- ... WHERE NOT EXISTS(SELECT * FROM … <-- use this when required
a) Derived tables and b) Common table expressions (CTEs) are created and used in code (non-db object).
c) Views and d) Inline table-valued functions are created in code or in SSMS stored as db objects.
Non-database-objects are created and used in code, visible only in the scope of the statement that defines them. When out of scope, the table expression is gone and is not usable. See the first example under section 3 below.
The derived table is a subquery in a FROM clause. The FROM clause must have an aliased name. It is called a derived table because the fields in the outer SELECT clause query come from this named derived table. There are Inline and External derived table subqueries:
In an external single derived table, it is mandatory to list the field name inside () after the FROM table alias name. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html
Product ProdType -- Inline single derived table subquery, example 3.1 A x B x B xx C x D x D xx E x -- Problem statement: Starting with above data, produce the following (ignore single Prod. Types): ProdType ProdType B x B xx D x D xx Select Product, ProdType from ( select *, count(*) over (partition by Product) cnt from #t ) d where cnt > 1; -- Note: This example will be used later in discussing CTEs.
The self imposed limitation with derived tables is that we don't use multiple table expressions in a FROM clause whereas with CTEs discussed later we can. Of course, technically speaking, we could nest multiple derived tables as shown below but nesting them in this manner complicates the coding and makes it prone to syntax errors thus rarely used.
-- Multiple derived tables syntax -- (for reasons stated this is possible but not used): SELECT f1, f2, ... FROM (SELECT ... FROM (SELECT ... FROM Ta WHERE ...) AS Tb WHERE ...) AS Tc WHERE ...) As Td (f1, f2, ...);
CTE is like derived table expressions. It's a named table expression that is visible only to the statement that follows it. Like a query against a derived table, CTE involves three main parts:
a) Name the inner query,
b) Name the columns of the inner query (right outside the FROM clause),
c) Name the outer query.
As opposed to derived tables, CTEs are not nested. CTE_1 and CTE_2 etc. are separated by a comma. Now, CTE_2 can reference CTE_1 because it leads VTE_2. And the outer query can reference both CTE_1 and CTE_2.
;WITH <CTE_a> -- Single CTE syntax AS ( <inner query> ) <outer query> ; -- Semicolon at the end terminates CTE_a operation. ;WITH CTE_1 AS ( -- Multiple CTEs syntax SELECT ... FROM Table1), -- Comma here allows a second CTE start next. CTE_2 AS ( SELECT ... FROM CTE_1) <outer query with references to CTE_1 and/or CTE_2> ; -- Semicolon at the end terminates CTEs operation.
Recall the example 3.1 (from inline single derived table section) above where it was returning Products with two maximum UnitPrice in each category? Here we essentially producing the same result but using with a CTE. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html
Product ProdType -- Inline single derived table subquery, example 4.1 A x B x B xx C x D x D xx E x -- Problem statement: Starting with above data, produce the following (ProdType =x): (we can do this easily with a simple WHERE clause but the following is to demo CTE use) ProdType ProdType B xx D xx ;With d as ( Select row_number() over(partition by Product order by ProdType) As rn , Product , ProdType From #t) Select d.Product, d.ProdType From d Where d.rn =2;
To have multiple CTEs, end the first with a comma and start the next CET. There is no need to repeat another WITH keyword because the first one will be is not terminated yet. For T-SQL to build temp table #t see http://www.experts-exchange.com/questions/28688390/How-to-query-using-derived-table-and-multi-CTEs.html
Product ProdType -- Inline single derived table subquery, example 4.2 A x B x B xx C x D x D xx E x -- Problem statement: Starting with above data, produce the following (ignore single Prod. Types): ProdType ProdType B x B xx D x D xx ;With d as ( Select row_number() over(partition by Product order by ProdType) As rn , Product From #t), e as ( Select #t.Product, ProdType, d.rn From #t inner join d on d.Product = #t.Product Where d.rn >=2) Select e.Product, e.ProdType From e;
Recursive CTE: “A common table expression (CTE) provides the significant advantage of being able to reference itself, thereby creating a recursive CTE. A recursive CTE is one in which an initial CTE is repeatedly executed to return subsets of data until the complete result set is obtained.
A query is referred to as a recursive query when it references a recursive CTE. Returning hierarchical data is a common use of recursive queries, for example: Displaying employees in an organizational chart, or data in a bill of materials scenario in which a parent product has one or more components and those components may, in turn, have subcomponents or may be components of other parents.
A recursive CTE can greatly simplify the code required to run a recursive query within a SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement.” This description is from https://technet.microsoft.com/en-us/library/ms186243%28v=sql.105%29.aspx
In earlier versions prior to SQL Server 2005, a recursive query use of temp tables, cursors, and additional coding was required. Here, in the example below an anchor subquery (starting subquery which fires only once) is followed by CTE body subquery with a UNION ALL operator before it. It is possible to have multiple UNION ALL + body subquery combination following the first one all separated by a comma. At least one of the body subquery is required to reference the CTE name defined in the anchor query.
In this example, employee 7 is returned by the anchor query before executing the body subquery repeatedly joining previous rows.
-- Problem statement: Write a query to return the management chain from EmploeeID = 7. ;WITH BossCTE -- Recursive CTE, example 4.3 AS ( SELECT EmployeeID , ReportsTo , firstname , lastname , 0 AS distance FROM dbo.Employees WHERE EmployeeID=7 UNION ALL SELECT M.EmployeeID , M.ReportsTo , M.firstname , M.lastname , S.distance + 1 AS distance FROM BossCTE AS S JOIN dbo.Employees AS M ON S.ReportsTo = M.EmployeeID ) SELECT EmployeeID , ReportsTo , firstname , lastname , distance FROM BossCTE Order By distance DESC;
CTEs and derived tables are created and used in code, visible only in the scope of the statement that defines them. After that statement terminates, the table expression is gone thus is not, as we discussed in the previous sections, derived tables and CTEs are table expressions that are visible only in the scope of the statement that defines them. Hence, derived tables and CTEs are not reusable.
What we have discussed so far is for use in the code being executed after which when it runs out of scope, it is of no use. In the following section we will see how they could be saved as database object for reusability.
Database-object(Views and Inline Table-Valued Functions): For reusability, you need to store the definition of the table expression as an object in the database, and for this you can use either views or inline table-valued functions. Because these are objects in the database, you can control access by using permissions. The main difference between views and inline table-valued functions is that the former doesn't accept input parameters and the latter does. As an example, suppose you need to persist the definition of the query with the row number computation from the examples in the previous sections. To achieve this, you create the following view.
A view is some saved TSQL to create a virtual table based on the saved definition. The input column names used in views must exist in the other tables in the database. The view, saved object, in the database is reusable whenever needed provided the users have necessary permission to use them. The result set return by views are relational* (see below on use of Order By clause with relational subqueries). Additionally, unlike functions, views do not accept parameters.
--Problem statement: Write a view to show products with the same ListPrice in more than three products: create VIEW dbo.ProductsSameListPriceOverTwoCounts AS -- View, example 5.1 select ProductID , ProductName , UnitPrice from ( select *, cnt = count(*) over (partition by unitprice) from dbo.Products ) d where cnt > 3; -- Test: select * from ProductsSameListPriceOverTwoCounts --Returns: ProductID ProductName UnitPrice 67 Laughing Lumberjack Lager 14.00 42 Singaporean Hokkien Fried Mee 14.00 34 Sasquatch Ale 14.00 25 NuNuCa Nuß-Nougat-Creme 14.00 1 Chai 18.00 35 Steeleye Stout 18.00 39 Chartreuse verte 18.00 76 Lakkalikööri 18.00
* The term relation often is mixed up with term relationship which has to do with joining primary keys and foreign keys. However in standard SQL “… a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain”. SQL Server, Oracle, MySQL etc. are different flavors of the standard SQL These languages use terms rows and columns in place tuples and domains. As part of a relation defined by standard SQL, relations cannot have Order By clause to sort rows. The exception to use Order By applies in a subquery, the use of ORDER BY with a TOP or OFFSET-FETCH to produce certain results not ordering the data it returns.
Inline table-valued functions are like views but accept parameters and you may include Order By clause in its definition. Unlike the previous example where constant 3 was hard coded, with inline table-valued function we can include an input parameters like @RepeadedListPriceCoun to enable user to use different values.
-- Problem statement: Write an Inline table-valued function to show products with the same ListPrice -- if equal or over RepeadedListPriceCoun provided: Create Function dbo.fnProductsSameListPriceOver( @RepeadedListPriceCount As Int) Returns Table AS -- Inline table-valued function , example 5.2 Return select ProductID , ProductName , UnitPrice From ( Select *, cnt = Count(*) over (Partition Yy unitprice) From dbo.Products ) d Where cnt >= @RepeadedListPriceCount; -- Test: Select * From fnProductsSameListPriceOver(4); -- Returns 25 records: ProductID ProductName UnitPrice 47 Zaanse koeken 9.50 45 Rogede sild 9.50 21 Sir Rodney's Scones 10.00 3 Aniseed Syrup 10.00 74 Longlife Tofu 10.00 . . 56 Gnocchi di nonna Alice 38.00 12 Queso Manchego La Pastora 38.00 27 Schoggi Schokolade 43.90 63 Vegie-spread 43.90
Summary: The readers of this article fall in three categories of pro, intermediate, or beginner. If you are a pro, any correction or improvement from will be greately appreciated. For intermediate-level professionals, I hope this presentation is helping to form a complete mental picture and is making easier your work with subqueries by recognizing 5 distinct mode of there applications 1) Self-contained, 2) Correlated, 3) Derived table, 4) CTEs, and 5) View & inline table-valued Functions. My recommendation for the beginner SQL enthusiasts is to read the SQL related questions present at EE (answered or not, without going through solution provided by others, if already answered) to come up with your own solution. This is a good mental gym for your SQL brain muscles. In a matter of a few months you will see how easily you are able to suggest a good solution for a questions which presently maybe a bit mind boggling.
Mike Eghtebas, BSME, PE