SQL Server: Table Joins Explained!

A Brief Primer on Database Systems
Relational Database Management Systems (RDBMS) such as SQL Server are based on the idea of designing a logical system composed of entities and relationships among those entities. This logical system is sometimes referred to as a “universe of discourse” because the logical model is in itself a modeled universe. The base entity in such a universe is a table object. Tables typically represent a real-world object or action, such as a table that holds Sales related data. Other tables in this logical system may be SalesRepresentatives and Products. A Sale in such a system may be comprised of a purchase of a particular product on a particular date sold by a sales representative. The relationship among the database tables mimics a real-world process.
Database tables are composed of attributes, usually referred to as fields, constraints which set rules on the data allowed in a particular table or attribute, and the relationships among tables based on attributes, which can be enforced through foreign key constraints (otherwise known as referential integrity). It is the relationship among different fields in different tables that I’ll be focusing on today.

What are Joins?
So, now that I’ve got the fun background of database systems out of the way we can clear the way to what joins are, why we use them, and how they work.
In the SQL Server world there are two types of joins; logical joins and physical joins. Logical joins are the types of joins you’re likely most familiar with, such as INNER JOIN, OUTER JOIN, etc. We use these types of joins when we are joining common fields among tables. We use this syntax to tell the SQL Server what data we want returned without worrying about the physical representation of the join between the tables; we let the database engine figure that out for us. In 99% of circumstances, this is what we want. We know what data we want to return, and we want SQL Server to figure out how to return it. On the other hand, physical joins are the actual internal mechanism that SQL Server uses to retrieve and return common data between tables. I’ll cover more about physical joins in a future article.

Join Types
SQL Server offers three types of logical JOINs; INNER JOIN, OUTER JOIN, and CROSS JOIN. I’ll detail these join types below.

INNER JOIN
An INNER JOIN returns values between tables where one or more fields from one table "match" one or more fields from another table based on some type of comparison operator, such as "=". The syntax of equality INNER JOINs may be represented two different ways. First, I'll show the ANSI join syntax:

SELECT  sr.FirstName, sr.LastName
                      FROM dbo.Sales s
                      INNER JOIN dbo.SalesRepresentatives sr ON s.SalesRepresentativeId =  sr.SalesRepresentativeID
                      WHERE s.SaleDate >= CONVERT(DATETIME, '3/1/2010')

Open in new window

This syntax returns values from the Sales table where the value in the SalesRepresentativeID field also exists in the SalesRepresentatives table where the sale was made on or after March 1, 2010. Note that this query only returns records where the SalesRepresentativeID value matches exactly between the two tables. If there are duplicates between the two tables, duplicate values will also be returned. Please read this article for handling duplicate values: http://sqlservernation.com/blogs/tipweek/archive/2009/02/23/how-to-delete-duplicate-values.aspx
The alternative syntax for an INNER JOIN between SQL Server tables is using the WHERE clause. The equality outcome is the same as the INNER JOIN example above, but this syntax is limited in SQL Server. It provides no facility for OUTER JOINs in SQL Server 2005 or later. In fact, the ability to specify a non-ANSI OUTER JOIN syntax in the WHERE clause, such as *= or =*, as been deprecated from SQL Server 2005 and later.

SELECT  sr.FirstName, sr.LastName
                      FROM dbo.Sales s, dbo.SalesRepresentative sr 
                      WHERE 
                      s.SalesRepresentativeId =  sr.SalesRepresentativeId AND
                      s.SalesDate >= CONVERT(DATETIME, '3/1/2010')

Open in new window

The returned result is the same, but the criteria is specified in the WHERE clause rather than through the use of an INNER JOIN statement. I encourage you to use the INNER JOIN (or JOIN) syntax rather than the comparison in the WHERE clause.

Note that an INNER JOIN does not have to be an equality comparison or a comparison based on a single field. The following query is an example of using a greater-than operator in the INNER JOIN clause to return SalesRepresentatives who have achieved their sales goal for the month of March in 2010.

SELECT *
                      FROM dbo.MonthlySalesGoals msg
                      INNER JOIN dbo.SalesRepresentatives sr ON msg.SalesRepresentativeID = sr.SalesRepresentativeID AND sr.SalesMTD > msg.SalesGoal
                      WHERE
                      msg.SalesMonth = 3 AND
                      msg.SalesYear = 2010

Open in new window

The above query can also be written in the following equivalent syntax:

SELECT *
                      FROM dbo.MonthlySalesGoals msg, 
                      INNER JOIN dbo.SalesRepresentatives sr ON msg.SalesRepresentativeID = sr.SalesRepresentativeID
                      WHERE
                      sr.SalesMTD > msg.SalesGoal AND
                      msg.SalesMonth = 3 AND
                      msg.SalesYear = 2010

Open in new window

OUTER JOIN
An OUTER JOIN is a logical join that returns all records from one table and only those records from the related table where the values match based on the comparison operator. The OUTER JOIN also has ability to return records from both tables in the comparison and only match up related records that match on the join criteria. This is known as a FULL OUTER JOIN, but more on that later.
OUTER JOINs come in two different flavors; LEFT OUTER and RIGHT OUTER. The only distinction between the two operations is the order in which the tables appear in the join syntax. There is no performance benefit to use one instead of the other. Consider the following query:

SELECT allRecords.*, someRecords.*
                      FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
                      dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID

Open in new window

In the example above, I want to return ALL records from the table aptly named TableWhereIWantToReturnAllRecords. I also want to return the values from the TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table, but only if the ID value exactly matches, otherwise a NULL value is returned for all fields in this table.
A RIGHT OUTER join is written in the same manner, but in inverse order. The previous query can easily be rewritten to be a RIGHT OUTER JOIN:

SELECT allRecords.*, someRecords.*
                      FROM dbo. TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords
                      RIGHT OUTER JOIN dbo.TableWhereIWantToReturnAllRecords allRecords ON allRecords.ID = someRecords.ID

Open in new window

The output is exactly the same, only the JOIN type and posititioning of the tables is different.
Now that the basics of the OUTER JOINS have been established, what if you need to specify criteria on the table that only returns records that match the table that you want all records from, such as the
TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table from above? For example, if I want to return all records from the dbo.TableWhereIWantToReturnAllRecords table and the records from the dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table that have a SaleMonth of 3? In such a case, the criteria must be specified in the INNER JOIN statement and NOT in the WHERE clause.

SELECT allRecords.*, someRecords.*
                      FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
                      dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID AND someRecords.SaleMonth = 3

Open in new window

If the criteria is specified in the WHERE clause rather than in the INNER JOIN statement, SQL Server searches for the specific (exact) value. If the SaleMonth value from the dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table happens to be NULL, such as when that record does not match a record in dbo.TableWhereIWantToReturnAllRecords, then the NULL value will be omitted. Performing such an operation essentially turns the OUTER JOIN into an INNER JOIN.

SELECT allRecords.*, someRecords.*
                      FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
                      dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID 
                      WHERE
                      	someRecords.SaleMonth = 3

Open in new window

Finding non-matched data
Another useful ability of using an OUTER JOIN is to find records that exist in one table but do not exist in another. For example, I can use an OUTER JOIN to find records that are in a SalesHistory table that DO NOT exist in a SalesHistoryArchive table.

SELECT *
FROM dbo.SalesHistory s
LEFT JOIN dbo.SalesHistoryArchive sa ON s.SalesID = sa.SalesID
WHERE sa.SalesID IS NULL

The important concept here is the criteria in the WHERE clause. I am searching specifically for those records where the SalesHistoryArchive.SalesID field is NULL. These records do not match to the SalesHistory table, and are otherwise "missing". Note that this query will return duplicate records if the SalesHistoryArchive table constains duplicate SalesID values.

OUTER JOINs are not the only method to find non-matched data. In the following example I use a correlated sub-query with a NON EXISTS() clause in the WHERE criteria to find records that exist in SalesHistory but absent from SalesHistoryArchive. The following queries will not return duplicate values if duplicate keys exist in the SalesHistoryArchive table.

SELECT * 
                      FROM dbo.SalesHistory s
                      WHERE NOT EXISTS(SELECT 1 FROM dbo.SalesHistoryArchive sa WHERE s.SaleID = sa.SaleID)

Open in new window

The query below the same as the NOT EXISTS clause above through the use of a NOT IN() statement.

SELECT * 
                      FROM dbo.SalesHistory s
                      WHERE SaleID NOT IN(
                      SELECT SaleID FROM dbo.SalesHistoryArchive sa
                      )

Open in new window

CROSS JOIN
A CROSS JOIN is essentially a cross-multiplication of all records from each table in the "join", known in the mathematics world as Cartesian product. Its a join in the sense that you're gathering data from a set of tables, but not on a particular set of field(s). These join types are particularly good for generating sets of test data or to associate a single record table with all records of a different table. Lets take a look at a couple of examples.

In the following example, I match up every record in the SalesHistory table with every record in the SalesHistoryArchive table. The number of records return from the query would be: (the number of records in SalesHistory) X (the number of records in SalesHistoryArchive). So, if the Sales table contained 100 records and the SalesHistoryArchive table contained 1000 records, then 100,000 would be returned.

SELECT *
FROM dbo.Sales s CROSS JOIN dbo.SalesHistoryArchive

The above query can also be written as:

SELECT *
FROM dbo.Sales s, dbo.SalesHistoryArchive

These examples really do not really give any insight into the functionality of using a CROSS JOIN. One handy common use of a CROSS JOIN is to generate a large set of numbers very quickly. In the following example I CROSS JOIN the sys.columns system view with itself and use the ROW_NUMBER() ranking function to quickly generate a large set of sequential numbers.

SELECT ROW_NUMBER() OVER(ORDER BY NEWID())
FROM sys.columns s1
CROSS JOIN sys.columns s2

You might have noticed in the above example that we are using the same table name in the join. This technique of joining a table back to itself (via any legitimate join method) is sometimes called a "self join". This is possible because I am assigning each table a different alias (listed as s1 and s2). The alias is used instead of the table name and so we essentially use the alias as a unique name for that datasource to access the data inside that table. The individual columns are now accessed via S1.<column name> ( or S1.* for the first sys.columns datasource) and S2.<column name> (or S2.* for the second sys.columns datasource).

The table alias essentially provides each datasource with a unique name and has a lot of benefits. It means that the query can (by and large) stay the same using the alias and simply change the actual table - for example from "test" to "live" without touching the rest of the query - it is a good habit to get into and will save effort as well as enhance readability. In the following example I return all Managers and the employees that report directly to them by way for a "self join". You'll notice that to make things less confusing, I alias one instance of the Employees table as 'Worker' and another instance as 'Manager'.

Select Worker.ID as Employee_ID, Worker.Name, Worker.Position, Manager.Name as Reports_To
From Employees as Worker
Inner Join Employees as Manager on Worker.ManagerID = Manager.ID

Back to that cross join on sys.columns. That alias "trick" quickly provides us with a list of sequential integers which can be extremely beneficial and no doubt will find a use somewhere in your T-SQL programming life. For more information on how to quickly generate a numbers table for reporting purposes, see Numbers Table written by Brandon Galerisi.

Recap
I hope you enjoyed this article about SQL Server joins. We looked at the different types of joins available in SQL Server, along with some different ways to use them. If you have any questions, feel free to post me a comment and I'll get back to you as soon as possible.

Happy Querying!
Tim

Comments (12)

Mark Wills

Topic Advisor

CERTIFIED EXPERT

Distinguished Expert 2018

Commented: 2011-04-29

Yes, use a linked server and then use the four part table identifier e.g.

select a.*,b.*
from <local table name> a
left outer join <linked server name>.<database name>.<schema name>.<tablename> b on a.<column name> = b.<column name>

obviously the bits between (including) the less than and greater than need to be the appropriate names.

there are some challenges sometimes (e.g. with permissions) and need to make sure the servers can talk to each other (at the operating system level), so, best to read up via MSDN or books online.

But to answer your question, yep, it can happen :)

jsaacson

Commented: 2011-05-02

This is helpful.

Thanks

Joel

EAK31640GW

Commented: 2011-07-11

In the part about unions is there a way to only return/ include non matches? Excluding the rows that match

Table A:
Name Number Title
Joe 1 Mr
Jim 2 Dr
Jack 3 Mr
Mike 4 Dr

Table B:
Name Number Title
Mike 4 Mr
Moe 5 Dr
Max 6 Mr
Jim 2 Dr
Jack 3 Mr

??? Query:
Name Number Title
Joe 1 Mr
Mike 4 Dr
Mike 4 Mr
Moe 5 Dr
Max 6 Mr

Patrick Matthews

Top Expert 2010

You can actually do it without unions by using a full outer join. The script below shows how to do it with union and without it.

CREATE TABLE TableA ([Name] varchar(20), Number int, Title varchar(20))

INSERT INTO TableA ([Name], Number, Title)
SELECT 'Joe', 1, 'Mr' UNION ALL
SELECT 'Jim', 2, 'Dr' UNION ALL
SELECT 'Jack', 3, 'Mr' UNION ALL
SELECT 'Mike', 4, 'Dr'

CREATE TABLE TableB ([Name] varchar(20), Number int, Title varchar(20))

INSERT INTO TableB ([Name], Number, Title)
SELECT 'Mike', 4, 'Mr' UNION ALL
SELECT 'Moe', 5, 'Dr' UNION ALL
SELECT 'Max', 6, 'Mr' UNION ALL
SELECT 'Jim', 2, 'Dr' UNION ALL
SELECT 'Jack', 3, 'Mr'

SELECT COALESCE(a.[Name], b.[Name]) AS [Name], COALESCE(a.Number, b.Number) AS Number, 
    COALESCE(a.Title, b.Title) AS Title
FROM TableA a FULL OUTER JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE a.[Name] IS NULL OR b.[Name] IS NULL

SELECT a.[Name], a.Number, a.Title
FROM TableA a LEFT JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE b.[Name] IS NULL
UNION ALL
SELECT b.[Name], b.Number, b.Title
FROM TableA a RIGHT JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE a.[Name] IS NULL

DROP TABLE TableA
DROP TABLE TableB

Bob Bender

Commented: 2011-10-09

I know this is an older post, but good material. One question....

It says here....

The alternative syntax for an INNER JOIN between SQL Server tables is using the WHERE clause. The equality outcome is the same as the INNER JOIN example above, but this syntax is limited in SQL Server.

Does use of a JOIN statement over a WHERE statement provide a benefit of the JOIN pulls less data into working memory? You mention that the results are the same as an earlier example, so I would think No

Bob