<

SQL Server: Table Joins Explained!

Published on
27,481 Points
10,081 Views
19 Endorsements
Last Modified:
Awarded
Community Pick
A Brief Primer on Database Systems
Relational Database Management Systems (RDBMS) such as SQL Server are based on the idea of designing a logical system composed of entities and relationships among those entities.   This logical system is sometimes referred to as a “universe of discourse” because the logical model is in itself a modeled universe.  The base entity in such a universe is a table object.  Tables typically represent a real-world object or action, such as a table that holds Sales related data.  Other tables in this logical system may be SalesRepresentatives and Products.  A Sale in such a system may be comprised of a purchase of a particular product on a particular date sold by a sales representative.  The relationship among the database tables mimics a real-world process.  
Database tables are composed of attributes, usually referred to as fields, constraints which set rules on the data allowed in a particular table or attribute, and the relationships among tables based on attributes, which can be enforced through foreign key constraints (otherwise known as referential integrity).  It is the relationship among different fields in different tables that I’ll be focusing on today.

What are Joins?  
So, now that I’ve got the fun background of database systems out of the way we can clear the way to what joins are, why we use them, and how they work.
In the SQL Server world there are two types of joins; logical joins and physical joins.  Logical joins are the types of joins you’re likely most familiar with, such as INNER JOIN, OUTER JOIN, etc.  We use these types of joins when we are joining common fields among tables.  We use this syntax to tell the SQL Server what data we want returned without worrying about the physical representation of the join between the tables; we let the database engine figure that out for us.  In 99% of circumstances, this is what we want.  We know what data we want to return, and we want SQL Server to figure out how to return it.  On the other hand, physical joins are the actual internal mechanism that SQL Server uses to retrieve and return common data between tables.  I’ll cover more about physical joins in a future article.  

Join Types
SQL Server offers three types of logical JOINs; INNER JOIN, OUTER JOIN, and CROSS JOIN.  I’ll detail these join types below.

INNER JOIN
An INNER JOIN returns values between tables where one or more fields from one table "match" one or more fields from another table based on some type of comparison operator, such as "=". The syntax of equality INNER JOINs may be represented two different ways.  First, I'll show the ANSI join syntax:

SELECT  sr.FirstName, sr.LastName
FROM dbo.Sales s
INNER JOIN dbo.SalesRepresentatives sr ON s.SalesRepresentativeId =  sr.SalesRepresentativeID
WHERE s.SaleDate >= CONVERT(DATETIME, '3/1/2010')

Open in new window


This syntax returns values from the Sales table where the value in the SalesRepresentativeID field also exists in the SalesRepresentatives table where the sale was made on or after March 1, 2010.  Note that this query only returns records where the SalesRepresentativeID value matches exactly between the two tables.  If there are duplicates between the two tables, duplicate values will also be returned.  Please read this article for handling duplicate values:  http://sqlservernation.com/blogs/tipweek/archive/2009/02/23/how-to-delete-duplicate-values.aspx
The alternative syntax for an INNER JOIN between SQL Server tables is using the WHERE clause.  The equality outcome is the same as the INNER JOIN example above, but this syntax is limited in SQL Server.  It provides no facility for OUTER JOINs in SQL Server 2005 or later.  In fact, the ability to specify a non-ANSI OUTER JOIN syntax in the WHERE clause, such as *= or =*, as been deprecated from SQL Server 2005 and later.

SELECT  sr.FirstName, sr.LastName
FROM dbo.Sales s, dbo.SalesRepresentative sr 
WHERE 
s.SalesRepresentativeId =  sr.SalesRepresentativeId AND
s.SalesDate >= CONVERT(DATETIME, '3/1/2010')

Open in new window


The returned result is the same, but the criteria is specified in the WHERE clause rather than through the use of an INNER JOIN statement.  I encourage you to use the INNER JOIN (or JOIN) syntax rather than the comparison in the WHERE clause.

Note that an INNER JOIN does not have to be an equality comparison or a comparison based on a single field.  The following query is an example of using a greater-than operator in the INNER JOIN clause to return SalesRepresentatives who have achieved their sales goal for the month of March in 2010.

SELECT *
FROM dbo.MonthlySalesGoals msg
INNER JOIN dbo.SalesRepresentatives sr ON msg.SalesRepresentativeID = sr.SalesRepresentativeID AND sr.SalesMTD > msg.SalesGoal
WHERE
msg.SalesMonth = 3 AND
msg.SalesYear = 2010

Open in new window


The above query can also be written in the following equivalent syntax:

SELECT *
FROM dbo.MonthlySalesGoals msg, 
INNER JOIN dbo.SalesRepresentatives sr ON msg.SalesRepresentativeID = sr.SalesRepresentativeID
WHERE
sr.SalesMTD > msg.SalesGoal AND
msg.SalesMonth = 3 AND
msg.SalesYear = 2010

Open in new window


OUTER JOIN
An OUTER JOIN is a logical join that returns all records from one table and only those records from the related table where the values match based on the comparison operator.  The OUTER JOIN also has ability to return records from both tables in the comparison and only match up related records that match on the join criteria.  This is known as a FULL OUTER JOIN, but more on that later.
OUTER JOINs come in two different flavors; LEFT OUTER and RIGHT OUTER.  The only distinction between the two operations is the order in which the tables appear in the join syntax.  There is no performance benefit to use one instead of the other.  Consider the following query:

SELECT allRecords.*, someRecords.*
FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID

Open in new window


In the example above, I want to return ALL records from the table aptly named TableWhereIWantToReturnAllRecords.  I also want to return the values from the TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL  table, but only if the ID value exactly matches, otherwise a NULL value is returned for all fields in this table.
A RIGHT OUTER join is written in the same manner, but in inverse order.  The previous query can easily be rewritten to be a RIGHT OUTER JOIN:

SELECT allRecords.*, someRecords.*
FROM dbo. TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords
RIGHT OUTER JOIN dbo.TableWhereIWantToReturnAllRecords allRecords ON allRecords.ID = someRecords.ID

Open in new window


The output is exactly the same, only the JOIN type and posititioning of the tables is different.
Now that the basics of the OUTER JOINS have been established, what if you need to specify criteria on the table that only returns records that match the table that you want all records from, such as the
TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table from above?  For example, if I want to return all records from the dbo.TableWhereIWantToReturnAllRecords table and the records from the dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table that have a SaleMonth of 3?  In such a case, the criteria must be specified in the INNER JOIN statement and NOT in the WHERE clause.  

SELECT allRecords.*, someRecords.*
FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID AND someRecords.SaleMonth = 3

Open in new window


If the criteria is specified in the WHERE clause rather than in the INNER JOIN statement, SQL Server searches for the specific (exact) value.  If the SaleMonth value from the dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL table happens to be NULL, such as when that record does not match a record in dbo.TableWhereIWantToReturnAllRecords, then the NULL value will be omitted.  Performing such an operation essentially turns the OUTER JOIN into an INNER JOIN.

SELECT allRecords.*, someRecords.*
FROM dbo.TableWhereIWantToReturnAllRecords allRecords LEFT JOIN 
dbo.TableWhereIOnlyWantRecordsThatMatchOtherwiseNULL someRecords ON allRecords.ID = someRecords.ID 
WHERE
	someRecords.SaleMonth = 3

Open in new window


Finding non-matched data
Another useful ability of using an OUTER JOIN is to find records that exist in one table but do not exist in another.  For example, I can use an OUTER JOIN to find records that are in a SalesHistory table that DO NOT exist in a SalesHistoryArchive table.  

SELECT *
FROM dbo.SalesHistory s
LEFT JOIN dbo.SalesHistoryArchive sa ON s.SalesID = sa.SalesID
WHERE sa.SalesID IS NULL

The important concept here is the criteria in the WHERE clause.  I am searching specifically for those records where the SalesHistoryArchive.SalesID field is NULL.  These records do not match to the SalesHistory table, and are otherwise "missing".  Note that this query will return duplicate records if the SalesHistoryArchive table constains duplicate SalesID values.

OUTER JOINs are not the only method to find non-matched data.  In the following example I use a correlated sub-query with a NON EXISTS() clause in the WHERE criteria to find records that exist in SalesHistory but absent from SalesHistoryArchive.  The following queries will not return duplicate values if duplicate keys exist in the SalesHistoryArchive table.

SELECT * 
FROM dbo.SalesHistory s
WHERE NOT EXISTS(SELECT 1 FROM dbo.SalesHistoryArchive sa WHERE s.SaleID = sa.SaleID)

Open in new window


The query below the same as the NOT EXISTS clause above through the use of a NOT IN() statement.

SELECT * 
FROM dbo.SalesHistory s
WHERE SaleID NOT IN(
SELECT SaleID FROM dbo.SalesHistoryArchive sa
)

Open in new window



CROSS JOIN
A CROSS JOIN is essentially a cross-multiplication of all records from each table in the "join", known in the mathematics world as Cartesian product.  Its a join in the sense that you're gathering data from a set of tables, but not on a particular set of field(s).  These join types are particularly good for generating sets of test data or to associate a single record table with all records of a different table.  Lets take a look at a couple of examples.

In the following example, I match up every record in the SalesHistory table with every record in the SalesHistoryArchive table.  The number of records return from the query would be:  (the number of records in SalesHistory) X (the number of records in SalesHistoryArchive).  So, if the Sales table contained 100 records and the SalesHistoryArchive table contained 1000 records, then 100,000 would be returned.

SELECT *
FROM dbo.Sales s CROSS JOIN dbo.SalesHistoryArchive

The above query can also be written as:

SELECT *
FROM dbo.Sales s, dbo.SalesHistoryArchive

These examples really do not really give any insight into the functionality of using a CROSS JOIN.  One handy common use of a CROSS JOIN is to generate a large set of numbers very quickly.  In the following example I CROSS JOIN the sys.columns system view with itself and use the ROW_NUMBER() ranking function to quickly generate a large set of sequential numbers.

SELECT ROW_NUMBER() OVER(ORDER BY NEWID())
FROM sys.columns s1
CROSS JOIN sys.columns s2

You might have noticed in the above example that we are using the same table name in the join. This technique of joining a table back to itself (via any legitimate join method) is sometimes called a "self join". This is possible because I am assigning each table a different alias (listed as s1 and s2). The alias is used instead of the table name and so we essentially use the alias as a unique name for that datasource to access the data inside that table. The individual columns are now accessed via S1.<column name>  ( or S1.* for the first sys.columns datasource) and S2.<column name> (or S2.* for the second sys.columns datasource).

The table alias essentially provides each datasource with a unique name and has a lot of benefits. It means that the query can (by and large) stay the same using the alias and simply change the actual table - for example from "test" to "live" without touching the rest of the query - it is a good habit to get into and will save effort as well as enhance readability. In the following example I return all Managers and the employees that report directly to them by way for a "self join".  You'll notice that to make things less confusing, I alias one instance of the Employees table as 'Worker' and another instance as 'Manager'.

Select Worker.ID as Employee_ID, Worker.Name, Worker.Position, Manager.Name as Reports_To
From Employees as Worker
Inner Join Employees as Manager on Worker.ManagerID = Manager.ID

Back to that cross join on sys.columns. That alias "trick" quickly provides us with a list of sequential integers which can be extremely beneficial and no doubt will find a use somewhere in your T-SQL programming life. For more information on how to quickly generate a numbers table for reporting purposes, see Numbers Table written by Brandon Galerisi.

Recap
I hope you enjoyed this article about SQL Server joins.  We looked at the different types of joins available in SQL Server, along with some different ways to use them.  If you have any questions, feel free to post me a comment and I'll get back to you as soon as possible.  

Happy Querying!
Tim
19
Comment
Author:chapmandew
  • 3
  • 3
  • 2
  • +3
12 Comments
 
LVL 93

Expert Comment

by:Patrick Matthews
Tim,

Terrific article for the beginner!

I liked it that you got in a join expression where you use something other than the = operator.  Doesn't come up often, but when it does, it usually gets a lot of raised eyebrows.  "Wow, I didn't know you could DO that!"

:)

Cheers,

Patrick
0
 

Expert Comment

by:breauxlg
What kind of join produces the following?

Table A:
Name         Number
Joe            1
Jim             2
Jack           3

Table B:
Name         Number
Mike           4
Moe           5
Max           6

Join Query:
Name         Number
Joe            1
Jim             2
Jack           3
Mike           4
Moe           5
Max           6
0
 
LVL 53

Expert Comment

by:Mark Wills
Union  or Union All  ("union" will ingore dupes arising from second table if already in first table, "union all" means include them all).

select [name], [number] from tableA
union
select [name], [number] from tableB
0
Cloud Class® Course: Certified Penetration Testing

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

 
LVL 93

Expert Comment

by:Patrick Matthews
breauxlg,

No join will produce that result.  Instead, use UNION:

SELECT Name, Number FROM TableA
UNION ALL
SELECT Name, Number FROM TableB
UNION ALL
SELECT Name, Number FROM TableC

Patrick
0
 

Expert Comment

by:breauxlg
Can this be represented in Design view in Access? Do you have to use SQL view?
0
 
LVL 53

Expert Comment

by:Mark Wills
SQL view
0
 

Expert Comment

by:jsaacson
Is there any way to join two tables from different servers?
0
 
LVL 53

Expert Comment

by:Mark Wills
Yes, use a linked server and then use the four part table identifier e.g.

select a.*,b.*
from <local table name> a
left outer join <linked server name>.<database name>.<schema name>.<tablename> b on a.<column name> = b.<column name>

Open in new window


obviously the bits between (including) the less than and greater than need to be the appropriate names.

there are some challenges sometimes (e.g. with permissions) and need to make sure the servers can talk to each other (at the operating system level), so, best to read up via MSDN or books online.

But to answer your question, yep, it can happen :)
0
 

Expert Comment

by:jsaacson
This is helpful.

Thanks

Joel
0
 

Expert Comment

by:EAK31640GW
In the part about unions is there a way to only return/ include non matches? Excluding the rows that match

Table A:
Name         Number       Title
Joe            1                  Mr
Jim             2                  Dr
Jack           3                  Mr
Mike           4                 Dr

Table B:
Name         Number       Title
Mike           4                 Mr
Moe           5                 Dr
Max           6                  Mr
Jim             2                  Dr
Jack           3                  Mr


??? Query:
Name         Number       Title
Joe            1                  Mr
Mike           4                 Dr
Mike           4                 Mr
Moe           5                 Dr
Max           6                  Mr
0
 
LVL 93

Expert Comment

by:Patrick Matthews
You can actually do it without unions by using a full outer join.  The script below shows how to do it with union and without it.

CREATE TABLE TableA ([Name] varchar(20), Number int, Title varchar(20))

INSERT INTO TableA ([Name], Number, Title)
SELECT 'Joe', 1, 'Mr' UNION ALL
SELECT 'Jim', 2, 'Dr' UNION ALL
SELECT 'Jack', 3, 'Mr' UNION ALL
SELECT 'Mike', 4, 'Dr'

CREATE TABLE TableB ([Name] varchar(20), Number int, Title varchar(20))

INSERT INTO TableB ([Name], Number, Title)
SELECT 'Mike', 4, 'Mr' UNION ALL
SELECT 'Moe', 5, 'Dr' UNION ALL
SELECT 'Max', 6, 'Mr' UNION ALL
SELECT 'Jim', 2, 'Dr' UNION ALL
SELECT 'Jack', 3, 'Mr'

SELECT COALESCE(a.[Name], b.[Name]) AS [Name], COALESCE(a.Number, b.Number) AS Number, 
    COALESCE(a.Title, b.Title) AS Title
FROM TableA a FULL OUTER JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE a.[Name] IS NULL OR b.[Name] IS NULL

SELECT a.[Name], a.Number, a.Title
FROM TableA a LEFT JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE b.[Name] IS NULL
UNION ALL
SELECT b.[Name], b.Number, b.Title
FROM TableA a RIGHT JOIN
    TableB b ON a.[Name] = b.[Name] AND a.Number = b.Number AND a.Title = b.Title
WHERE a.[Name] IS NULL

DROP TABLE TableA
DROP TABLE TableB

Open in new window

0
 
LVL 5

Expert Comment

by:Bob Bender

I know this is an older post, but good material.  One question....

It says here....

The alternative syntax for an INNER JOIN between SQL Server tables is using the WHERE clause.  The equality outcome is the same as the INNER JOIN example above, but this syntax is limited in SQL Server.  


Does use of a JOIN statement over a WHERE statement provide a benefit of the JOIN pulls less data into working memory?  You mention that the results are the same as an earlier example, so I would think No

Bob

0

Featured Post

The 14th Annual Expert Award Winners

The results are in! Meet the top members of our 2017 Expert Awards. Congratulations to all who qualified!

Join & Write a Comment

Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
SQL Database Recovery Software repairs the MDF & NDF Files, corrupted due to hardware related issues or software related errors. Provides preview of recovered database objects and allows saving in either MSSQL, CSV, HTML or XLS format. Ensures recov…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month