Community Pick: Many members of our community have endorsed this article.
Editor's Choice: This article has been selected by our editors as an exceptional contribution.

UPDATES with JOIN... for everybody?

Guy Hengel [angelIII / a3]Billing Engineer

0. Introduction

One of the regular tasks for a SQL Developer (or DBA) is to UPDATE data of existing rows, after they are in the table. A simple example, to update an existing row, based on the primary key, would look like this:
UPDATE tbl_employee 
   SET last_name = 'whatever' 
 WHERE ssn = '123-456-789';

Open in new window

Easy enough, and the same syntax works for all database engines the same.

But in this article, we will check to see what the syntax is, and what errors newcomers run into, when the data has to come from another table. I will presume that you are familar with the concept of JOIN; however, you can read up on that topic here:

I will show the syntax for MS SQL Server, MySQL and Oracle, which will differ substantially!

Obviously, we need to set up 2 tables. I propose a common example of a staging table (when importing data) and the real production table (where the data will finally go to).

note: using staging table(s) when importing data from external sources is highly recommended, as it solves a number of quite important problems, which I won't discuss here, though.

Here comes the code to create the 2 tables in MS SQL Server syntax (I attached also the files for the same table for MySQL and Oracle):
CREATE TABLE tbl_Employees 
( pk int identity(1000,1)
, last_name varchar(100)
, first_name varchar(100)
, dob datetime, SSN varchar(50)
, comments varchar(500) );

CREATE TABLE stage_Employees 
( row_id int identity(1,1)
, last_name varchar(100)
, first_name varchar(100)
, dob varchar(100)
, SSN varchar(50)
, comments varchar(500) );

Open in new window

You will notice that the dob field is varchar in the staging table, and not datetime. This is to avoid that the loading from the external source fails with conversion errors (although not relevant here).

We will presume for this article that the external data would have loaded the data in the format 'YYYY-MM-DD'.

important: The fields PK and row_id in the 2 tables are NOT the same thing! They are just auto-generated values, and serve to identify the rows. The real "key" in these tables (at least for us mere humans) is the SSN field.

For this example, we will just load the data with plain inserts (again, here the ms sql server version, in the attached files the adjusted code for MySQL and Oracle).
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments) 
VALUES ( 'Smith', 'Victor'
  , convert(datetime, '1980-05-01',120), '123-456-789', null );
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Bond', 'James'
  , convert(datetime, '1978-01-01', 120), '007-007-007', null );
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Brown', 'Alphonse'
  , convert(datetime, '1982-12-06', 120), '789-456-123', null );

Open in new window

INSERT INTO stage_Employees (last_name, first_name, dob, SSN, comments) 
VALUES ( 'Smith', 'Victor'
   , '1980-01-05', '123-456-789', 'dob updated' );
INSERT INTO stage_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Brown', 'Alfonse'
   , '1982-06-12', '789-456-123', 'dob and firstname updated' );

Open in new window

It's very important to remember that for this example, we will consider the SSN as the "primary key", so it must never change for an existing record. If you have a "primary key" that can change in the lifetime of the record, get another primary key. In the context of loading data from an external source, you have to get get the actual primary key of the external source also, and store it in your table.

Ok, so the tables are ready, and the staging table has some updates in the data for 2 of the 3 records.

1. First attempt, and getting nothing but errors...

Everyone is a beginner, at some point, with a new "language". So, often enough, one may find people "complaining" and wondering why the following 2 "SQL" statements won't work, respectively not understanding the associated error message.
update tbl_employees
   set last_name = stage_employees.last_name;

Open in new window

update tbl_employees
   set last_name = stage_employees.last_name
 where tbl_employees.ssn = stage_employees.ssn;

Open in new window

In MS SQL Server, both will fail with this:
Msg 107, Level 16, State 3, Line 1
The column prefix 'stage_employees' does not match with a table name or alias name used in the query.

MySQL complains with almost the same response:
MySQL Database Error: Unknown table 'stage_employees' in where clause

For Oracle, the message indicates the full identifier:
SQL Error: ORA-00904: "STAGE_EMPLOYEES"."SSN": invalid identifier
00904. 00000 -  "%s: invalid identifier"

But in all 3 cases, the users will ask themselves (or someone else):
come on, that table (and column) name(s) DO exist, I double-checked!?

The error message actually tells them, though. In layman's terms: the table stage_employees is never "introduced" in a FROM or JOIN clause in the query. Let's immediately proceed to getting the syntax right.

2. Valid syntax

MS SQL Server will accept this:
update tbl_employees
   set last_name = stage_employees.last_name
     , first_name = stage_employees.first_name
  from tbl_employees
  join stage_employees
    on tbl_employees.ssn = stage_employees.ssn

Open in new window

At this point, I will show the same sql, using table aliases, which will not only shorted your code, but make it even more readable:
update t
   set last_name = s.last_name
     , first_name = s.first_name
  from tbl_employees t
  join stage_employees s
    on t.ssn = s.ssn

Open in new window

With that syntax, the reply from the sql server will be 2 rows updated.

MySQL, however, will not accept the same syntax, but the syntax is not that much different: :
update tbl_employees t, stage_employees s
   set t.last_name = s.last_name
     , t.last_name = s.last_name
 where t.ssn = s.ssn

Open in new window

...and you will get 2 rows updated.

To make Oracle happy, it's an even more special syntax.
Let's make the first attempt:
update tbl_employees t
   set ( last_name , first_name ) = 
       ( select s.last_name , s.first_name              
            from stage_employees s
              where s.ssn = t.ssn

Open in new window

Oracle will promptly return: 3 rows updated! Errr, wait? 3 rows? ROLLBACK, please!

Yes, oracle will, with that syntax, update all 3 rows from tbl_employees, because unlike MS SQL Server and MySQL, the joining condition is only in a subquery, and not in the actual join. If you query the table before the rollback, you will see last_name and first_name have been set to NULL for the 3rd record!

Remember that, and also how to fix this.

Oracle has several methods to preform this update correctly. Let's start with the simplest modification:
update tbl_employees t
   set ( last_name, first_name ) = 
       ( select ( s.last_name, s.first_name  )
           from stage_employees s
           where s.ssn = t.ssn 
 where exists( select null from stage_employees s
              where s.ssn = t.ssn )

Open in new window

We simply added a condition to the main query to check if actually there is a record in the other table.

Easy enough, but the database engines have all more options to get this done more efficiently.

3. Other options (advanced coding, but very powerful)

Oracle can allow the following syntax, but with the requirement that the SSN field(s) is a primary key or has unique index/constraint.
update ( select t.last_name, s.last_name new_last_name
                , t.first_name, s.first_name new_first_name
           FROM tbl_employees t
           JOIN stage_employees s
             ON t.ssn = s.ssn
   set last_name = new_last_name
     , first_name = new_first_name

Open in new window

So, if your tables have those primary keys on SSN, then you will now get 2 rows updated.

Otherwise, this dreadful message pops up: ORA-01779: cannot modify a column which maps to a non key-preserved table.

To finish with Oracle, there is another, even more powerful syntax possible to perform our update:
merge into tbl_employees t
using ( select * from stage_employees ) s
  on ( s.ssn = t.ssn )
when matched 
then update set t.last_name = s.last_name
when not matched 
then insert ( t.ssn, t.last_name) values ( s.ssn, t.last_name )

Open in new window

This time, you will get excepted 2 rows merged.

The power of this syntax (also know under the term UPSERT) is that it can not only update existing rows, but at the same time, insert new rows (if any), in the when not matched part. I will not go deeper into the syntax (the url above should be just fine for further reading), but you should get used to this when loading from staging tables in Oracle. At least for me, it outperformed the plain UPDATE every time.

What about MySQL, does it have MERGE? Yes, there are even 2 versions, but under different names, different syntax, and different options:


Let me show the INSERT version:
INSERT INTO tbl_employees (ssn, last_name, first_name)
SELECT ssn, last_name, first_name FROM stage_employees
UPDATE last_name=VALUES(last_name)
       , first_name=VALUES(first_name);

Open in new window

Result shall be 2 rows affected.

The main issue with those MySQL features is that it takes implicitly the primary key (or defaulting to a unique index), so you have actually no choice on the joining condition.

Finally, back to MS SQL Server, does it also have a MERGE statement?
the answer depends on the version:
Sql server 2008: yes:
Sql server 2005 and lower: no.

Though, with sql 2005 (or higher), you can work with the OUTPUT clause on the UPDATE to find out which rows have been updated:
-- this table will get filled with the SSN used in the update:
declare @updated_ids table( ssn varchar(50) )
-- do the normal update (on rows with exiting join match)
update t
   set last_name = s.last_name
-- this line will do the insert of the processed SSN into the above table
output inserted.ssn into @updated_ids 
   from tbl_employees t
   join stage_employees s
      on t.ssn = s.ssn
-- now, we can use that temp table data
insert into tbl_employees ( ssn, last_name )
select s.ssn, s.last_name
  from stage_employees s
 where not exists (select null from @updated_ids t where t.ssn = s.ssn );

Open in new window

MS SQL Server 2005 (and higher) has also another possibility to do this, using CTE (Common Table Expression):
--prepare a inline view with all the data needed for the update
;WITH x as ( SELECT t.ssn, t.last_name, t.first_name
   , s.last_name new_last_name, s.first_name new_first_name
   From tbl_employees t
   Join stage_employees s
      On s.ssn = t.ssn )
-- run the update using that inline view
   SET last_name = new_last_name
     , first_name = new_first_name
  FROM x

Open in new window

I leave it to your evaluation and personal preferences on which option to choose in the database system you work with.

4. Another DON'T

Let's take a look at this code:
UPDATE tbl_employees
    Set last_name = ( select last_name from stage_employees 
                 where tbl_employees.ssn = stage_employees.ssn )
      , first_name = ( select first_name from stage_employees 
                 where tbl_employees.ssn = stage_employees.ssn );

Open in new window

It will run, as there is no syntax error; but it has several issues:

* it will have to run 2 (sub)queries on stage_employees, instead of doing it once. Check out the explain plan for that syntax.
* the same issue as shown in the first attempt for oracle: it will update all 3 rows from tbl_employees, and make the fields NULL where there is no match. Of course, the same "fix" with the EXISTS condition can be applied, but it will add another subquery to the execution.
* if ever the staging table had 2 rows with the same joining condition (in our example, the SSN field), the code would return an error saying "subquery returned more than 1 row". In our case, SSN is a primary key, but remember this error message when you update based on a condition that is not on a primary/unique key.

Please do not write such code!

5. Conclusion

I hope I did not lose you on step 3 with a bit more advanced syntax, but those options are important to have in the tool-belt for any SQL developer. Without them, you will fail to produce efficient code when doing large updates over multiple tables.
Guy Hengel [angelIII / a3]Billing Engineer

Comments (14)

Guy Hengel [angelIII / a3]Billing Engineer
Most Valuable Expert 2014
Top Expert 2009


I learned something new in MS SQL Server today ...

the UPDATE / JOIN does not need the destination table in the FROM part, actually...

create table t1 ( id int, data varchar(20))
create table t2 ( id int, data varchar(20))
insert into t1 values ( 1, 'original data 1')
insert into t1 values ( 2, 'original data 2')
insert into t2 values ( 1, 'new data 1')
insert into t2 values ( 3, 'new data 3')
update t1
  set =
 from t2 
where =
select * from t1
select * from t2
drop table t1
drop table t2

Open in new window

I could not believe it until I tested it, and it works in MS SQL 2000, MS SQL 2005 and MS SQL 2008
Wow!  Cool!
Qlemo"Batchelor", Developer and EE Topic Advisor
Top Expert 2015

You did not know that? I did :P. However, in most cases you want to use an alias, and hence are forced to provide the source table in FROM.

Very useful!!!Many thanks
Ephraim WangoyaSoftware Engineer

Very nice article

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.

Get access with a 7-day free trial.
You Belong in the World's Smartest IT Community