<

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x

UPDATES with JOIN... for everybody?

Published on
88,052 Points
13,652 Views
24 Endorsements
Last Modified:
Awarded
Editor's Choice
Community Pick

0. Introduction


One of the regular tasks for a SQL Developer (or DBA) is to UPDATE data of existing rows, after they are in the table. A simple example, to update an existing row, based on the primary key, would look like this:
UPDATE tbl_employee 
   SET last_name = 'whatever' 
 WHERE ssn = '123-456-789';

Open in new window

Easy enough, and the same syntax works for all database engines the same.

But in this article, we will check to see what the syntax is, and what errors newcomers run into, when the data has to come from another table. I will presume that you are familar with the concept of JOIN; however, you can read up on that topic here: en.wikipedia.org/wiki/Join_(SQL).

I will show the syntax for MS SQL Server, MySQL and Oracle, which will differ substantially!

Obviously, we need to set up 2 tables. I propose a common example of a staging table (when importing data) and the real production table (where the data will finally go to).

note: using staging table(s) when importing data from external sources is highly recommended, as it solves a number of quite important problems, which I won't discuss here, though.

Here comes the code to create the 2 tables in MS SQL Server syntax (I attached also the files for the same table for MySQL and Oracle):
CREATE TABLE tbl_Employees 
( pk int identity(1000,1)
, last_name varchar(100)
, first_name varchar(100)
, dob datetime, SSN varchar(50)
, comments varchar(500) );

CREATE TABLE stage_Employees 
( row_id int identity(1,1)
, last_name varchar(100)
, first_name varchar(100)
, dob varchar(100)
, SSN varchar(50)
, comments varchar(500) );

Open in new window

You will notice that the dob field is varchar in the staging table, and not datetime. This is to avoid that the loading from the external source fails with conversion errors (although not relevant here).

We will presume for this article that the external data would have loaded the data in the format 'YYYY-MM-DD'.

important: The fields PK and row_id in the 2 tables are NOT the same thing! They are just auto-generated values, and serve to identify the rows. The real "key" in these tables (at least for us mere humans) is the SSN field.

For this example, we will just load the data with plain inserts (again, here the ms sql server version, in the attached files the adjusted code for MySQL and Oracle).
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments) 
VALUES ( 'Smith', 'Victor'
  , convert(datetime, '1980-05-01',120), '123-456-789', null );
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Bond', 'James'
  , convert(datetime, '1978-01-01', 120), '007-007-007', null );
INSERT INTO tbl_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Brown', 'Alphonse'
  , convert(datetime, '1982-12-06', 120), '789-456-123', null );

Open in new window

INSERT INTO stage_Employees (last_name, first_name, dob, SSN, comments) 
VALUES ( 'Smith', 'Victor'
   , '1980-01-05', '123-456-789', 'dob updated' );
INSERT INTO stage_Employees (last_name, first_name, dob, SSN, comments)
VALUES ( 'Brown', 'Alfonse'
   , '1982-06-12', '789-456-123', 'dob and firstname updated' );

Open in new window

It's very important to remember that for this example, we will consider the SSN as the "primary key", so it must never change for an existing record. If you have a "primary key" that can change in the lifetime of the record, get another primary key. In the context of loading data from an external source, you have to get get the actual primary key of the external source also, and store it in your table.

Ok, so the tables are ready, and the staging table has some updates in the data for 2 of the 3 records.

1. First attempt, and getting nothing but errors...


Everyone is a beginner, at some point, with a new "language". So, often enough, one may find people "complaining" and wondering why the following 2 "SQL" statements won't work, respectively not understanding the associated error message.
update tbl_employees
   set last_name = stage_employees.last_name;

Open in new window

update tbl_employees
   set last_name = stage_employees.last_name
 where tbl_employees.ssn = stage_employees.ssn;

Open in new window

In MS SQL Server, both will fail with this:
Msg 107, Level 16, State 3, Line 1
The column prefix 'stage_employees' does not match with a table name or alias name used in the query.


MySQL complains with almost the same response:
MySQL Database Error: Unknown table 'stage_employees' in where clause

For Oracle, the message indicates the full identifier:
SQL Error: ORA-00904: "STAGE_EMPLOYEES"."SSN": invalid identifier
00904. 00000 -  "%s: invalid identifier"


But in all 3 cases, the users will ask themselves (or someone else):
come on, that table (and column) name(s) DO exist, I double-checked!?

The error message actually tells them, though. In layman's terms: the table stage_employees is never "introduced" in a FROM or JOIN clause in the query. Let's immediately proceed to getting the syntax right.

2. Valid syntax


MS SQL Server will accept this:
(http://msdn.microsoft.com/en-us/library/aa260662(SQL.80).aspx)
update tbl_employees
   set last_name = stage_employees.last_name
     , first_name = stage_employees.first_name
  from tbl_employees
  join stage_employees
    on tbl_employees.ssn = stage_employees.ssn

Open in new window

At this point, I will show the same sql, using table aliases, which will not only shorted your code, but make it even more readable:
update t
   set last_name = s.last_name
     , first_name = s.first_name
  from tbl_employees t
  join stage_employees s
    on t.ssn = s.ssn

Open in new window

With that syntax, the reply from the sql server will be 2 rows updated.

MySQL, however, will not accept the same syntax, but the syntax is not that much different: :
(http://dev.mysql.com/doc/refman/5.0/en/update.html)
update tbl_employees t, stage_employees s
   set t.last_name = s.last_name
     , t.last_name = s.last_name
 where t.ssn = s.ssn

Open in new window

...and you will get 2 rows updated.

To make Oracle happy, it's an even more special syntax.
(http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14261/update_statement.htm
or http://www.psoug.org/reference/update.html)
Let's make the first attempt:
update tbl_employees t
   set ( last_name , first_name ) = 
       ( select s.last_name , s.first_name              
            from stage_employees s
              where s.ssn = t.ssn
       )

Open in new window

Oracle will promptly return: 3 rows updated! Errr, wait? 3 rows? ROLLBACK, please!

Yes, oracle will, with that syntax, update all 3 rows from tbl_employees, because unlike MS SQL Server and MySQL, the joining condition is only in a subquery, and not in the actual join. If you query the table before the rollback, you will see last_name and first_name have been set to NULL for the 3rd record!

Remember that, and also how to fix this.

Oracle has several methods to preform this update correctly. Let's start with the simplest modification:
update tbl_employees t
   set ( last_name, first_name ) = 
       ( select ( s.last_name, s.first_name  )
           from stage_employees s
           where s.ssn = t.ssn 
       )
 where exists( select null from stage_employees s
              where s.ssn = t.ssn )

Open in new window

We simply added a condition to the main query to check if actually there is a record in the other table.

Easy enough, but the database engines have all more options to get this done more efficiently.

3. Other options (advanced coding, but very powerful)


Oracle can allow the following syntax, but with the requirement that the SSN field(s) is a primary key or has unique index/constraint.
update ( select t.last_name, s.last_name new_last_name
                , t.first_name, s.first_name new_first_name
           FROM tbl_employees t
           JOIN stage_employees s
             ON t.ssn = s.ssn
        )
   set last_name = new_last_name
     , first_name = new_first_name

Open in new window

So, if your tables have those primary keys on SSN, then you will now get 2 rows updated.

Otherwise, this dreadful message pops up: ORA-01779: cannot modify a column which maps to a non key-preserved table.

To finish with Oracle, there is another, even more powerful syntax possible to perform our update:
(MERGE - http://www.psoug.org/reference/merge.html)
merge into tbl_employees t
using ( select * from stage_employees ) s
  on ( s.ssn = t.ssn )
when matched 
then update set t.last_name = s.last_name
when not matched 
then insert ( t.ssn, t.last_name) values ( s.ssn, t.last_name )
;

Open in new window

This time, you will get excepted 2 rows merged.

The power of this syntax (also know under the term UPSERT) is that it can not only update existing rows, but at the same time, insert new rows (if any), in the when not matched part. I will not go deeper into the syntax (the url above should be just fine for further reading), but you should get used to this when loading from staging tables in Oracle. At least for me, it outperformed the plain UPDATE every time.

What about MySQL, does it have MERGE? Yes, there are even 2 versions, but under different names, different syntax, and different options:

* REPLACE - http://dev.mysql.com/doc/refman/5.0/en/replace.html
* INSERT - http://dev.mysql.com/doc/refman/5.0/en/replace.html

Let me show the INSERT version:
INSERT INTO tbl_employees (ssn, last_name, first_name)
SELECT ssn, last_name, first_name FROM stage_employees
    ON DUPLICATE KEY 
UPDATE last_name=VALUES(last_name)
       , first_name=VALUES(first_name);

Open in new window

Result shall be 2 rows affected.

The main issue with those MySQL features is that it takes implicitly the primary key (or defaulting to a unique index), so you have actually no choice on the joining condition.

Finally, back to MS SQL Server, does it also have a MERGE statement?
the answer depends on the version:
Sql server 2008: yes: http://technet.microsoft.com/en-us/library/bb510625.aspx
Sql server 2005 and lower: no.

Though, with sql 2005 (or higher), you can work with the OUTPUT clause on the UPDATE to find out which rows have been updated:
(http://technet.microsoft.com/en-us/library/ms177564.aspx)
-- this table will get filled with the SSN used in the update:
declare @updated_ids table( ssn varchar(50) )
-- do the normal update (on rows with exiting join match)
update t
   set last_name = s.last_name
-- this line will do the insert of the processed SSN into the above table
output inserted.ssn into @updated_ids 
   from tbl_employees t
   join stage_employees s
      on t.ssn = s.ssn
-- now, we can use that temp table data
insert into tbl_employees ( ssn, last_name )
select s.ssn, s.last_name
  from stage_employees s
 where not exists (select null from @updated_ids t where t.ssn = s.ssn );

Open in new window

MS SQL Server 2005 (and higher) has also another possibility to do this, using CTE (Common Table Expression):
(http://technet.microsoft.com/en-us/library/ms177523.aspx)
--prepare a inline view with all the data needed for the update
;WITH x as ( SELECT t.ssn, t.last_name, t.first_name
   , s.last_name new_last_name, s.first_name new_first_name
   From tbl_employees t
   Join stage_employees s
      On s.ssn = t.ssn )
-- run the update using that inline view
UPDATE x
   SET last_name = new_last_name
     , first_name = new_first_name
  FROM x

Open in new window

I leave it to your evaluation and personal preferences on which option to choose in the database system you work with.

4. Another DON'T


Let's take a look at this code:
UPDATE tbl_employees
    Set last_name = ( select last_name from stage_employees 
                 where tbl_employees.ssn = stage_employees.ssn )
      , first_name = ( select first_name from stage_employees 
                 where tbl_employees.ssn = stage_employees.ssn );

Open in new window

It will run, as there is no syntax error; but it has several issues:

* it will have to run 2 (sub)queries on stage_employees, instead of doing it once. Check out the explain plan for that syntax.
* the same issue as shown in the first attempt for oracle: it will update all 3 rows from tbl_employees, and make the fields NULL where there is no match. Of course, the same "fix" with the EXISTS condition can be applied, but it will add another subquery to the execution.
* if ever the staging table had 2 rows with the same joining condition (in our example, the SSN field), the code would return an error saying "subquery returned more than 1 row". In our case, SSN is a primary key, but remember this error message when you update based on a condition that is not on a primary/unique key.

Please do not write such code!

5. Conclusion


I hope I did not lose you on step 3 with a bit more advanced syntax, but those options are important to have in the tool-belt for any SQL developer. Without them, you will fail to produce efficient code when doing large updates over multiple tables.
MS-SQL-Server.txt
MySQL.txt
Oracle.txt
24
  • 4
  • 4
  • 2
  • +4
14 Comments
LVL 61

Expert Comment

by:Kevin Cross
Thank you, sir!
Another good read and useful tool to add to the arsenal.

Voted yes above.
0
LVL 74

Expert Comment

by:Qlemo
Somebody had to write this, it was about time!
Now we need the same for SQLBase, SQL Anywhere, and - most important - Access :-D
0
LVL 32

Expert Comment

by:Daniel Wilson
Very nice ... another good one for reference.
0
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

LVL 143

Author Comment

by:Guy Hengel [angelIII / a3]
NOTE: for MS SQL Server, the TEXT data type is a problem: you CANNOT "copy" it with a plain update.
you need to use the "good old" UPDATETEXT stuff:
http://sequelserver.blogspot.com/2006/03/update-text1-text2-text3.html
0
LVL 32

Expert Comment

by:Daniel Wilson
The TEXT data type is a problem in a LOT of ways!  varchar(max) is the only argument one should need to upgrade to SQL Server 2005 or later.
0
LVL 32

Expert Comment

by:Daniel Wilson
>>note: using staging table(s) when importing data from external sources is highly recommended, as it solves a number of quite important problems, which I won't discuss here, though.

Yes!  That's one I learned the hard way several years back.
0
LVL 143

Author Comment

by:Guy Hengel [angelIII / a3]
NOTE: for MySQL, there is another problem to note: it does not allow to update with join/subquery from the same table, at least not directly. the workaround is to create a temp table with the data needed for the subsequent join for the update.
0
LVL 143

Author Comment

by:Guy Hengel [angelIII / a3]
I don't plan on making my articles to include ms access in general, so far, but for the UPDATE, I will show the general syntax, along with where the WHERE clause would go:
UPDATE tbl_Employees e 
INNER JOIN stage_Employees s ON s.ssn = e.ssn
  SET e.last_name = s.last_name
    , e.first_name = s.first_name
WHERE s.ssn LIKE "123*"

Open in new window

0
LVL 22

Expert Comment

by:dportas
Great summary Angel.

One important point you didn't mention. Regarding the first example in part 2 : the MS UPDATE ... JOIN syntax. In the example the join criteria is unique because the SSN is unique in Stage. However, if the join criteria was NOT unique and multiple rows existed in the Stage table for any SSN then the values from some *random* one of those rows would be used to update the name columns. SQL Server (unlike Oracle and other DBMSs) will not return an error message and you may not notice straight away that anything unexpected happened!

That is a point a lot of people miss, so take care. If you are using SQL Server 2008 then I recommend you avoid the UPDATE...JOIN syntax altogether. Use MERGE instead. It does not suffer from the same problem and is very often more efficient than the nearest equivalent UPDATE.
0
LVL 143

Author Comment

by:Guy Hengel [angelIII / a3]
I learned something new in MS SQL Server today ...

the UPDATE / JOIN does not need the destination table in the FROM part, actually...

create table t1 ( id int, data varchar(20))
go
create table t2 ( id int, data varchar(20))
go
insert into t1 values ( 1, 'original data 1')
insert into t1 values ( 2, 'original data 2')
insert into t2 values ( 1, 'new data 1')
insert into t2 values ( 3, 'new data 3')
go
update t1
  set t1.data = t2.data
 from t2 
where t2.id = t1.id
go
select * from t1
select * from t2
go
drop table t1
go
drop table t2

Open in new window


I could not believe it until I tested it, and it works in MS SQL 2000, MS SQL 2005 and MS SQL 2008
0
LVL 32

Expert Comment

by:Daniel Wilson
Wow!  Cool!
0
LVL 74

Expert Comment

by:Qlemo
You did not know that? I did :P. However, in most cases you want to use an alias, and hence are forced to provide the source table in FROM.
0

Expert Comment

by:RIAS
Very useful!!!Many thanks
0
LVL 32

Expert Comment

by:Ephraim Wangoya
Very nice article
thanks
0

Featured Post

Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

This is a high-level webinar that covers the history of enterprise open source database use. It addresses both the advantages companies see in using open source database technologies, as well as the fears and reservations they might have. In this…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month