UPDATE Based on Join, nondeterministic example

Quote:
The proprietary T-SQL UPDATE syntax based on joins can be nondeterministic. The statement is nondeterministic when multiple source rows match one target row. Unfortunately, in such a case, SQL Server doesn’t generate an error or even a warning. Instead, SQL Server silently performs a nondeterministic UPDATE where one of the source rows arbitrarily “wins.”

Question: Could you please demonstrate how a nondeterministic update functions (produces bad results)?

You may want to use this sample to explain:
UPDATE C
SET C.postalcode = O.shippostalcode
FROM #MyCustomers AS C
INNER JOIN #MyOrders AS O
ON C.custid = O.custid;

create table #MyCustomers(custid int, postalcode varchar(10));
Insert Into #MyCustomers(custid, postalcode) Values
 (1,	10092)
,(2,	10077);
select * from #MyCustomers;
create table #MyOrders(custid int, shippostalcode varchar(10));
Insert Into #MyOrders(custid, shippostalcode) Values
 (1,	10154)
,(1,	10156)
,(1,	10155)
,(1,	10154)
,(1,	10154)
,(1,	10154)
,(2,	10182)
,(2,	10180)
,(2,	10181)
,(2,	10181);
select * from #MyOrders;

-- despite the fact I have used varchar(10) for the postal code columns, it is accepting integer inputs, I guess it is performing implicit conversion.

Open in new window


---------------
Additional info:
Instead of nonstandard (nondeterministic) UPDATE based on joins, use the standard MERGE statement which generates an error if multiple source rows match one target row, requiring to revise the code to make it deterministic.
This post is not about MERGE.
LVL 34
Mike EghtebasDatabase and Application DeveloperAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

ste5anSenior DeveloperCommented:
E.g.

CREATE TABLE #A ( ID INT, Payload INT );
CREATE TABLE #B
    (
      PayloadOld INT ,
      PayloadNew INT
    );

INSERT  INTO #A
VALUES  ( 1, 1 );

INSERT  INTO #B
VALUES  ( 1, 2 ),
        ( 1, 3 );

UPDATE  A
SET     A.Payload = B.PayloadNew
FROM    #A A
        INNER JOIN #B B ON A.Payload = B.PayloadOld;

SELECT  *
FROM    #A;

UPDATE  #A
SET     Payload = 1;

CREATE CLUSTERED INDEX CIX_B ON #B ( PayloadNew DESC);

UPDATE  A
SET     A.Payload = B.PayloadNew
FROM    #A A
        INNER JOIN #B B ON A.Payload = B.PayloadOld;

SELECT  *
FROM    #A;

DROP TABLE #A;
DROP TABLE #B;

Open in new window

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Mike EghtebasDatabase and Application DeveloperAuthor Commented:
Thank you, here is your solution with some comments added to it:
CREATE TABLE #A ( ID INT, Payload INT );
INSERT  INTO #A VALUES 
( 1, 1 );

CREATE TABLE #B (PayloadOld INT, PayloadNew INT);
INSERT  INTO #B VALUES  
( 1, 2 ), ( 1, 3 );

-- Table #A has:
--  ID    Payload
--   1      1

-- Table #B has:
--  ID    PayloadNew
--   1      2
--   1      3

-- To update Payload with PayloadNew. 
-- the code has option of choosing to update using either PayloadNew = 2 or PayloadNew = 3
-- This choice depends on SQL Server internal optimization logic.
-- This is why join update is known as a nondeterministic update
UPDATE  A
SET     A.Payload = B.PayloadNew
FROM    #A A
        INNER JOIN #B B ON A.Payload = B.PayloadOld;

-- After the above update 
SELECT  * FROM    #A;
-- Table #A has:
--  ID    Payload
--   1      2

-- now restoring it back to Payload = 1
UPDATE  #A SET Payload = 1;
-- we get, for Table #A:
--  ID    Payload
--   1      1:

-- If we add desc index to table #B to force it order shown below: 
CREATE CLUSTERED INDEX CIX_B ON #B ( PayloadNew DESC);

-- Table #B has:
--  ID    PayloadNew
--   1      3      --< because of index, query optimizer will see 3 first.
--   1      2

UPDATE  A
SET     A.Payload = B.PayloadNew
FROM    #A A
        INNER JOIN #B B ON A.Payload = B.PayloadOld;

-- Here after the update above, Table #A has:
--  ID    Payload
--   1      3



DROP TABLE #A;
DROP TABLE #B;

Open in new window

Mike EghtebasDatabase and Application DeveloperAuthor Commented:
Here are some alternate ways of controlling whether it is updated with PayloadNew = 2 or PayloadNew =3 without having to index it.
CROSS APPLY:
UPDATE a
SET a.Payload = D.PayloadNew
FROM #A AS a
CROSS APPLY (SELECT TOP (1) b.PayloadNew
FROM #B AS b
WHERE a.Payload = b.PayloadOld
ORDER BY [b]b.PayloadNew[/b]) AS D;                -- PayloadNew = 2 order by asc and for PayloadNew =3 order by desc

Open in new window


Derived table:
UPDATE D
SET Payload = PayloadNew
FROM (
SELECT Top 1 a.ID, a.Payload
, b.PayloadOld, b.PayloadNew
FROM #A a
INNER JOIN #B b
ON a.ID= b.PayloadOld
Order By b.PayloadNew Desc             -- PayloadNew = 2 order by asc and for PayloadNew =3 order by desc
) AS D;

Open in new window


CTE:
WITH cte AS
(
SELECT a.ID, a.Payload
, b.PayloadOld, b.PayloadNew
FROM #A a
INNER JOIN #B b
ON a.ID= b.PayloadOld
)
UPDATE cte
SET Payload = PayloadNew;

Open in new window

ste5anSenior DeveloperCommented:
The usage of ORDER BY in the CROSS APPLY and derived table example makes the UPDATES deterministic.

Only the CTE is still non-deterministic.

In my initial sample and your CTE sample, you cannot predict the outcome of the UPDATE, because there is absolutely no guarantee, that the optimizer and query execution use a certain row. This means for my sample, while will show under most circumstances the results 2 and 3, that on a different machine the results may be 2 and 2. Or 3 and 2. Or 3 and 3.

Another example would be an UPDATE with some large tempdb usage and parallel execution. Use one tembdb file and rerun it with multiple tempdb files. This will introduce also an unpredictable sort.
Mike EghtebasDatabase and Application DeveloperAuthor Commented:
ste5an,

Thank you for the comment. Yes, CTE canot be ordered unless we include top or row number with it I suppose like:
WITH cte AS
(
SELECT top 1 a.ID, a.Payload
, b.PayloadOld, b.PayloadNew
FROM #A a
INNER JOIN #B b
ON a.ID= b.PayloadOld
Order By b.PayloadNew Asc --/ Desc for PayloadNew = 2/PayloadNew = 3
)
UPDATE cte
SET Payload = PayloadNew;

Open in new window

or:
WITH cte AS
(
SELECT Row_Number() Over(Order By b.PayloadNew Asc) As rn  --/ Desc for 2/ 3
, a.ID, a.Payload
, b.PayloadOld, b.PayloadNew
FROM #A a
INNER JOIN #B b
ON a.ID= b.PayloadOld
)
UPDATE cte
SET Payload = PayloadNew
Where rn = 1

Open in new window


I had left cte solution unfinished because then I couldn't see the way forward.

Mike
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.