Link to home
Start Free TrialLog in
Avatar of Steve Synan
Steve Synan

asked on

Trying to write a query that returns records based on the MAX of two columns (where one takes precedence) and another column being NULL (Microsoft SQL Server)

Hello,

I'm trying to write a query with the following rules:

If the MAX ChangeOrder OR MAX Revision has a SyncDate that is null then return the record, however
Change order takes precedence (see TEST5 in desired results)

[Study] <----- table name

ProtocolID	  Revision		ChangeOrder		SyncDate
--------------------------------------------------------------------------
TEST1		0		0			NULL
TEST1		0		1			01/02/2013
TEST1		0		2			NULL
TEST2		0		0			NULL
TEST2		0		1			NULL
TEST2		0		2			01/02/2013
TEST3		0		0			NULL
TEST3		1		0			01/02/2013
TEST3		2		0			NULL
TEST4		0		0			NULL
TEST4		1		0			NULL
TEST4		2		0			01/02/2013
TEST5		0		0			NULL
TEST5		1		0			NULL
TEST5		2		0			01/02/2013
TEST5		2		1			NULL

Open in new window


Desired resultset

ProtocolID    Revision		ChangeOrder		SyncDate
--------------------------------------------------------------------------
TEST1		0		2			NULL
TEST3		2		0			NULL
TEST5		4		1			NULL

Open in new window


TEST1 would return because the record with the MAX ChangeOrder (2) has a SyncDate that is NULL
TEST2 would NOT return because the record with the MAX ChangeOrder (2) has a SyncDate that is NOT NULL
TEST3 would return because the record with the MAX Revision (2) has a SyncDate that is NULL
TEST4 would NOT return because the record with the MAX Revision (2) has a SyncDate that is NOT NULL
TEST5 would return because although the record with the MAX Revision is NOT NULL, the record with the MAX ChangeOrder is NULL
ASKER CERTIFIED SOLUTION
Avatar of Ryan Chong
Ryan Chong
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Steve Synan
Steve Synan

ASKER

Thank you for all your help so far. I'm really close and it's catching the records I want, however there is one rogue record in there that shouldn't be.

My first example was simplified, here is my actual query. I hope this still makes sense as a few things have changed.

My Current Query

;WITH cte AS
(
	SELECT 
		s.[ProtocolID],
		s.[Revision],
		s.[ChangeOrderNumber],
		s.[LastSalesForceSyncDate],
			ROW_NUMBER() OVER (PARTITION BY s.ProtocolID ORDER BY s.[ChangeOrderNumber] DESC) idx 
	FROM
		[dbo].[Study] s
	INNER JOIN [dbo].[StudyFinalization] sf ON sf.[ID] = s.[ID]
	WHERE sf.[Finalized] = 1 AND s.[ProtocolID] NOT LIKE '%TEST%'
) 
SELECT 
	s.[ProtocolID],
	s.[Revision],
	s.[ChangeOrderNumber],
	s.[LastSalesForceSyncDate]
FROM cte s
WHERE s.idx = 1 AND s.[LastSalesForceSyncDate] IS NULL

Open in new window


Highlighted record shouldn't be there
User generated image
I run the following query on my tables

SELECT 
	s.[ProtocolID],
	s.[Revision],
	s.[ChangeOrderNumber],
	s.[LastSalesForceSyncDate],
	sf.[Finalized]
FROM [dbo].[Study] s 
INNER JOIN [dbo].[StudyFinalization] sf ON sf.[ID] = s.[ID]
WHERE s.[ProtocolID] = 'ONFI201801'

Open in new window


And these are the results I see:

User generated image
As you can see, the MAX revision (4) does have a SyncDate (actually called LastSalesForceSyncDate)

All other records look fine, any ideas why this one might be showing up?

Many thanks for helping me!
what if trying:

ROW_NUMBER() OVER (PARTITION BY s.ProtocolID ORDER BY s.[ChangeOrderNumber] DESC, Revision DESC, LastSalesForceSyncDate DESC) idx
You're marvelous! I've been banging my head for the last four hours trying to make this work. I would buy you a pitcher of beer if I could!

Now I just have to study up on the following line:

ROW_NUMBER() OVER (PARTITION BY s.ProtocolID ORDER BY s.[ChangeOrderNumber] DESC,  Revision DESC, LastSalesForceSyncDate DESC) idx

because I'm not familiar with the OVER and PARTITION keywords.

In any case, thank you so much!
OK, two distinct conditions....

Choose  ProtocolID (row) where the max(ChangeOrderNumber) has a NULL [LastSalesForceSyncDate]
ELSE
Choose  ProtocolID (row) where the max(Revision) has a NULL [LastSalesForceSyncDate]
 
In which case, need to check those two disparate conditions.... Then decide.

Does that sound right ?

Dont think the row_number() function will accommodate all instances. Becuse it is simply sorting (essentially) whereby idx = 1 might not be the correct choice.
OOps, too late you have made your choice :)
Ummmm... I could prove that ROW_NUMBER() does not work in all instances.

Plug in TEST6 to Ryan's selected solution

      select 'TEST6',0,0,NULL union
      select 'TEST6',1,1,'20180717' union
      select 'TEST6',2,0,NULL

My understanding is that it should return a result ie  'TEST6',2,0,NULL

But maybe not (and given the examples, I have no idea why it wouldnt). If the above result is true then the row_number() function can never return you the predictable results you need.

Maybe Ryan can test and advise ? Which is a test in itself :)

But would like to discuss the roles of Revision and ChangeOrder - uniqueness, sequence etc...
I'll have to run the test, but in that case it shouldn't return a record because the ChangeOrder has precedence therefor the record with the highest ChangeOrder number ('TEST6', 1, 1, '20180717') doesn't have a null SyncDate there for it shouldn't return.

I think I just worded my question poorly.
>> highest ChangeOrder

I think it is part of my confusion (and mentioned in your other question)

And the clarification I was seeking about the roles/relationships of uniqueness , sequencing. etc.

So, if the above scenario shouldnt return a result because "highest" ChangeOrder then, what if they all have a 1 ? there is still the highest with a not NULL syncdate. So, duplicates arent an issue ?  Because if the previous was wrong, then the following is correct ?

      select 'TEST6',0,1,NULL union
      select 'TEST6',1,1,'20180717' union
      select 'TEST6',2,1,NULL

Is it really just a sort sequence ?  

Thanks for taking the time to explain. Appreciated.
That's a great question, but luckily it's not a scenario we would encounter due to business rules that are in place.

We basically have the concept of a "study", however users can create study "revisions". This happens when a customer wants to create multiple variations of the same study, so they can change any number of parameters and do comparisons. Effectively we create a clone of the study an increment the revision number by 1. So we can have (n) revisions of any particular study. At some point the customer will sign of and say "alright, we are happy with revision 4 (for example) of study "ABC" (even if there are revision numbers greater than 4 - they select which one they want) and it will be finalized.

However, there is still a window of time before we execute the study where they can still make really late changes (which we would charge extra for). These are called change orders. From this point on the revision number will no longer increment, but the change order number will (no "revisions" post finalization). The customer may sign off on it again, then decide they want another change, and thus another change order increment, but the revision number will stay the same.

That was a really verbose way of saying - once a study has been "finalized" then revision numbers will no longer increment and change order numbers will increment, therefor, no two instances of a particular study will have the same change order number once the change order number is greater than zero, so we would never see a record like your example.

I hope this makes sense!
Fantastic background and explanation. Makes total sense.

Revision numbers increment until such time that changeordernumbers start to increment. They are used for sequencing and so it is a sort order solution.

Thank you for persevering with this. It is very much appreciated.