[Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

get most recent and second most recent date in SQL Server

Posted on 2016-11-27
24
Medium Priority
?
237 Views
Last Modified: 2016-11-28
I'm trying to create an aggregate table using the combination of two others.

Feed Table A:

PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
2, 35, 1/16/2016

Open in new window


Expected Result (Table B):

PrimaryKey, Score, LastScore, OtherFields...
1, 80, 75, data...
2, 35, 12, data...

Open in new window


In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.

The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.

I'm using statements like this:

      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window


My problem is that the response of this query response changes every time it runs.  I also tried to save the query as a view, but that resulted in the same random responses.

Thanks.
0
Comment
Question by:Evan Cutler
  • 10
  • 9
  • 2
  • +2
24 Comments
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41903554
Try this

SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY LoadDate DESC) rnk
    from PROPERTY_SCORE a
)p
where rnk < 3

Open in new window

0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903558
Thanks, but that doesn't pivot the scores.  I need the last score, and the second to last score, in the same row.
Most appreciated.
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41903559
Trial
CREATE TABLE aTest
(
	  PrimaryKey INT
	, Score INT
	, DateField DATE
)
GO


INSERT INTO aTest VALUES
(1, 50, '1/1/2016'),
(2, 12, '1/1/2016'),
(1, 75, '1/12/2016'),
(1, 80, '1/15/2016'),
(2, 35, '1/16/2016')
GO


SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
    from aTest a
)p
where rnk < 3

Open in new window


Output

PrimaryKey  Score       DateField  rnk
----------- ----------- ---------- --------------------
2           35          2016-01-16 1
1           80          2016-01-15 2

(2 row(s) affected)

Open in new window


Hope it helps !!
0
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903562
ok.  How do I update a single record with the latest score and the score before the latest score?
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41903572
Try..

;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT Score FROM CTE a WHERE rnk = 2 ) Highest
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighest] 
FROM CTE b

Open in new window


Output

Highest     2ndHighest
----------- -----------
80          35

(1 row(s) affected)

Open in new window


Hope it helps !
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903574
Thank you, but I'm missing the Primary Key: RTPropertyUniqueIdentifier.

I need the Key, Score, and PastScore so i can push it to another table.
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41903579
;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT PrimaryKey FROM CTE a WHERE rnk = 2 ) HighestScore
		   ,(SELECT Score FROM CTE a WHERE rnk = 2 ) HighestPrimarKey
		   ,(SELECT DateField FROM CTE a WHERE rnk = 2 ) HighestDate		 
		 ,(SELECT PrimaryKey FROM CTE a WHERE rnk = 1 ) [2ndHighestPrimaryKey]
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighestScore]
		   ,(SELECT DateField FROM CTE a WHERE rnk = 1 ) [2ndHighestDate] 
FROM CTE b

Open in new window



O/p

HighestScore HighestPrimarKey HighestDate 2ndHighestPrimaryKey 2ndHighestScore 2ndHighestDate
------------ ---------------- ----------- -------------------- --------------- --------------
1            80               2016-01-15  2                    35              2016-01-16

(1 row(s) affected)

Open in new window

0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903593
so the problem with this one is that it gives me the latest score in the WHOLE table. I need the latest score and second latest score for each PrimaryKey?
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41903596
Use partition by primary key in row_number
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903604
ok, I got this far,
but the query takes forever:
;WITH CTE AS
(
      SELECT * FROM(
      SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier ORDER BY LoadDate DESC) rnk
      from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
                  RTPropertyUniqueIdentifier
             ,  (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
             , (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b

anyway to speed it up?  The table in question has 120Million records.
Thanks.
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 41903607
With tables of this size the indexing and freshness of statistics will be major factors in performance.

Are any tables partitioned ?

Do you have an execution plan of this:
      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window

Can you attach it as a .sqlplan file please
0
 
LVL 37

Accepted Solution

by:
Pawan Kumar earned 2000 total points
ID: 41903609
ok, try

;WITH CTE AS
(
	SELECT * FROM 
	(
		SELECT DISTINCT PrimaryKey 
		FROM aTest
	)p
	CROSS APPLY
	(
		SELECT TOP 2 Score , DateField 
		FROM aTest b
		WHERE b.PrimaryKey = p.PrimaryKey
		ORDER BY b.DateField DESC
	)k
)
,CTE1 AS 
(
	SELECT *, CONCAT('Score',rnk) c1 , CONCAT('DateField',rnk) c2 FROM 
	(
		SELECT *, ROW_NUMBER() OVER (PARTITION BY PRIMARYKey Order BY DateField DESC) rnk
		FROM CTE 
	)k
)
SELECT PrimaryKey , MAX([Score1])[Score1],MAX([Score2])[Score2],MAX(DateField1)DateField1,MAX(DateField2)DateField2 FROM CTE1
PIVOT ( MAX(Score) FOR c1 IN ([Score1],[Score2]) )y1
PIVOT ( MAX(DateField) FOR c2 IN (DateField1,DateField2) )y1
GROUP BY PrimaryKey 

Open in new window


O/p

PrimaryKey  Score1      Score2      DateField1 DateField2
----------- ----------- ----------- ---------- ----------
1           80          75          2016-01-15 2016-01-12
2           35          12          2016-01-16 2016-01-01

(2 row(s) affected)

Open in new window


hope it helps !!
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903610
No.  The tables are not partitioned.  Being that the key is an integer, I'm not educated enough how to pull it off.
However, here's the SQLPLAN:  -plan.sqlplan

Thanks much.
0
 
LVL 43

Expert Comment

by:pcelba
ID: 41903971
Just a note:  The column named PrimaryKey does not represent Primary Key value defined as Unique row identification value.

You should also define how to process equal scores, e.g.
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
1, 80, 1/14/2016
2, 35, 1/16/2016

Open in new window

What is the highest value and what is the second highest value in such case?
0
 
LVL 49

Expert Comment

by:PortletPaul
ID: 41904021
37% is spent scaning the table for data

plan-diagram.png
and another 54% ordering (for the row_number) and both of these are heavy in I/O

sorting.png
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifier] [LoadDate] and [Scores]

Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.

response of this query may change if the statistics for the table are not current
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41904072
Hi Evan,
Please try my last comment.

Thanks
Pawan
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904198
You are right Pawan,

This is a log table of sorts.  Primary Keys do repeat, but they are primary keys.  When this statement is correct, I expect the output to be unique.  Testing your SQL now.  Thanks much.
0
 
LVL 43

Expert Comment

by:pcelba
ID: 41904238
No, Primary keys cannot repeat.  If the value of the key is repeating in some table then we are talking about Foreign keys which point to a different table where these values do not repeat and just in such table we can talk about Prinary key (which is unique by definition).

So your table does not have primary key which makes the records identification harder.
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904463
exactly, so Table B has the primary key, and is updated by Table A.
The query in question performs the action via Update on Select.
Unless you have a better idea.
0
 
LVL 70

Expert Comment

by:Scott Pletcher
ID: 41904616
You really need an index on PROPERTY_SCORE keyed by ( RTPropertyUniqueIdentifier, LoadDate ) for this query.  That will prevent a sort and make the aggregation vastly faster.

Given how large the table is, you should review the existing index stats to see if/how those indexes are being used.  You might be able to adjust this new index to replace an older index.  But, if not, you still need the new index.
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904657
another question.  should I create a materialized view of this query?
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41905028
should I create a materialized view of this query?

Materialized view is like a prebaked data. Create a query and put index on it. They come with lot of limitations. So you have to decide. Also for this requirement I don't think you should create a View. You should go with the stored procedure.

Any luck with the query I gave?

Thank You,
Pawan
0
 
LVL 9

Author Closing Comment

by:Evan Cutler
ID: 41905045
Yes, I believe this query works.  It returned all 125M records in 16 minutes, which is a good time to run with.  Thanks much.
0
 
LVL 37

Expert Comment

by:Pawan Kumar
ID: 41905048
Welcome Evan !!
0

Featured Post

Important Lessons on Recovering from Petya

In their most recent webinar, Skyport Systems explores ways to isolate and protect critical databases to keep the core of your company safe from harm.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article, we’ll look at how to deploy ProxySQL.
Instead of error trapping or hard-coding for non-updateable fields when using QODBC, let VBA automatically disable them when forms open. This way, users can view but not change the data. Part 1 explained how to use schema tables to do this. Part 2 h…
Viewers will learn how to use the SELECT statement in SQL to return specific rows and columns, with various degrees of sorting and limits in place.
Viewers will learn how to use the SELECT statement in SQL and will be exposed to the many uses the SELECT statement has.
Suggested Courses

865 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question