get most recent and second most recent date in SQL Server

I'm trying to create an aggregate table using the combination of two others.

Feed Table A:

PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
2, 35, 1/16/2016

Open in new window


Expected Result (Table B):

PrimaryKey, Score, LastScore, OtherFields...
1, 80, 75, data...
2, 35, 12, data...

Open in new window


In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.

The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.

I'm using statements like this:

      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window


My problem is that the response of this query response changes every time it runs.  I also tried to save the query as a view, but that resulted in the same random responses.

Thanks.
LVL 9
Evan CutlerVolunteer Chief Information OfficerAsked:
Who is Participating?
 
Pawan KumarConnect With a Mentor Database ExpertCommented:
ok, try

;WITH CTE AS
(
	SELECT * FROM 
	(
		SELECT DISTINCT PrimaryKey 
		FROM aTest
	)p
	CROSS APPLY
	(
		SELECT TOP 2 Score , DateField 
		FROM aTest b
		WHERE b.PrimaryKey = p.PrimaryKey
		ORDER BY b.DateField DESC
	)k
)
,CTE1 AS 
(
	SELECT *, CONCAT('Score',rnk) c1 , CONCAT('DateField',rnk) c2 FROM 
	(
		SELECT *, ROW_NUMBER() OVER (PARTITION BY PRIMARYKey Order BY DateField DESC) rnk
		FROM CTE 
	)k
)
SELECT PrimaryKey , MAX([Score1])[Score1],MAX([Score2])[Score2],MAX(DateField1)DateField1,MAX(DateField2)DateField2 FROM CTE1
PIVOT ( MAX(Score) FOR c1 IN ([Score1],[Score2]) )y1
PIVOT ( MAX(DateField) FOR c2 IN (DateField1,DateField2) )y1
GROUP BY PrimaryKey 

Open in new window


O/p

PrimaryKey  Score1      Score2      DateField1 DateField2
----------- ----------- ----------- ---------- ----------
1           80          75          2016-01-15 2016-01-12
2           35          12          2016-01-16 2016-01-01

(2 row(s) affected)

Open in new window


hope it helps !!
0
 
Pawan KumarDatabase ExpertCommented:
Try this

SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY LoadDate DESC) rnk
    from PROPERTY_SCORE a
)p
where rnk < 3

Open in new window

0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
Thanks, but that doesn't pivot the scores.  I need the last score, and the second to last score, in the same row.
Most appreciated.
0
The new generation of project management tools

With monday.com’s project management tool, you can see what everyone on your team is working in a single glance. Its intuitive dashboards are customizable, so you can create systems that work for you.

 
Pawan KumarDatabase ExpertCommented:
Trial
CREATE TABLE aTest
(
	  PrimaryKey INT
	, Score INT
	, DateField DATE
)
GO


INSERT INTO aTest VALUES
(1, 50, '1/1/2016'),
(2, 12, '1/1/2016'),
(1, 75, '1/12/2016'),
(1, 80, '1/15/2016'),
(2, 35, '1/16/2016')
GO


SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
    from aTest a
)p
where rnk < 3

Open in new window


Output

PrimaryKey  Score       DateField  rnk
----------- ----------- ---------- --------------------
2           35          2016-01-16 1
1           80          2016-01-15 2

(2 row(s) affected)

Open in new window


Hope it helps !!
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
ok.  How do I update a single record with the latest score and the score before the latest score?
0
 
Pawan KumarDatabase ExpertCommented:
Try..

;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT Score FROM CTE a WHERE rnk = 2 ) Highest
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighest] 
FROM CTE b

Open in new window


Output

Highest     2ndHighest
----------- -----------
80          35

(1 row(s) affected)

Open in new window


Hope it helps !
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
Thank you, but I'm missing the Primary Key: RTPropertyUniqueIdentifier.

I need the Key, Score, and PastScore so i can push it to another table.
0
 
Pawan KumarDatabase ExpertCommented:
;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT PrimaryKey FROM CTE a WHERE rnk = 2 ) HighestScore
		   ,(SELECT Score FROM CTE a WHERE rnk = 2 ) HighestPrimarKey
		   ,(SELECT DateField FROM CTE a WHERE rnk = 2 ) HighestDate		 
		 ,(SELECT PrimaryKey FROM CTE a WHERE rnk = 1 ) [2ndHighestPrimaryKey]
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighestScore]
		   ,(SELECT DateField FROM CTE a WHERE rnk = 1 ) [2ndHighestDate] 
FROM CTE b

Open in new window



O/p

HighestScore HighestPrimarKey HighestDate 2ndHighestPrimaryKey 2ndHighestScore 2ndHighestDate
------------ ---------------- ----------- -------------------- --------------- --------------
1            80               2016-01-15  2                    35              2016-01-16

(1 row(s) affected)

Open in new window

0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
so the problem with this one is that it gives me the latest score in the WHOLE table. I need the latest score and second latest score for each PrimaryKey?
0
 
Pawan KumarDatabase ExpertCommented:
Use partition by primary key in row_number
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
ok, I got this far,
but the query takes forever:
;WITH CTE AS
(
      SELECT * FROM(
      SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier ORDER BY LoadDate DESC) rnk
      from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
                  RTPropertyUniqueIdentifier
             ,  (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
             , (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b

anyway to speed it up?  The table in question has 120Million records.
Thanks.
0
 
PortletPaulCommented:
With tables of this size the indexing and freshness of statistics will be major factors in performance.

Are any tables partitioned ?

Do you have an execution plan of this:
      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window

Can you attach it as a .sqlplan file please
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
No.  The tables are not partitioned.  Being that the key is an integer, I'm not educated enough how to pull it off.
However, here's the SQLPLAN:  -plan.sqlplan

Thanks much.
0
 
pcelbaCommented:
Just a note:  The column named PrimaryKey does not represent Primary Key value defined as Unique row identification value.

You should also define how to process equal scores, e.g.
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
1, 80, 1/14/2016
2, 35, 1/16/2016

Open in new window

What is the highest value and what is the second highest value in such case?
0
 
PortletPaulCommented:
37% is spent scaning the table for data

plan-diagram.png
and another 54% ordering (for the row_number) and both of these are heavy in I/O

sorting.png
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifier] [LoadDate] and [Scores]

Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.

response of this query may change if the statistics for the table are not current
0
 
Pawan KumarDatabase ExpertCommented:
Hi Evan,
Please try my last comment.

Thanks
Pawan
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
You are right Pawan,

This is a log table of sorts.  Primary Keys do repeat, but they are primary keys.  When this statement is correct, I expect the output to be unique.  Testing your SQL now.  Thanks much.
0
 
pcelbaCommented:
No, Primary keys cannot repeat.  If the value of the key is repeating in some table then we are talking about Foreign keys which point to a different table where these values do not repeat and just in such table we can talk about Prinary key (which is unique by definition).

So your table does not have primary key which makes the records identification harder.
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
exactly, so Table B has the primary key, and is updated by Table A.
The query in question performs the action via Update on Select.
Unless you have a better idea.
0
 
Scott PletcherSenior DBACommented:
You really need an index on PROPERTY_SCORE keyed by ( RTPropertyUniqueIdentifier, LoadDate ) for this query.  That will prevent a sort and make the aggregation vastly faster.

Given how large the table is, you should review the existing index stats to see if/how those indexes are being used.  You might be able to adjust this new index to replace an older index.  But, if not, you still need the new index.
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
another question.  should I create a materialized view of this query?
0
 
Pawan KumarDatabase ExpertCommented:
should I create a materialized view of this query?

Materialized view is like a prebaked data. Create a query and put index on it. They come with lot of limitations. So you have to decide. Also for this requirement I don't think you should create a View. You should go with the stored procedure.

Any luck with the query I gave?

Thank You,
Pawan
0
 
Evan CutlerVolunteer Chief Information OfficerAuthor Commented:
Yes, I believe this query works.  It returned all 125M records in 16 minutes, which is a good time to run with.  Thanks much.
0
 
Pawan KumarDatabase ExpertCommented:
Welcome Evan !!
0
All Courses

From novice to tech pro — start learning today.