Link to home
Start Free TrialLog in
Avatar of Evan Cutler
Evan CutlerFlag for United States of America

asked on

get most recent and second most recent date in SQL Server

I'm trying to create an aggregate table using the combination of two others.

Feed Table A:

PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
2, 35, 1/16/2016

Open in new window


Expected Result (Table B):

PrimaryKey, Score, LastScore, OtherFields...
1, 80, 75, data...
2, 35, 12, data...

Open in new window


In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.

The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.

I'm using statements like this:

      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window


My problem is that the response of this query response changes every time it runs.  I also tried to save the query as a view, but that resulted in the same random responses.

Thanks.
Avatar of Pawan Kumar
Pawan Kumar
Flag of India image

Try this

SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY LoadDate DESC) rnk
    from PROPERTY_SCORE a
)p
where rnk < 3

Open in new window

Avatar of Evan Cutler

ASKER

Thanks, but that doesn't pivot the scores.  I need the last score, and the second to last score, in the same row.
Most appreciated.
Trial
CREATE TABLE aTest
(
	  PrimaryKey INT
	, Score INT
	, DateField DATE
)
GO


INSERT INTO aTest VALUES
(1, 50, '1/1/2016'),
(2, 12, '1/1/2016'),
(1, 75, '1/12/2016'),
(1, 80, '1/15/2016'),
(2, 35, '1/16/2016')
GO


SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
    from aTest a
)p
where rnk < 3

Open in new window


Output

PrimaryKey  Score       DateField  rnk
----------- ----------- ---------- --------------------
2           35          2016-01-16 1
1           80          2016-01-15 2

(2 row(s) affected)

Open in new window


Hope it helps !!
ok.  How do I update a single record with the latest score and the score before the latest score?
Try..

;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT Score FROM CTE a WHERE rnk = 2 ) Highest
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighest] 
FROM CTE b

Open in new window


Output

Highest     2ndHighest
----------- -----------
80          35

(1 row(s) affected)

Open in new window


Hope it helps !
Thank you, but I'm missing the Primary Key: RTPropertyUniqueIdentifier.

I need the Key, Score, and PastScore so i can push it to another table.
;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT PrimaryKey FROM CTE a WHERE rnk = 2 ) HighestScore
		   ,(SELECT Score FROM CTE a WHERE rnk = 2 ) HighestPrimarKey
		   ,(SELECT DateField FROM CTE a WHERE rnk = 2 ) HighestDate		 
		 ,(SELECT PrimaryKey FROM CTE a WHERE rnk = 1 ) [2ndHighestPrimaryKey]
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighestScore]
		   ,(SELECT DateField FROM CTE a WHERE rnk = 1 ) [2ndHighestDate] 
FROM CTE b

Open in new window



O/p

HighestScore HighestPrimarKey HighestDate 2ndHighestPrimaryKey 2ndHighestScore 2ndHighestDate
------------ ---------------- ----------- -------------------- --------------- --------------
1            80               2016-01-15  2                    35              2016-01-16

(1 row(s) affected)

Open in new window

so the problem with this one is that it gives me the latest score in the WHOLE table. I need the latest score and second latest score for each PrimaryKey?
Use partition by primary key in row_number
ok, I got this far,
but the query takes forever:
;WITH CTE AS
(
      SELECT * FROM(
      SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier ORDER BY LoadDate DESC) rnk
      from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
                  RTPropertyUniqueIdentifier
             ,  (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
             , (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b

anyway to speed it up?  The table in question has 120Million records.
Thanks.
Avatar of PortletPaul
With tables of this size the indexing and freshness of statistics will be major factors in performance.

Are any tables partitioned ?

Do you have an execution plan of this:
      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window

Can you attach it as a .sqlplan file please
ASKER CERTIFIED SOLUTION
Avatar of Pawan Kumar
Pawan Kumar
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
No.  The tables are not partitioned.  Being that the key is an integer, I'm not educated enough how to pull it off.
However, here's the SQLPLAN:  -plan.sqlplan

Thanks much.
Just a note:  The column named PrimaryKey does not represent Primary Key value defined as Unique row identification value.

You should also define how to process equal scores, e.g.
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
1, 80, 1/14/2016
2, 35, 1/16/2016

Open in new window

What is the highest value and what is the second highest value in such case?
37% is spent scaning the table for data

User generated image
and another 54% ordering (for the row_number) and both of these are heavy in I/O

User generated image
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifier] [LoadDate] and [Scores]

Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.

response of this query may change if the statistics for the table are not current
Hi Evan,
Please try my last comment.

Thanks
Pawan
You are right Pawan,

This is a log table of sorts.  Primary Keys do repeat, but they are primary keys.  When this statement is correct, I expect the output to be unique.  Testing your SQL now.  Thanks much.
No, Primary keys cannot repeat.  If the value of the key is repeating in some table then we are talking about Foreign keys which point to a different table where these values do not repeat and just in such table we can talk about Prinary key (which is unique by definition).

So your table does not have primary key which makes the records identification harder.
exactly, so Table B has the primary key, and is updated by Table A.
The query in question performs the action via Update on Select.
Unless you have a better idea.
You really need an index on PROPERTY_SCORE keyed by ( RTPropertyUniqueIdentifier, LoadDate ) for this query.  That will prevent a sort and make the aggregation vastly faster.

Given how large the table is, you should review the existing index stats to see if/how those indexes are being used.  You might be able to adjust this new index to replace an older index.  But, if not, you still need the new index.
another question.  should I create a materialized view of this query?
should I create a materialized view of this query?

Materialized view is like a prebaked data. Create a query and put index on it. They come with lot of limitations. So you have to decide. Also for this requirement I don't think you should create a View. You should go with the stored procedure.

Any luck with the query I gave?

Thank You,
Pawan
Yes, I believe this query works.  It returned all 125M records in 16 minutes, which is a good time to run with.  Thanks much.
Welcome Evan !!