Evan Cutler
asked on
get most recent and second most recent date in SQL Server
I'm trying to create an aggregate table using the combination of two others.
Feed Table A:
Expected Result (Table B):
In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.
The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.
I'm using statements like this:
My problem is that the response of this query response changes every time it runs. I also tried to save the query as a view, but that resulted in the same random responses.
Thanks.
Feed Table A:
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
2, 35, 1/16/2016
Expected Result (Table B):
PrimaryKey, Score, LastScore, OtherFields...
1, 80, 75, data...
2, 35, 12, data...
In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.
The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.
I'm using statements like this:
select a.RTPropertyUniqueIdentifier,
max(case when seqnum = 1 then Scores end) as score,
max(case when seqnum = 2 then Scores end) as score_previous
from (select a.*,
row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
from PROPERTY_SCORE a
) a
where seqnum <= 2 -- This should help performance of the `group by`
group by a.RTPropertyUniqueIdentifier
My problem is that the response of this query response changes every time it runs. I also tried to save the query as a view, but that resulted in the same random responses.
Thanks.
ASKER
Thanks, but that doesn't pivot the scores. I need the last score, and the second to last score, in the same row.
Most appreciated.
Most appreciated.
Trial
Output
Hope it helps !!
CREATE TABLE aTest
(
PrimaryKey INT
, Score INT
, DateField DATE
)
GO
INSERT INTO aTest VALUES
(1, 50, '1/1/2016'),
(2, 12, '1/1/2016'),
(1, 75, '1/12/2016'),
(1, 80, '1/15/2016'),
(2, 35, '1/16/2016')
GO
SELECT * FROM
(
SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
from aTest a
)p
where rnk < 3
Output
PrimaryKey Score DateField rnk
----------- ----------- ---------- --------------------
2 35 2016-01-16 1
1 80 2016-01-15 2
(2 row(s) affected)
Hope it helps !!
ASKER
ok. How do I update a single record with the latest score and the score before the latest score?
Try..
Output
Hope it helps !
;WITH CTE AS
(
SELECT * FROM(
SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
(SELECT Score FROM CTE a WHERE rnk = 2 ) Highest
, (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b
Output
Highest 2ndHighest
----------- -----------
80 35
(1 row(s) affected)
Hope it helps !
ASKER
Thank you, but I'm missing the Primary Key: RTPropertyUniqueIdentifier .
I need the Key, Score, and PastScore so i can push it to another table.
I need the Key, Score, and PastScore so i can push it to another table.
;WITH CTE AS
(
SELECT * FROM(
SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
(SELECT PrimaryKey FROM CTE a WHERE rnk = 2 ) HighestScore
,(SELECT Score FROM CTE a WHERE rnk = 2 ) HighestPrimarKey
,(SELECT DateField FROM CTE a WHERE rnk = 2 ) HighestDate
,(SELECT PrimaryKey FROM CTE a WHERE rnk = 1 ) [2ndHighestPrimaryKey]
, (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighestScore]
,(SELECT DateField FROM CTE a WHERE rnk = 1 ) [2ndHighestDate]
FROM CTE b
O/p
HighestScore HighestPrimarKey HighestDate 2ndHighestPrimaryKey 2ndHighestScore 2ndHighestDate
------------ ---------------- ----------- -------------------- --------------- --------------
1 80 2016-01-15 2 35 2016-01-16
(1 row(s) affected)
ASKER
so the problem with this one is that it gives me the latest score in the WHOLE table. I need the latest score and second latest score for each PrimaryKey?
Use partition by primary key in row_number
ASKER
ok, I got this far,
but the query takes forever:
;WITH CTE AS
(
SELECT * FROM(
SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier ORDER BY LoadDate DESC) rnk
from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
RTPropertyUniqueIdentifier
, (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
, (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b
anyway to speed it up? The table in question has 120Million records.
Thanks.
but the query takes forever:
;WITH CTE AS
(
SELECT * FROM(
SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier
from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
RTPropertyUniqueIdentifier
, (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
, (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b
anyway to speed it up? The table in question has 120Million records.
Thanks.
With tables of this size the indexing and freshness of statistics will be major factors in performance.
Are any tables partitioned ?
Do you have an execution plan of this:
Are any tables partitioned ?
Do you have an execution plan of this:
select a.RTPropertyUniqueIdentifier,
max(case when seqnum = 1 then Scores end) as score,
max(case when seqnum = 2 then Scores end) as score_previous
from (select a.*,
row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
from PROPERTY_SCORE a
) a
where seqnum <= 2 -- This should help performance of the `group by`
group by a.RTPropertyUniqueIdentifier
Can you attach it as a .sqlplan file please
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
No. The tables are not partitioned. Being that the key is an integer, I'm not educated enough how to pull it off.
However, here's the SQLPLAN: -plan.sqlplan
Thanks much.
However, here's the SQLPLAN: -plan.sqlplan
Thanks much.
Just a note: The column named PrimaryKey does not represent Primary Key value defined as Unique row identification value.
You should also define how to process equal scores, e.g.
You should also define how to process equal scores, e.g.
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
1, 80, 1/14/2016
2, 35, 1/16/2016
What is the highest value and what is the second highest value in such case?
37% is spent scaning the table for data
and another 54% ordering (for the row_number) and both of these are heavy in I/O
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifie r] [LoadDate] and [Scores]
Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.
response of this query may change if the statistics for the table are not current
and another 54% ordering (for the row_number) and both of these are heavy in I/O
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifie
Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.
response of this query may change if the statistics for the table are not current
Hi Evan,
Please try my last comment.
Thanks
Pawan
Please try my last comment.
Thanks
Pawan
ASKER
You are right Pawan,
This is a log table of sorts. Primary Keys do repeat, but they are primary keys. When this statement is correct, I expect the output to be unique. Testing your SQL now. Thanks much.
This is a log table of sorts. Primary Keys do repeat, but they are primary keys. When this statement is correct, I expect the output to be unique. Testing your SQL now. Thanks much.
No, Primary keys cannot repeat. If the value of the key is repeating in some table then we are talking about Foreign keys which point to a different table where these values do not repeat and just in such table we can talk about Prinary key (which is unique by definition).
So your table does not have primary key which makes the records identification harder.
So your table does not have primary key which makes the records identification harder.
ASKER
exactly, so Table B has the primary key, and is updated by Table A.
The query in question performs the action via Update on Select.
Unless you have a better idea.
The query in question performs the action via Update on Select.
Unless you have a better idea.
You really need an index on PROPERTY_SCORE keyed by ( RTPropertyUniqueIdentifier , LoadDate ) for this query. That will prevent a sort and make the aggregation vastly faster.
Given how large the table is, you should review the existing index stats to see if/how those indexes are being used. You might be able to adjust this new index to replace an older index. But, if not, you still need the new index.
Given how large the table is, you should review the existing index stats to see if/how those indexes are being used. You might be able to adjust this new index to replace an older index. But, if not, you still need the new index.
ASKER
another question. should I create a materialized view of this query?
should I create a materialized view of this query?
Materialized view is like a prebaked data. Create a query and put index on it. They come with lot of limitations. So you have to decide. Also for this requirement I don't think you should create a View. You should go with the stored procedure.
Any luck with the query I gave?
Thank You,
Pawan
ASKER
Yes, I believe this query works. It returned all 125M records in 16 minutes, which is a good time to run with. Thanks much.
Welcome Evan !!
Open in new window