Solved

get most recent and second most recent date in SQL Server

Posted on 2016-11-27
24
38 Views
Last Modified: 2016-11-28
I'm trying to create an aggregate table using the combination of two others.

Feed Table A:

PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
2, 35, 1/16/2016

Open in new window


Expected Result (Table B):

PrimaryKey, Score, LastScore, OtherFields...
1, 80, 75, data...
2, 35, 12, data...

Open in new window


In record 1, 80 is the most recent record, 75 is the 2nd most recent record. For all records in Table B, this is the rule.

The condition is that there's over 25Million Records in Table A, and 250Million Records in Table B. Table B gets routinely updated by Table A.

I'm using statements like this:

      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window


My problem is that the response of this query response changes every time it runs.  I also tried to save the query as a view, but that resulted in the same random responses.

Thanks.
0
Comment
Question by:Evan Cutler
  • 10
  • 9
  • 2
  • +2
24 Comments
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41903554
Try this

SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY LoadDate DESC) rnk
    from PROPERTY_SCORE a
)p
where rnk < 3

Open in new window

0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903558
Thanks, but that doesn't pivot the scores.  I need the last score, and the second to last score, in the same row.
Most appreciated.
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41903559
Trial
CREATE TABLE aTest
(
	  PrimaryKey INT
	, Score INT
	, DateField DATE
)
GO


INSERT INTO aTest VALUES
(1, 50, '1/1/2016'),
(2, 12, '1/1/2016'),
(1, 75, '1/12/2016'),
(1, 80, '1/15/2016'),
(2, 35, '1/16/2016')
GO


SELECT * FROM
(
    SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
    from aTest a
)p
where rnk < 3

Open in new window


Output

PrimaryKey  Score       DateField  rnk
----------- ----------- ---------- --------------------
2           35          2016-01-16 1
1           80          2016-01-15 2

(2 row(s) affected)

Open in new window


Hope it helps !!
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903562
ok.  How do I update a single record with the latest score and the score before the latest score?
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41903572
Try..

;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT Score FROM CTE a WHERE rnk = 2 ) Highest
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighest] 
FROM CTE b

Open in new window


Output

Highest     2ndHighest
----------- -----------
80          35

(1 row(s) affected)

Open in new window


Hope it helps !
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903574
Thank you, but I'm missing the Primary Key: RTPropertyUniqueIdentifier.

I need the Key, Score, and PastScore so i can push it to another table.
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41903579
;WITH CTE AS
(
	SELECT * FROM(
	SELECT * , ROW_NUMBER() OVER (ORDER BY DateField DESC) rnk
	from aTest a)p WHERE rnk < 3
)
SELECT DISTINCT
		   (SELECT PrimaryKey FROM CTE a WHERE rnk = 2 ) HighestScore
		   ,(SELECT Score FROM CTE a WHERE rnk = 2 ) HighestPrimarKey
		   ,(SELECT DateField FROM CTE a WHERE rnk = 2 ) HighestDate		 
		 ,(SELECT PrimaryKey FROM CTE a WHERE rnk = 1 ) [2ndHighestPrimaryKey]
		 , (SELECT Score FROM CTE a WHERE rnk = 1 ) [2ndHighestScore]
		   ,(SELECT DateField FROM CTE a WHERE rnk = 1 ) [2ndHighestDate] 
FROM CTE b

Open in new window



O/p

HighestScore HighestPrimarKey HighestDate 2ndHighestPrimaryKey 2ndHighestScore 2ndHighestDate
------------ ---------------- ----------- -------------------- --------------- --------------
1            80               2016-01-15  2                    35              2016-01-16

(1 row(s) affected)

Open in new window

0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903593
so the problem with this one is that it gives me the latest score in the WHOLE table. I need the latest score and second latest score for each PrimaryKey?
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41903596
Use partition by primary key in row_number
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903604
ok, I got this far,
but the query takes forever:
;WITH CTE AS
(
      SELECT * FROM(
      SELECT * , ROW_NUMBER() OVER (PARTITION bY RTPropertyUniqueIdentifier ORDER BY LoadDate DESC) rnk
      from PROPERTY_SCORE a)p WHERE rnk < 3
)
SELECT DISTINCT
                  RTPropertyUniqueIdentifier
             ,  (SELECT Scores FROM CTE a WHERE rnk = 2 ) Highest
             , (SELECT Scores FROM CTE a WHERE rnk = 1 ) [2ndHighest]
FROM CTE b

anyway to speed it up?  The table in question has 120Million records.
Thanks.
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 41903607
With tables of this size the indexing and freshness of statistics will be major factors in performance.

Are any tables partitioned ?

Do you have an execution plan of this:
      select a.RTPropertyUniqueIdentifier,
             max(case when seqnum = 1 then Scores end) as score,
             max(case when seqnum = 2 then Scores end) as score_previous
      from (select a.*,
                   row_number() over (partition by RTPropertyUniqueIdentifier order by LoadDate desc) as seqnum
            from PROPERTY_SCORE a
           ) a
      where seqnum <= 2  -- This should help performance of the `group by`
      group by a.RTPropertyUniqueIdentifier

Open in new window

Can you attach it as a .sqlplan file please
0
 
LVL 17

Accepted Solution

by:
Pawan Kumar Khowal earned 500 total points
ID: 41903609
ok, try

;WITH CTE AS
(
	SELECT * FROM 
	(
		SELECT DISTINCT PrimaryKey 
		FROM aTest
	)p
	CROSS APPLY
	(
		SELECT TOP 2 Score , DateField 
		FROM aTest b
		WHERE b.PrimaryKey = p.PrimaryKey
		ORDER BY b.DateField DESC
	)k
)
,CTE1 AS 
(
	SELECT *, CONCAT('Score',rnk) c1 , CONCAT('DateField',rnk) c2 FROM 
	(
		SELECT *, ROW_NUMBER() OVER (PARTITION BY PRIMARYKey Order BY DateField DESC) rnk
		FROM CTE 
	)k
)
SELECT PrimaryKey , MAX([Score1])[Score1],MAX([Score2])[Score2],MAX(DateField1)DateField1,MAX(DateField2)DateField2 FROM CTE1
PIVOT ( MAX(Score) FOR c1 IN ([Score1],[Score2]) )y1
PIVOT ( MAX(DateField) FOR c2 IN (DateField1,DateField2) )y1
GROUP BY PrimaryKey 

Open in new window


O/p

PrimaryKey  Score1      Score2      DateField1 DateField2
----------- ----------- ----------- ---------- ----------
1           80          75          2016-01-15 2016-01-12
2           35          12          2016-01-16 2016-01-01

(2 row(s) affected)

Open in new window


hope it helps !!
0
Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 9

Author Comment

by:Evan Cutler
ID: 41903610
No.  The tables are not partitioned.  Being that the key is an integer, I'm not educated enough how to pull it off.
However, here's the SQLPLAN:  -plan.sqlplan

Thanks much.
0
 
LVL 41

Expert Comment

by:pcelba
ID: 41903971
Just a note:  The column named PrimaryKey does not represent Primary Key value defined as Unique row identification value.

You should also define how to process equal scores, e.g.
PrimaryKey, Score, DateField
1, 50, 1/1/2016
2, 12, 1/1/2016
1, 75, 1/12/2016
1, 80, 1/15/2016
1, 80, 1/14/2016
2, 35, 1/16/2016

Open in new window

What is the highest value and what is the second highest value in such case?
0
 
LVL 48

Expert Comment

by:PortletPaul
ID: 41904021
37% is spent scaning the table for data

plan-diagram.png
and another 54% ordering (for the row_number) and both of these are heavy in I/O

sorting.png
the index referenced is a non-clustered index [Property_By_Date_IDX] which appears to include [RTPropertyUniqueIdentifier] [LoadDate] and [Scores]

Unless you can find a way to stop scanning the whole table to arrive at your result I don't see anything simple to improve that query performance.

response of this query may change if the statistics for the table are not current
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41904072
Hi Evan,
Please try my last comment.

Thanks
Pawan
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904198
You are right Pawan,

This is a log table of sorts.  Primary Keys do repeat, but they are primary keys.  When this statement is correct, I expect the output to be unique.  Testing your SQL now.  Thanks much.
0
 
LVL 41

Expert Comment

by:pcelba
ID: 41904238
No, Primary keys cannot repeat.  If the value of the key is repeating in some table then we are talking about Foreign keys which point to a different table where these values do not repeat and just in such table we can talk about Prinary key (which is unique by definition).

So your table does not have primary key which makes the records identification harder.
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904463
exactly, so Table B has the primary key, and is updated by Table A.
The query in question performs the action via Update on Select.
Unless you have a better idea.
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 41904616
You really need an index on PROPERTY_SCORE keyed by ( RTPropertyUniqueIdentifier, LoadDate ) for this query.  That will prevent a sort and make the aggregation vastly faster.

Given how large the table is, you should review the existing index stats to see if/how those indexes are being used.  You might be able to adjust this new index to replace an older index.  But, if not, you still need the new index.
0
 
LVL 9

Author Comment

by:Evan Cutler
ID: 41904657
another question.  should I create a materialized view of this query?
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41905028
should I create a materialized view of this query?

Materialized view is like a prebaked data. Create a query and put index on it. They come with lot of limitations. So you have to decide. Also for this requirement I don't think you should create a View. You should go with the stored procedure.

Any luck with the query I gave?

Thank You,
Pawan
0
 
LVL 9

Author Closing Comment

by:Evan Cutler
ID: 41905045
Yes, I believe this query works.  It returned all 125M records in 16 minutes, which is a good time to run with.  Thanks much.
0
 
LVL 17

Expert Comment

by:Pawan Kumar Khowal
ID: 41905048
Welcome Evan !!
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
This videos aims to give the viewer a basic demonstration of how a user can query current session information by using the SYS_CONTEXT function
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

17 Experts available now in Live!

Get 1:1 Help Now