D B
asked on
Query That Resets ROW_NUMBER Value
I am using SQL Server 2012. I basically have a set of data that has an account number, date, account type and balance.
I am trying to write a query that will identify when the balance has not changed within a specified number of days (threshold) within a given 'review period'. For example, looking back 30 days, is there a period of where the balance hasn't changed within a 5 day threshold.
One thing I've noticed is if I use ROW_NUMBER() with the following syntax:
45,000, 45,000, 45,000, 44,000, 45000, 45000,... (see account_id 3700)
ROW_NUMBER function returns values of 1, 2, 3, 1, 4, 5...
I want it to return 1, 2, 3, 1, 1, 2...
Looking at each individual account by type, I want to look at the dates sequentially, and EVERY time the balance changes, reset the ranking to 1.
Here is some sample data:
I am trying to write a query that will identify when the balance has not changed within a specified number of days (threshold) within a given 'review period'. For example, looking back 30 days, is there a period of where the balance hasn't changed within a 5 day threshold.
One thing I've noticed is if I use ROW_NUMBER() with the following syntax:
SELECT account_id,
account_type,
balance_date,
account_balance,
ROW_NUMBER() OVER ( PARTITION BY account_id, account_type, account_balance ORDER BY account_id, account_type, balance_date DESC ) AS ranking
FROM account
ODER BY account_id, security_id, last_pricing_date, mark_to_market_price;
If the balance for this account is (commas separate by date in ascending order): 45,000, 45,000, 45,000, 44,000, 45000, 45000,... (see account_id 3700)
ROW_NUMBER function returns values of 1, 2, 3, 1, 4, 5...
I want it to return 1, 2, 3, 1, 1, 2...
Looking at each individual account by type, I want to look at the dates sequentially, and EVERY time the balance changes, reset the ranking to 1.
Here is some sample data:
CREATE TABLE acct_temp
(account_id BIGINT,
account_type CHAR(3),
balance_date DATE,
account_balance MONEY)
INSERT INTO acct_temp
VALUES
(3700,'SAV','2015-09-24',45000),
(3700,'SAV','2015-09-25',45000),
(3700,'SAV','2015-09-28',45000),
(3700,'SAV','2015-09-29',45000),
(3700,'SAV','2015-09-30',45000),
(3700,'SAV','2015-10-01',45000),
(3700,'SAV','2015-10-02',45000),
(3700,'SAV','2015-10-05',45000),
(3700,'SAV','2015-10-06',45000),
(3700,'SAV','2015-10-07',43000),
(3700,'SAV','2015-10-08',45000),
(3700,'SAV','2015-10-09',45000),
(3700,'SAV','2015-10-12',45000),
(740,'SAV','2015-10-02',45000),
(740,'SAV','2015-10-05',41000),
(740,'SAV','2015-10-06',45000),
(740,'SAV','2015-10-07',45000),
(740,'SAV','2015-10-08',45000),
(740,'SAV','2015-10-09',45000),
(740,'SAV','2015-10-12',45000),
(858,'SAV','2015-09-24',45000),
(858,'SAV','2015-09-25',45000),
(858,'SAV','2015-09-28',45100),
(858,'SAV','2015-09-29',45000),
(858,'SAV','2015-09-30',45300),
(858,'SAV','2015-10-01',45000),
(858,'SAV','2015-10-02',45000),
(858,'SAV','2015-10-05',45700),
(858,'SAV','2015-10-06',45000),
(858,'SAV','2015-10-07',45000),
(858,'SAV','2015-10-08',46100),
(858,'SAV','2015-10-09',45000),
(858,'SAV','2015-10-12',45000)
what does this actually mean?
..., looking back 30 days, is there a period of where the balance hasn't changed within a 5 day threshold.
interpretation:
a.) go back 25 to 35 days
b.) are there variations in the balance during that period
I would need to see the "expected result" from your sample data
..., looking back 30 days, is there a period of where the balance hasn't changed within a 5 day threshold.
interpretation:
a.) go back 25 to 35 days
b.) are there variations in the balance during that period
I would need to see the "expected result" from your sample data
ASKER
For clarification, ranking is done on the entire set throughout the review period (e.g. 30 days) with no concern of the threshold period (5 days). The threshold value comes into play in a subsequent query.
Jim,
I already had tried dense_rank. It returns the same result as row_number. It is not that I am trying to close gaps, I want to start numbering again at 1.
Paul,
Today is 10/20/2015. The review period is 30 days, which means I am looking back to 9/20/2015. The threshold is 5 days. Think of it as a sliding window, so from 9/30 to today, where there and time periods where, for 5 consecutive days where the balance remained the same.
You stated you would need to see the expected results. I think I stated that in my original post
"If the balance for this account is (commas separate by date in ascending order):
45,000, 45,000, 45,000, 44,000, 45000, 45000,... (see account_id 3700)
ROW_NUMBER function returns values of 1, 2, 3, 1, 4, 5...
I want it to return 1, 2, 3, 1, 1, 2..."
If the balance changes, ANY subsequent balance that is not the same value returns a value of 1, even if it is the same as a previously posted balance:
Balance Rank
45,000 1
45,000 2
45,000 3
45,000 4
45,000 5
46,000 1
46,000 2
45,000 1
45,000 2
44,000 1
45,000 1
Jim,
I already had tried dense_rank. It returns the same result as row_number. It is not that I am trying to close gaps, I want to start numbering again at 1.
Paul,
Today is 10/20/2015. The review period is 30 days, which means I am looking back to 9/20/2015. The threshold is 5 days. Think of it as a sliding window, so from 9/30 to today, where there and time periods where, for 5 consecutive days where the balance remained the same.
You stated you would need to see the expected results. I think I stated that in my original post
"If the balance for this account is (commas separate by date in ascending order):
45,000, 45,000, 45,000, 44,000, 45000, 45000,... (see account_id 3700)
ROW_NUMBER function returns values of 1, 2, 3, 1, 4, 5...
I want it to return 1, 2, 3, 1, 1, 2..."
If the balance changes, ANY subsequent balance that is not the same value returns a value of 1, even if it is the same as a previously posted balance:
Balance Rank
45,000 1
45,000 2
45,000 3
45,000 4
45,000 5
46,000 1
46,000 2
45,000 1
45,000 2
44,000 1
45,000 1
I have not been able to run this at all. However I'm hoping it will assist.
declare @today date = '20151020'
declare @period int = 30
declare @start date = dateadd(day,-@period,@today)
--select @start, @period, @today
select
a.*
, ca.ranking
from acct_temp as a
cross apply (
select
1 + count(distinct account_balance )
from acct_temp
where account_id = a.account_id
and balance_date >= dateadd(day,-5,a.balance_date)
and balance_date < a.balance_date
and account_balance <> a.account_balance
) ca (ranking)
where balance_date >= @start
and balance_date < dateadd(day,1,@today)
order by
account_id
, balance_date
ASKER
Paul,
Results are not what I would expect. Here is one account. I would expect to see values of:
1,1,1,2,3,4 instead of 1,2,2,2,2,2 as shown from your query.
Results are not what I would expect. Here is one account. I would expect to see values of:
1,1,1,2,3,4 instead of 1,2,2,2,2,2 as shown from your query.
account_id account_type balance_date account_balance ranking
740 SAV 2015-10-02 45000.00 1
740 SAV 2015-10-05 41000.00 2
740 SAV 2015-10-06 45000.00 2
740 SAV 2015-10-07 45000.00 2
740 SAV 2015-10-08 45000.00 2
740 SAV 2015-10-09 45000.00 2
This is probably not the most efficient way to do this but it seems to satisfy your requirements.
IF OBJECT_ID('tempdb..#acct_temp') IS NOT NULL
DROP TABLE #acct_temp;
GO
CREATE TABLE #acct_temp
(
account_id BIGINT,
account_type CHAR(3),
balance_date DATE,
account_balance MONEY
);
INSERT INTO #acct_temp
VALUES
(3700,'SAV','2015-09-24',45000),
(3700,'SAV','2015-09-25',45000),
(3700,'SAV','2015-09-28',45000),
(3700,'SAV','2015-09-29',45000),
(3700,'SAV','2015-09-30',45000),
(3700,'SAV','2015-10-01',45000),
(3700,'SAV','2015-10-02',45000),
(3700,'SAV','2015-10-05',45000),
(3700,'SAV','2015-10-06',45000),
(3700,'SAV','2015-10-07',43000),
(3700,'SAV','2015-10-08',45000),
(3700,'SAV','2015-10-09',45000),
(3700,'SAV','2015-10-12',45000),
(740,'SAV','2015-10-02',45000),
(740,'SAV','2015-10-05',41000),
(740,'SAV','2015-10-06',45000),
(740,'SAV','2015-10-07',45000),
(740,'SAV','2015-10-08',45000),
(740,'SAV','2015-10-09',45000),
(740,'SAV','2015-10-12',45000),
(858,'SAV','2015-09-24',45000),
(858,'SAV','2015-09-25',45000),
(858,'SAV','2015-09-28',45100),
(858,'SAV','2015-09-29',45000),
(858,'SAV','2015-09-30',45300),
(858,'SAV','2015-10-01',45000),
(858,'SAV','2015-10-02',45000),
(858,'SAV','2015-10-05',45700),
(858,'SAV','2015-10-06',45000),
(858,'SAV','2015-10-07',45000),
(858,'SAV','2015-10-08',46100),
(858,'SAV','2015-10-09',45000),
(858,'SAV','2015-10-12',45000);
WITH cteAccount AS
(
SELECT account_id, account_type, balance_date, account_balance,
ROW_NUMBER() OVER(PARTITION BY account_id, account_type ORDER BY balance_date) AS RowNumber,
LAG(account_balance, 1, NULL) OVER(PARTITION BY account_id, account_type ORDER BY balance_date) AS previous_balance,
LEAD(account_balance, 1, NULL) OVER(PARTITION BY account_id, account_type ORDER BY balance_date) AS next_balance
FROM #acct_temp
),
cteBalanceGroup AS
(
SELECT account_id, account_type, balance_date, account_balance,
RowNumber,
CASE
WHEN previous_balance IS NULL THEN RowNumber
WHEN account_balance <> previous_balance THEN RowNumber
END AS balance_ranking
FROM cteAccount
),
cteBalanceRanking AS
(
SELECT R1.account_id, R1.account_type, R1.balance_date, R1.account_balance,
MAX(ISNULL(R1.Balance_ranking, R2.balance_ranking)) AS ranking
FROM cteBalanceGroup AS R1
LEFT OUTER JOIN cteBalanceGroup AS R2
ON R1.account_id = R2.account_id
AND R1.account_type = R2.account_type
AND R1.RowNumber > R2.RowNumber
AND R2.balance_ranking IS NOT NULL
GROUP BY R1.account_id, R1.account_type, R1.balance_date, R1.account_balance
)
SELECT account_id, account_type, balance_date, account_balance,
DENSE_RANK() OVER(PARTITION BY account_id, account_type ORDER BY ranking) AS ranking
FROM cteBalanceRanking
ASKER
Sorry about the delay. I am at PASS Summit this week and will get to this next week.
I'm jealous...trying to arrange it for next year.
dbbishop, do you still need help with this question?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
dbbishop, I think you can close it now, right?
ASKER
Most appropriate answer.
This fits the description of DENSE_RANK() instead of ROW_NUMBER, where there are no gaps in rankings, meaning if there's a two-way tie for first place, both are 1, and then 3rd place is 2.