Funky_DBA
asked on
What does this SQL do ?
1) I'd like to know what this SQL does :-
Select member_id, activity_year_month, activity_date, SUM(revenue) OVER(PARTITION BY member_id, activity_year_month ORDER BY activity_date ROWS BETWEEN unbounded preceding AND current row ) from dm.revenue_analysis.
2) How do I rewrite the SQL using a group by clause to return identical information.
Select member_id, activity_year_month, activity_date, SUM(revenue) OVER(PARTITION BY member_id, activity_year_month ORDER BY activity_date ROWS BETWEEN unbounded preceding AND current row ) from dm.revenue_analysis.
2) How do I rewrite the SQL using a group by clause to return identical information.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Yes I aggree with that.
I also looked up what the ROWS BETWEEN unbounded preceding AND current row does and found this:
>>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW This window starts at the beginning of the partition, and ends with the current row. Use this construct when computing cumulative results, such as cumulative sums.
It came from this page if you want more info: http://www.ianywhere.com/developer/product_manuals/sqlanywhere/1000/en/html/dbugen10/ug-ug-olap-sizing-a-window.html
I also looked up what the ROWS BETWEEN unbounded preceding AND current row does and found this:
>>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW This window starts at the beginning of the partition, and ends with the current row. Use this construct when computing cumulative results, such as cumulative sums.
It came from this page if you want more info: http://www.ianywhere.com/developer/product_manuals/sqlanywhere/1000/en/html/dbugen10/ug-ug-olap-sizing-a-window.html
ASKER
Hi CGLuttrell
Thanks again for your comments which are very useful.
I guess this is ANSI Standard SQL and works for both Oracle and SQL Server ?
Thanks again for your comments which are very useful.
I guess this is ANSI Standard SQL and works for both Oracle and SQL Server ?
ASKER
I also came across the word 'window' and couldn't decide whether it was used to mean "a logical group of data" - which semed unlikely or was related to Microsoft Windows rather than analytic functions.
Can you clarify it for me ?
Thanks a lot.
The Sum() Over() works for sure on MS SqlServer 2005+. Not sure about the Rows Between part I want to test that myself when I can. The Window reference is to rows of data and not MS Windows
ASKER
Thanks a lot. That's a great help.
It's 2.00 am here in London England.
Bye for now.
It's 2.00 am here in London England.
Bye for now.
You are calculating each rows ranking within a set based on a partition and ordering.
A partition is like a GROUP BY. So the data you are looking at is the member_id, activity_year_month, activity_date, and the SUM from the revenue increasing down the search, added according to the activity_date you are trying to rank off as that is what you are ordering by.
The unbounded preceding and current row means the start of the window is fixed at the first row in the result returned by the query. Current row is the current row in the result set being processed.
I hope that makes sense.
A partition is like a GROUP BY. So the data you are looking at is the member_id, activity_year_month, activity_date, and the SUM from the revenue increasing down the search, added according to the activity_date you are trying to rank off as that is what you are ordering by.
The unbounded preceding and current row means the start of the window is fixed at the first row in the result returned by the query. Current row is the current row in the result set being processed.
I hope that makes sense.
Ignore my post I had this open earlier and never submitted until I returned, several hours later.
I investigated the Rows Between part and it does not appear to be a part of SQL 2005. It appears to be an analytical piece that Oracle has but not MS SQL, unless there is some equivalent in MS Analysis Services and an MDX query which I have not had a chance to get into yet. Hope this info helped you.
ASKER
I have come up with two alternative solution to the question of rewriting the SQL using GROUP BY and I would appreciate your thoughts.
SOLUTION 1)
SELECT member_id, activity_month, activity_date,
(SELECT SUM(revenue) FROM dm.revenue_analysis
WHERE member_id = a.member_id
AND activiy_month_year = a.activity_month_year)
FROM dm.revenue_analysis a
SOLUTION 2)
My second thought would be to you use a cumulative sum to get the same results, like this :-
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue)
FROM dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date) AS RunningTotal
GROUP BY a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue
SOLUTION 1)
SELECT member_id, activity_month, activity_date,
(SELECT SUM(revenue) FROM dm.revenue_analysis
WHERE member_id = a.member_id
AND activiy_month_year = a.activity_month_year)
FROM dm.revenue_analysis a
SOLUTION 2)
My second thought would be to you use a cumulative sum to get the same results, like this :-
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue)
FROM dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date) AS RunningTotal
GROUP BY a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue
You had a view issues on the second query which I adjusted below.
As for these two giving the same results, I do not think so. The first query will always give you the same value back in the subquery for each set of records based on member_id and activity_month_year not a running total as the second one should produce. the first one will probably run quicker because of that also.
As for these two giving the same results, I do not think so. The first query will always give you the same value back in the subquery for each set of records based on member_id and activity_month_year not a running total as the second one should produce. the first one will probably run quicker because of that also.
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue) AS RunningTotal
FROM dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date)
GROUP BY a.Member_id, a.activity_year_month, a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue
For the first option to give a running total it would need to be more like below ... but if you had two rows with the same activity_date within member_id and activity_year_month ... then the result would not be exactly the same as the with the analytic function.
SELECT member_id, activity_year_month, activity_date,
(SELECT SUM(revenue)
FROM dm.revenue_analysis b
WHERE b.member_id = a.member_id
AND b.activity_year_month = a.activity_year_month
AND b.activity_date <= a.activity_date)
FROM dm.revenue_analysis a
and as an INNER JOIN
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue) AS RunningTotal
FROM dm.revenue_analysis a
INNER JOIN dm.revenue_analysis b
WHERE b.member_id = a.member_id
AND b.activity_year_month = a.activity_year_month
AND b.activity_date <= a.activity_date
GROUP BY a.Member_id, a.activity_year_month, a.activity_date
ORDER BY a.Member_id, a.activity_year_month, a.activity_date
to remove the issue with duplicate activity dates, this should work.
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
row_number() OVER(PARTITION BY Member_id, activity_year_month
ORDER BY activity_date) AS rown
FROM dm.revenue_analysis
)
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue) AS RunningTotal
FROM ranked_data a
INNER JOIN ranked_data b
WHERE b.member_id = a.member_id
AND b.activity_year_month = a.activity_year_month
AND b.rown <= a.rown
GROUP BY a.Member_id, a.activity_year_month, a.activity_date
ORDER BY a.Member_id, a.activity_year_month, a.rown
just to be clear is your query to be run against Oracle or MS SQL and which version? Can make a huge difference on syntax used.
Sorry - I made a silly mistake with both of my SQL's ... I had a WHERE when it should have been an ON. I also needed to include the row_number() column in the group by of the second SQL.
These should work in either Oracle 9i (and above) and SQL Server 2005 (and above).
These should work in either Oracle 9i (and above) and SQL Server 2005 (and above).
-- Suffers from duplicate activity_date issue
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue) AS RunningTotal
FROM dm.revenue_analysis a
INNER JOIN dm.revenue_analysis b
ON b.member_id = a.member_id
AND b.activity_year_month = a.activity_year_month
AND b.activity_date <= a.activity_date
GROUP BY a.Member_id, a.activity_year_month, a.activity_date
ORDER BY a.Member_id, a.activity_year_month, a.activity_date
;
-- Should fix duplicate activity_date issue
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
row_number() OVER(PARTITION BY Member_id, activity_year_month
ORDER BY activity_date) AS rown
FROM dm.revenue_analysis
)
SELECT a.Member_id,
a.activity_year_month,
a.activity_date,
SUM(b.revenue) AS RunningTotal
FROM ranked_data a
INNER JOIN ranked_data b
ON b.member_id = a.member_id
AND b.activity_year_month = a.activity_year_month
AND b.rown <= a.rown
GROUP BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
ORDER BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
;
ASKER
Hi Folks,
Thanks a lot for your help which was very useful.
I have split the points in a way that reflects the promptness and completeness of the first answer from lwadwell and its corresponding value to me which was related to a possible job interview.
Thanks a lot for your help which was very useful.
I have split the points in a way that reflects the promptness and completeness of the first answer from lwadwell and its corresponding value to me which was related to a possible job interview.
ASKER
Thanks for your useful comments.
Do you agree that it is correct to say "Redoing it using a GROUP BY ... very difficult as it will not do the accumulated totals." ?