Link to home
Start Free TrialLog in
Avatar of Funky_DBA
Funky_DBA

asked on

What does this SQL do ?

1) I'd like to know what this SQL does :-
Select member_id, activity_year_month, activity_date, SUM(revenue)  OVER(PARTITION BY member_id, activity_year_month ORDER BY activity_date ROWS BETWEEN unbounded preceding AND current row ) from dm.revenue_analysis.
2) How do I rewrite the SQL using a group by clause to return identical information.
ASKER CERTIFIED SOLUTION
Avatar of Lee Wadwell
Lee Wadwell
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of Funky_DBA
Funky_DBA

ASKER

Hi CGLuttrell

Thanks for your useful comments.
Do you agree that it is correct to say "Redoing it using a GROUP BY ... very difficult as it will not do the accumulated totals." ?
Yes I aggree with that.  
I also looked up what the  ROWS BETWEEN unbounded preceding AND current row does and found this:
>>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  This window starts at the beginning of the partition, and ends with the current row. Use this construct when computing cumulative results, such as cumulative sums.
It came from this page if you want more info: http://www.ianywhere.com/developer/product_manuals/sqlanywhere/1000/en/html/dbugen10/ug-ug-olap-sizing-a-window.html
Hi CGLuttrell
Thanks again for your comments which are very useful.
I guess this is ANSI Standard SQL and works for both Oracle and SQL Server ?



I also came across the word 'window' and couldn't decide whether it was used to mean "a logical group of data" - which semed unlikely or was related to Microsoft Windows rather than analytic functions.
Can you clarify it for me ?

Thanks a lot.
 
The Sum() Over() works for sure on MS SqlServer 2005+.  Not sure about the Rows Between part I want to test that myself when I can. The Window reference is to rows of data and not MS Windows  
Thanks a lot.  That's a great help.
It's 2.00 am here in London England.
Bye for now.
You are calculating each rows ranking within a set based on a partition and ordering.
A partition is like a GROUP BY. So the data you are looking at is the member_id, activity_year_month,  activity_date, and the SUM from  the revenue increasing down the search,  added according to the activity_date you are trying to rank off as that is what you are ordering by.
The unbounded preceding and current row means the start of the window is fixed at the first row in the result returned by the query. Current row is the current row in the result set being processed.
I hope that makes sense.
Ignore my post I had this open earlier and never submitted until I returned, several hours later.
I investigated the Rows Between part and it does not appear to be a part of SQL 2005.  It appears to be an analytical piece that Oracle has but not MS SQL, unless there is some equivalent in MS Analysis Services and an MDX query which I have not had a chance to get into yet.  Hope this info helped you.
I have come up with two alternative solution to the question of rewriting the SQL using GROUP BY and I would appreciate your thoughts.

SOLUTION 1)  
SELECT member_id, activity_month, activity_date,
(SELECT SUM(revenue) FROM dm.revenue_analysis
 WHERE member_id = a.member_id
 AND      activiy_month_year = a.activity_month_year)
FROM    dm.revenue_analysis a

SOLUTION 2)  
My second thought would be to you use a cumulative sum to get the same results, like this :-

SELECT a.Member_id,
                a.activity_year_month,
                a.activity_date,
                SUM(b.revenue)
FROM    dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date) AS RunningTotal
GROUP BY a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue

You had a view issues on the second query which I adjusted below.  
As for these two giving the same results, I do not think so.  The first query will always give you the same value back in the subquery for each set of records based on member_id and activity_month_year not a running total as the second one should produce.  the first one will probably run quicker because of that also.
SELECT a.Member_id,
                a.activity_year_month,
                a.activity_date,
                SUM(b.revenue) AS RunningTotal
FROM    dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date)
GROUP BY a.Member_id, a.activity_year_month, a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue

Open in new window

For the first option to give a running total it would need to be more like below ... but if you had two rows with the same activity_date within member_id and activity_year_month ... then the result would not be exactly the same as the with the analytic function.
SELECT member_id, activity_year_month, activity_date,
       (SELECT SUM(revenue) 
          FROM dm.revenue_analysis b
         WHERE b.member_id = a.member_id
           AND b.activity_year_month = a.activity_year_month
           AND b.activity_date <= a.activity_date)
FROM    dm.revenue_analysis a

Open in new window

and as an INNER JOIN
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM dm.revenue_analysis a
 INNER JOIN dm.revenue_analysis b
 WHERE b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.activity_date <= a.activity_date
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.activity_date

Open in new window

to remove the issue with duplicate activity dates, this should work.
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
       row_number() OVER(PARTITION BY Member_id, activity_year_month 
                             ORDER BY activity_date) AS rown
  FROM dm.revenue_analysis
)
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM ranked_data a
 INNER JOIN ranked_data b
 WHERE b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.rown <= a.rown
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.rown

Open in new window

just to be clear is your query to be run against Oracle or MS SQL and which version?  Can make a huge difference on syntax used.
Sorry - I made a silly mistake with both of my SQL's ... I had a WHERE when it should have been an ON.  I also needed to include the row_number() column in the group by of the second SQL.

These should work in either Oracle 9i (and above) and SQL Server 2005 (and above).
-- Suffers from duplicate activity_date issue
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM dm.revenue_analysis a
 INNER JOIN dm.revenue_analysis b
    ON b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.activity_date <= a.activity_date
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.activity_date
;
 
-- Should fix duplicate activity_date issue
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
       row_number() OVER(PARTITION BY Member_id, activity_year_month 
                             ORDER BY activity_date) AS rown
  FROM dm.revenue_analysis
)
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM ranked_data a
 INNER JOIN ranked_data b
    ON b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.rown <= a.rown
 GROUP BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
;

Open in new window

Hi Folks,
Thanks a lot for your help which was very useful.
I have split the points in a way that reflects the promptness and completeness of the first answer from  lwadwell and its corresponding value to me which was related to a possible job interview.