• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 1121
  • Last Modified:

What does this SQL do ?

1) I'd like to know what this SQL does :-
Select member_id, activity_year_month, activity_date, SUM(revenue)  OVER(PARTITION BY member_id, activity_year_month ORDER BY activity_date ROWS BETWEEN unbounded preceding AND current row ) from dm.revenue_analysis.
2) How do I rewrite the SQL using a group by clause to return identical information.
0
Funky_DBA
Asked:
Funky_DBA
  • 6
  • 6
  • 5
  • +1
2 Solutions
 
lwadwellCommented:
Hi Funky_DBA,

I am going to make an assumption that you know what the SQL
   Select member_id, activity_year_month, activity_date from dm.revenue_analysis
will do.  It is a straight forward select of data.

As to the analytic function:
  , SUM(revenue)  OVER(PARTITION BY member_id, activity_year_month ORDER BY activity_date ROWS BETWEEN unbounded preceding AND current row )
will do, well it will produce an accumulating/running total.

What it is doing is:
1. summing the the revenue column "SUM(revenue)"
2. based on groupings "OVER()"  ... very similar to a GROUP BY
2.1. the group is on the columns member_id, activity_year_month "PARTITION BY member_id, activity_year_month"
2.2. order the rows in activity_date order when doing the grouping and sum "ORDER BY activity_date"
2.3. and sum all rows in the grouping up to the currect row "ROWS BETWEEN unbounded preceding AND current row " ... hence the accumulated totals or sum

Redoing it using a GROUP BY ... very difficult as it will not do the accumulated totals.  The nearest equivalent would be
  Select member_id, activity_year_month, SUM(revenue)  from dm.revenue_analysis GROUP BY member_id, activity_year_month
but this looses the activity_date column.


lwadwell
0
 
Chris LuttrellSenior Database ArchitectCommented:
Good explanation by lwadwell.
The SUM() OVER (PARTITION BY...) gives you the same value you would get in a separate subselet grouping by the Partition columns, but returns that value on each line of the results while not having to loose details like lwadwell pointed out.  It is very useful for queries where you want to see something like what percentage of the total monthly sales this particular line item represented.
0
 
Funky_DBAAuthor Commented:
Hi CGLuttrell

Thanks for your useful comments.
Do you agree that it is correct to say "Redoing it using a GROUP BY ... very difficult as it will not do the accumulated totals." ?
0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
Chris LuttrellSenior Database ArchitectCommented:
Yes I aggree with that.  
I also looked up what the  ROWS BETWEEN unbounded preceding AND current row does and found this:
>>ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW  This window starts at the beginning of the partition, and ends with the current row. Use this construct when computing cumulative results, such as cumulative sums.
It came from this page if you want more info: http://www.ianywhere.com/developer/product_manuals/sqlanywhere/1000/en/html/dbugen10/ug-ug-olap-sizing-a-window.html
0
 
Funky_DBAAuthor Commented:
Hi CGLuttrell
Thanks again for your comments which are very useful.
I guess this is ANSI Standard SQL and works for both Oracle and SQL Server ?


0
 
Funky_DBAAuthor Commented:

I also came across the word 'window' and couldn't decide whether it was used to mean "a logical group of data" - which semed unlikely or was related to Microsoft Windows rather than analytic functions.
Can you clarify it for me ?

Thanks a lot.
 
0
 
Chris LuttrellSenior Database ArchitectCommented:
The Sum() Over() works for sure on MS SqlServer 2005+.  Not sure about the Rows Between part I want to test that myself when I can. The Window reference is to rows of data and not MS Windows  
0
 
Funky_DBAAuthor Commented:
Thanks a lot.  That's a great help.
It's 2.00 am here in London England.
Bye for now.
0
 
exploitedjCommented:
You are calculating each rows ranking within a set based on a partition and ordering.
A partition is like a GROUP BY. So the data you are looking at is the member_id, activity_year_month,  activity_date, and the SUM from  the revenue increasing down the search,  added according to the activity_date you are trying to rank off as that is what you are ordering by.
The unbounded preceding and current row means the start of the window is fixed at the first row in the result returned by the query. Current row is the current row in the result set being processed.
I hope that makes sense.
0
 
exploitedjCommented:
Ignore my post I had this open earlier and never submitted until I returned, several hours later.
0
 
Chris LuttrellSenior Database ArchitectCommented:
I investigated the Rows Between part and it does not appear to be a part of SQL 2005.  It appears to be an analytical piece that Oracle has but not MS SQL, unless there is some equivalent in MS Analysis Services and an MDX query which I have not had a chance to get into yet.  Hope this info helped you.
0
 
Funky_DBAAuthor Commented:
I have come up with two alternative solution to the question of rewriting the SQL using GROUP BY and I would appreciate your thoughts.

SOLUTION 1)  
SELECT member_id, activity_month, activity_date,
(SELECT SUM(revenue) FROM dm.revenue_analysis
 WHERE member_id = a.member_id
 AND      activiy_month_year = a.activity_month_year)
FROM    dm.revenue_analysis a

SOLUTION 2)  
My second thought would be to you use a cumulative sum to get the same results, like this :-

SELECT a.Member_id,
                a.activity_year_month,
                a.activity_date,
                SUM(b.revenue)
FROM    dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date) AS RunningTotal
GROUP BY a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue

0
 
Chris LuttrellSenior Database ArchitectCommented:
You had a view issues on the second query which I adjusted below.  
As for these two giving the same results, I do not think so.  The first query will always give you the same value back in the subquery for each set of records based on member_id and activity_month_year not a running total as the second one should produce.  the first one will probably run quicker because of that also.
SELECT a.Member_id,
                a.activity_year_month,
                a.activity_date,
                SUM(b.revenue) AS RunningTotal
FROM    dm.revenue_analysis a
CROSS JOIN dm.revenue_analysis b
WHERE (b.activity_date <= a.activity_date)
GROUP BY a.Member_id, a.activity_year_month, a.activity_date,a.revenue
ORDER BY a.activity_date,a.revenue

Open in new window

0
 
lwadwellCommented:
For the first option to give a running total it would need to be more like below ... but if you had two rows with the same activity_date within member_id and activity_year_month ... then the result would not be exactly the same as the with the analytic function.
SELECT member_id, activity_year_month, activity_date,
       (SELECT SUM(revenue) 
          FROM dm.revenue_analysis b
         WHERE b.member_id = a.member_id
           AND b.activity_year_month = a.activity_year_month
           AND b.activity_date <= a.activity_date)
FROM    dm.revenue_analysis a

Open in new window

0
 
lwadwellCommented:
and as an INNER JOIN
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM dm.revenue_analysis a
 INNER JOIN dm.revenue_analysis b
 WHERE b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.activity_date <= a.activity_date
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.activity_date

Open in new window

0
 
lwadwellCommented:
to remove the issue with duplicate activity dates, this should work.
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
       row_number() OVER(PARTITION BY Member_id, activity_year_month 
                             ORDER BY activity_date) AS rown
  FROM dm.revenue_analysis
)
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM ranked_data a
 INNER JOIN ranked_data b
 WHERE b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.rown <= a.rown
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.rown

Open in new window

0
 
Chris LuttrellSenior Database ArchitectCommented:
just to be clear is your query to be run against Oracle or MS SQL and which version?  Can make a huge difference on syntax used.
0
 
lwadwellCommented:
Sorry - I made a silly mistake with both of my SQL's ... I had a WHERE when it should have been an ON.  I also needed to include the row_number() column in the group by of the second SQL.

These should work in either Oracle 9i (and above) and SQL Server 2005 (and above).
-- Suffers from duplicate activity_date issue
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM dm.revenue_analysis a
 INNER JOIN dm.revenue_analysis b
    ON b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.activity_date <= a.activity_date
 GROUP BY a.Member_id, a.activity_year_month, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.activity_date
;
 
-- Should fix duplicate activity_date issue
WITH ranked_data as (
SELECT Member_id, activity_year_month, activity_date, revenue,
       row_number() OVER(PARTITION BY Member_id, activity_year_month 
                             ORDER BY activity_date) AS rown
  FROM dm.revenue_analysis
)
SELECT a.Member_id,
       a.activity_year_month,
       a.activity_date,
       SUM(b.revenue) AS RunningTotal
  FROM ranked_data a
 INNER JOIN ranked_data b
    ON b.member_id = a.member_id
   AND b.activity_year_month = a.activity_year_month
   AND b.rown <= a.rown
 GROUP BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
 ORDER BY a.Member_id, a.activity_year_month, a.rown, a.activity_date
;

Open in new window

0
 
Funky_DBAAuthor Commented:
Hi Folks,
Thanks a lot for your help which was very useful.
I have split the points in a way that reflects the promptness and completeness of the first answer from  lwadwell and its corresponding value to me which was related to a possible job interview.
0

Featured Post

Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

  • 6
  • 6
  • 5
  • +1
Tackle projects and never again get stuck behind a technical roadblock.
Join Now