Link to home
Start Free TrialLog in
Avatar of NiceMan331
NiceMan331

asked on

sum over partition error

hi
i try to use this sql

select
   acc_no, cost_center, 
( credit - debit) cur_balance, 0 prev_balance,

DECODE (acc_no, 41010, (credit - debit)) cur_sal, 0 prev_sal,
                      DECODE (acc_no, 41010, 0, (debit - credit))
                         cur_all_exp, 0 prev_all_exp

 from alpha_tran where
period =  to_date('2015-06-30','yyyy-mm-dd')

Open in new window

it is ok
then i added summary like this

SELECT cost_center,acc_no,
  (sum (cur_sal) over (partition by cost_center)) sum_sal
                     
from
(
select
   acc_no, cost_center, 
( credit - debit) cur_balance, 0 prev_balance,

DECODE (acc_no, 41010, (credit - debit)) cur_sal, 0 prev_sal,
                      DECODE (acc_no, 41010, 0, (debit - credit))
                         cur_all_exp, 0 prev_all_exp

 from alpha_tran where
period =  to_date('2015-06-30','yyyy-mm-dd')
)
group by cost_center,acc_no

Open in new window

it goes to the line 10
(credit - debit))
not a group by expression
Avatar of Sean Stuber
Sean Stuber

remove the group by

if the produces duplicates, then use distinct



SELECT cost_center,acc_no,
  (sum (cur_sal) over (partition by cost_center)) sum_sal
                     
from
(
select
   acc_no, cost_center,
( credit - debit) cur_balance, 0 prev_balance,

DECODE (acc_no, 41010, (credit - debit)) cur_sal, 0 prev_sal,
                      DECODE (acc_no, 41010, 0, (debit - credit))
                         cur_all_exp, 0 prev_all_exp

 from alpha_tran where
period =  to_date('2015-06-30','yyyy-mm-dd')
)


or, you can get rid of the subquery since it's generating values you're not using
and just query the part you want directly


SELECT cost_center, acc_no, SUM(CASE WHEN acc_no = 41010 THEN credit - debit END) OVER (PARTITION BY cost_center)
  FROM alpha_tran
 WHERE period = TO_DATE('2015-06-30', 'yyyy-mm-dd')

again, add DISTINCT if necessary
SUM() without OVER() DOES require a group by clause.
In this form SUM() is an AGGREGATE FUNCTION
SELECT
       cost_center
     , acc_no
     , sum(cur_sal) sum_sal
FROM (
     SELECT
            acc_no
          , cost_center
          , (credit - debit) cur_balance
          , 0 prev_balance
          , DECODE(acc_no, 41010, (credit - debit)) cur_sal
          , 0 prev_sal
          , DECODE(acc_no, 41010, 0, (debit - credit)) cur_all_exp
          , 0 prev_all_exp
     FROM alpha_tran
     WHERE period = to_date('2015-06-30', 'yyyy-mm-dd')
     )
GROUP BY
       cost_center
     , acc_no;     

Open in new window

But when you use OVER(...) you do NOT require a GROUP BY.
In this form SUM() is an ANALYTIC FUNCTION
SELECT
       cost_center
     , acc_no
     , (sum(cur_sal) OVER (PARTITION BY cost_center)) sum_sal
FROM (
     SELECT
            acc_no
          , cost_center
          , (credit - debit) cur_balance
          , 0 prev_balance
          , DECODE(acc_no, 41010, (credit - debit)) cur_sal
          , 0 prev_sal
          , DECODE(acc_no, 41010, 0, (debit - credit)) cur_all_exp
          , 0 prev_all_exp
     FROM alpha_tran
     WHERE period = to_date('2015-06-30', 'yyyy-mm-dd')
     )

Open in new window

A fundamental difference between the two uses is that:
AGGREGATE functions WILL reduce the number of rows (due to the group by)
ANALYTIC functions WILL NOT REDUCE NUMBER OF ROWS (because there is no group by)

It isn't exactly clear why you have  several columns in the subquery that aren't referenced in the outer (top level) query, perhaps that was just an abbreviation you introduced for this question? If you really do not need all those columns then I too would recommend simplification. In addition personally I would avoid use of DECODE as I much prefer to use CASE EXPRESSIONS.

My guess at the overall query is
SELECT
       cost_center
     , sum(case when acc_no = 41010 then (credit - debit) else 0 end) sum_sal
     , sum(case when acc_no = 41010 then 0 else (credit - debit) end) sum_all_exp
     , sum(credit - debit) sum_balance
     /*
          are there really needed?
          , 0 prev_balance
          , 0 prev_sal
          , 0 prev_all_exp
     */
FROM alpha_tran
WHERE period = to_date('2015-06-30', 'yyyy-mm-dd')
GROUP BY
       cost_center

Open in new window

{+edit}BUT this guess removes the account number column, so I'm not confident about that guess.
Avatar of NiceMan331

ASKER

ok paul
what if i want to add the sum over partiotion to your last sql ?
i did like this , but i think it is complicated sql

select cost_center,acc_no,
cur_sal,cur_all_exp,prev_sal,prev_all_exp,
SUM (cur_balance) over (partition by cost_center) cur_Balance,
             SUM (prev_balance) over (partition by cost_center) prev_Balance
             from
(SELECT cost_center,acc_no

     , sum(case when acc_no = 41010 then (cur_balance) else 0 end) cur_sal
     , sum(case when acc_no = 41010 then 0 else (cur_balance) end) cur_all_exp,
      sum(case when acc_no = 41010 then (prev_balance) else 0 end) prev_sal
     , sum(case when acc_no = 41010 then 0 else (prev_balance) end) prev_all_exp
 ,sum(cur_balance) cur_balance      
         
,SUM (prev_balance) prev_Balance
               
      
from
(
select
   acc_no, cost_center, 
( credit - debit) cur_balance, 0 prev_balance

 from alpha_tran where
period =  to_date('2015-06-30','yyyy-mm-dd')
union all
select acc_no, 3100, 
( sum(credit) - sum(debit)) , 0 


 from alpha_tran where
period =  to_date('2015-06-30','yyyy-mm-dd')
group by acc_no
union all
select
   acc_no, cost_center, 
0 , ( credit - debit)  


 from alpha_tran where
period = to_date('2014-06-30','yyyy-mm-dd')
union all
select acc_no, 3100, 
0 , ( sum(credit) - sum(debit))  


 from alpha_tran where
period =  to_date('2014-06-30','yyyy-mm-dd')
group by acc_no

)
group by cost_center,acc_no)
order by cost_center;

Open in new window


in addition , if you not mind :
1- if i want to get variance by dividing for example cur_sal/cur_all_exp and cur_all-exp is zero ?
2- any adjustment to use date parameter instead of using it 4 times ?
BEFORE attempting to answer any "extras" PLEASE let us solve the original question first.

In my view all you have done is made the query much more complicated than it should be.

I IMPLORE (plead, beg) you to provide an EXAMPLE of the source data and then the expected result.

(sample data) + (expected result) = good question structure
ok
first the structur of the table
create table alpha_tran
cost_center number
acc_no number
debit number
credit number
period date
this table contain financial transactions of branches
each branch has trial balance consist of summary of amount of its acc_no
the purpose of the requested query is to compare performance of each branches
between 2 selected periods
nex , i will post sample of data
then the requested result
please find the attached Excell File
the sample is for 2 periods
31-mar-14   which we will consider it as prev_period
and
31-mar-15 which we will call it , cur_period
regarding sheet of result
1- each cost_center has different acc_no
2- sales acc_no = 41010
3- other acc_no else of sales = all_expense
4-column E : balance for each cost_center ( the P/L ) = (sales - all-expense)
5-Column F : TO CALCULATE expense percenta of sales of same cost_center ,
   ( note , if acc_no = 41010 , no percentage should be calculate )
6-column G , to calculate percentage of balance for each cost_center on its sales
7 - columns : h-i-j-k-l , SAME PREVIOUS CALCULATION but for the previous period
8- column M , sales variances between 2 periods
9-column n , expense variances between 2 periods
finally
the data sample taken for 2 cost_centers :  3107  and 3109
we need summary of all , under code name : 3100
hope my file is not complicated as same as my sql
Sample_1.xlsx
And the expected results for that data?
The attached file contain 2 sheets , data : for sample transaction , and result , for expected result
Sorry, I didn't notice the result tab.
Welcome , if you need any more details plz tell me , thank in advance
Your data is showing 3 credits of 288,403 for cost_center 3107, acc_no 41010, on 31-Mar-15
(a total of 865,209) So why do you only show 374, 960 in your expected results?
Also, some balances and percentages don't compute. perhaps you can provide a little more description of how the numbers should be derived.
Yes sorry , I'm wrong , the total should be as you calculate , it was just computing error because the data is sample and not real
sorry , i just be beside my pc
i reviewed the calculation of acc 41010
it was correct
total debit transaction for this acc_no = 490250
and total credit = 865210
so , the balance = 490250 - 865210 = - 374960
which is correct
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
great query
only small adjustment to the condition of where , it is now correct as the following

WITH CC_ACCNO
AS (
      SELECT
            1 QUERY_NUM
          , GROUPING_ID(COST_CENTER, ACC_NO) GROUPING_LEVEL

          , COST_CENTER
          , ACC_NO

            /* columns for "current period" calculations */
          , SUM(CASE
                  WHEN period = to_date('2014-09-30', 'yyyy-mm-dd') AND acc_no = 41010 THEN (credit - debit)
                  ELSE 0
            END) cur_41010
          , SUM(CASE
                  WHEN period = to_date('2014-09-30', 'yyyy-mm-dd') AND acc_no <> 41010  THEN 
                   (debit - credit) else 0
            END) cur_not_41010

            /* columns for "previous period" calculations */
          , SUM(CASE
                  WHEN period = to_date('2013-09-30', 'yyyy-mm-dd') AND acc_no = 41010 THEN (credit - debit)
                  ELSE 0
            END) prv_41010
          , SUM(CASE
                  WHEN period = to_date('2013-09-30', 'yyyy-mm-dd') AND acc_no <> 41010  THEN 
                  (debit - credit) else 0
            END) prv_not_41010
      FROM alpha_tran
      WHERE period = to_date('2014-09-30', 'yyyy-mm-dd')
            OR period = to_date('2013-09-30', 'yyyy-mm-dd')
      GROUP BY ROLLUP (COST_CENTER, ACC_NO)
  )
, ACCNO
AS (
      SELECT
            2 AS QUERY_NUM
          , 0 AS GROUPING_LEVEL
          , 3100 AS Cost_Center
          , ACC_NO
          , SUM(CUR_41010) AS CUR_41010
          , SUM(CUR_NOT_41010) AS CUR_NOT_41010
          , SUM(PRV_41010) AS PRV_41010
          , SUM(PRV_NOT_41010) AS PRV_NOT_41010
      FROM CC_ACCNO
      WHERE GROUPING_LEVEL = 0
      GROUP BY ACC_NO
)
SELECT
      *
FROM CC_ACCNO
WHERE GROUPING_LEVEL < 3

UNION ALL
      SELECT
            *
      FROM ACCNO
UNION ALL
      SELECT
            *
      FROM CC_ACCNO
      WHERE GROUPING_LEVEL = 3

Open in new window



it is really a wonderfull method
1- could the above query be as subquery , of a query to select for example
WHERE GROUPING_LEVEL = 1 ( which is sub totals of each cost_center )
2-if you never mind , could you please continue adding the factors
for example :
prv_not_41010 ( where group level = 0 ) / (prv_41010  where group_level = 1)
cur_not_41010 / cur_41010( where group level = 0 ) / (prv_41010  where group_level = 1)
(cur_41010-prv_41010)/(prv_41010)( where group level = 1)
(cur_not_41010-prv_not_41010)/(prv_not_41010)( where group level = 1)
now that we have used GROUP BY ROLLUP, and placed this result into a CTE, the other factors can use "correlated subqueries" when we need to cross over grouping levels for data

I repeat however that I will not complete the entire query, but will give you a sample to work with. In my view you need to attempt some of this yourself because you are the one who needs to understand it, and support it.

sample for prv_not_41010 ( where group level = 0 ) / (prv_41010  where group_level = 1)

| QUERY_NUM | GROUPING_LEVEL | COST_CENTER | ACC_NO | CUR_41010 | CUR_NOT_41010 | PRV_41010 | PRV_NOT_41010 |               FACTOR1 |
|-----------|----------------|-------------|--------|-----------|---------------|-----------|---------------|-----------------------|
|         1 |              0 |        3107 |  41010 |    374959 |        671506 |    296547 |        671506 |    1.0822483507786502 |
|         1 |              0 |        3107 |  43504 |         0 |        -23334 |         0 |        -35001 |   -0.0564101802896825 |
|         1 |              0 |        3107 |  43510 |         0 |      -3297.99 |         0 |      -4631.89 | -0.007465093853946387 |
|         1 |              0 |        3107 |  43513 |         0 |      -12556.8 |         0 |        -11400 | -0.018373076635021298 |
|         1 |              1 |        3107 | (null) |    374959 |     632317.21 |    296547 |     620473.11 |                (null) |
|         1 |              0 |        3109 |  41010 |    825654 |       1134503 |    308849 |       1134503 |     1.069629288399531 |
|         1 |              0 |        3109 |  43504 |         0 |        -36666 |         0 |        -54999 | -0.051854019982922744 |
|         1 |              0 |        3109 |  43510 |         0 |      -3350.17 |         0 |      -5653.35 | -0.005330077344505469 |
|         1 |              0 |        3109 |  43513 |         0 |     -13587.88 |         0 |        -13200 |  -0.01244519107210277 |
|         1 |              1 |        3109 | (null) |    825654 |    1080898.95 |    308849 |    1060650.65 |                (null) |
|         2 |              0 |        3100 |  41010 |   1200613 |       1806009 |    605396 |       1806009 |    1.0742867616123635 |
|         2 |              0 |        3100 |  43504 |         0 |        -60000 |         0 |        -90000 |   -0.0535356183413885 |
|         2 |              0 |        3100 |  43510 |         0 |      -6648.16 |         0 |     -10285.24 |  -0.00611807425766203 |
|         2 |              0 |        3100 |  43513 |         0 |     -26144.68 |         0 |        -24600 | -0.014633069013312856 |
|         1 |              3 |      (null) | (null) |   1200613 |    1713216.16 |    605396 |    1681123.76 |                (null) |

Open in new window

WITH CC_ACCNO
AS (
      SELECT
            1 QUERY_NUM
          , GROUPING_ID(COST_CENTER, ACC_NO) GROUPING_LEVEL

          , COST_CENTER
          , ACC_NO

            /* columns for "current period" calculations */
          , SUM(CASE
                  WHEN period = to_date('2015-03-31', 'yyyy-mm-dd') AND
                  acc_no = 41010 THEN (credit - debit)
                  ELSE 0
            END) cur_41010
          , SUM(CASE
                  WHEN period = to_date('2015-03-31', 'yyyy-mm-dd') AND
                  acc_no <> 41010 OR
                  acc_no IS NULL THEN 0
                  ELSE (credit - debit)
            END) cur_not_41010

            /* columns for "previous period" calculations */
          , SUM(CASE
                  WHEN period = to_date('2014-03-31', 'yyyy-mm-dd') AND acc_no = 41010 THEN (credit - debit)
                  ELSE 0
            END) prv_41010
          , SUM(CASE
                  WHEN period = to_date('2014-03-31', 'yyyy-mm-dd') AND acc_no <> 41010 OR  acc_no IS NULL THEN 0
                  ELSE (credit - debit)
            END) prv_not_41010
      FROM alpha_tran
      WHERE period = to_date('2014-03-31', 'yyyy-mm-dd')
            OR period = to_date('2015-03-31', 'yyyy-mm-dd')
      GROUP BY ROLLUP (COST_CENTER, ACC_NO)
  )
, ACCNO
AS (
      SELECT
            2 AS QUERY_NUM
          , 0 AS GROUPING_LEVEL
          , 3100 AS Cost_Center
          , ACC_NO
          , SUM(CUR_41010) AS CUR_41010
          , SUM(CUR_NOT_41010) AS CUR_NOT_41010
          , SUM(PRV_41010) AS PRV_41010
          , SUM(PRV_NOT_41010) AS PRV_NOT_41010
      FROM CC_ACCNO
      WHERE GROUPING_LEVEL = 0
      GROUP BY ACC_NO
)
SELECT
      QUERY_NUM
    , GROUPING_LEVEL
    , COST_CENTER
    , ACC_NO
    , CUR_41010
    , CUR_NOT_41010
    , PRV_41010
    , PRV_NOT_41010
    /* prv_not_41010 ( where group level = 0 ) / (prv_41010  where group_level = 1) */
    , CASE WHEN GROUPING_LEVEL = 0 THEN
        PRV_NOT_41010 / NULLIF( (select PRV_NOT_41010 
                                 from CC_ACCNO f 
                                 where f.COST_CENTER = CC_ACCNO.COST_CENTER 
                                 and f.grouping_level = 1
                                 and rownum = 1) ,0)
        ELSE NULL
       END AS factor1
FROM CC_ACCNO
WHERE GROUPING_LEVEL < 3
UNION ALL
      SELECT
              QUERY_NUM
            , GROUPING_LEVEL
            , COST_CENTER
            , ACC_NO
            , CUR_41010
            , CUR_NOT_41010
            , PRV_41010
            , PRV_NOT_41010
            /* prv_not_41010 ( where group level = 0 ) / (prv_41010  where group_level = 1) */
            , PRV_NOT_41010 / NULLIF( (select PRV_NOT_41010 
                                         from CC_ACCNO f 
                                         where f.grouping_level = 3
                                         and rownum = 1) ,0)
               AS factor1
      FROM ACCNO
UNION ALL
      SELECT
              QUERY_NUM
            , GROUPING_LEVEL
            , COST_CENTER
            , ACC_NO
            , CUR_41010
            , CUR_NOT_41010
            , PRV_41010
            , PRV_NOT_41010
            , NULL
      FROM CC_ACCNO
      WHERE GROUPING_LEVEL = 3
;

Open in new window

also see:  http://sqlfiddle.com/#!4/ddf2c/26
ok
i got the idea
i will complete all required factors
remaining this question plz
1 - could the above query be as sub query , of a query to select for example
WHERE GROUPING_LEVEL = 1 ( which is sub totals of each cost_center )

2- if you never mind , you advised me to stop using sum over partition
that function i been advised by another expert here to use it in different question but for same purpose to got summary of one column in the main query
ok , i know every expert has different method , ok , but the question is : why you advised me to stop using it ? and where the place to use it if it is not suitable here
thanx alot , it is really great job from your side
use analytics (i.e. the over() method)   when you want to perform an operation on a range of rows while still maintaining the values of each row

use aggregates (i.e. the group by method) when you want to combine values from multiple rows into a single row.
So if for example I have my data for one period only and no comparison required , and I need to divide total expense over total sales per branch , it will be better to use partition method ?
yes, because each row (branch) is maintained as its own entity while also needing the value over the entire range of data (total expense)

if you used group by you could generate a rollup of the entire expense but that would show up as a separate row which wouldn't be useful for doing the division.
Ok , thanx , pending only what if I would like to select data of specific grouping level of query done by Paul , I mean may get records for total per branches or and total all branches only without the level 0
I advised not to use analytic functions FOR ONLY THIS QUERY because they are not required.

I have DEMONSTRATED how to achieve the expected result and I didn't need analytic functions to do it.

I love analytic functions!, they are just not helpful for this question.
User generated imageThe expected result contains subtotals and a grand-total (see rows 8 13 and 19). This requirement is best met by using GROUP BY ROLLUP

When one uses GROUP BY ROLLUP the output has specific rows that contain the subtotals and grand-total. If you then ALSO use analytic functions on that output there is a risk of deriving the wrong figures (eg. by double counting). In my view using GROUP BY ROLLUP is somewhat incompatible with analytic functions (although not impossible, just rather awkward).

Additionally as I stated earlier there are really 2 types of query inside that expected result that I referred to as (a) & (b) . Trying to do those 2 queries and analytic functions on top of both just seems way too complex. It is far easier I believe to use a few correlated subqueries to finish of the cross-row calculations. Note because the output of GROUP BY ROLLUP is also very small in row numbers the correlated subqueries are unlikely to cause performance issues here (i.e. in this query!).

Hopefully I have now fully explained why I chose the method I did.
Yes it is understood , please last question , the query now is very helpful , but I tried to have both results , one like as it is to get details , and another result to have total branches only , when I change the creteria in the query , last end , to select where grouping level = 1 , instead of < 3 , in purpose to get only sub totals for each branch , it return error ( I forget the error name cause now I'm out of the office ) ,
When you are back in the office provide the query variant and the error please.
Ok , tomorrow , thanx a lot
sorry paul
i had no connection over 3 days
it is ok now
i got every think correct
thanx very much
regards
very nice sql
Thanks.