
ORACLE SQL - please explain OVER PARTITION BY

What does the OVER PARTITION BY do? I don't understand this SELECT statement and the OVER PARTITION BY.


SELECT
    Field1, field2,
    100 * field3 / SUM(field3) OVER (PARTITION BY NULL) field4
FROM
Asked by: joekeri
paquicuba commented:
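In the above case PARTITION BY NULL is useless: SUM(field3) OVER () field4 gives the same result. Field4 is populated with the total sum of field3 across all rows, so 100 * field3 / SUM(field3) OVER (...) is each row's percentage of that total. See the following example, which shows both forms: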

SCOTT@PROD > select empno, ename, sum(sal) over ( partition by null) as total_sal from emp;

     EMPNO ENAME       TOTAL_SAL
---------- ---------- ----------
      7369 SMITH           29025
      7499 ALLEN           29025
      7521 WARD            29025
      7566 JONES           29025
      7654 MARTIN          29025
      7698 BLAKE           29025
      7934 MILLER          29025
      7788 SCOTT           29025
      7839 KING            29025
      7844 TURNER          29025
      7876 ADAMS           29025
      7900 JAMES           29025
      7902 FORD            29025
      7782 CLARK           29025

14 rows selected.

Elapsed: 00:00:00.04
SCOTT@PROD > select empno, ename, sum(sal) over () as total_sal from emp;          

     EMPNO ENAME       TOTAL_SAL
---------- ---------- ----------
      7369 SMITH           29025
      7499 ALLEN           29025
      7521 WARD            29025
      7566 JONES           29025
      7654 MARTIN          29025
      7698 BLAKE           29025
      7782 CLARK           29025
      7788 SCOTT           29025
      7839 KING            29025
      7844 TURNER          29025
      7876 ADAMS           29025
      7900 JAMES           29025
      7902 FORD            29025
      7934 MILLER          29025

14 rows selected.
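To make the arithmetic in the original query concrete, here is a minimal Python sketch (not Oracle code) of what SUM(sal) OVER () and 100 * x / SUM(x) OVER () compute. It assumes the stock SCOTT.EMP salaries, which the thread does not list but which add up to the 29025 shown above; the helper names sum_over_all, percent_of_total and sal are hypothetical.

```python
# Stock SCOTT.EMP rows as (EMPNO, ENAME, SAL). The SAL values are an
# assumption: the thread only shows their 29025 total.
EMP = [
    (7369, "SMITH", 800), (7499, "ALLEN", 1600), (7521, "WARD", 1250),
    (7566, "JONES", 2975), (7654, "MARTIN", 1250), (7698, "BLAKE", 2850),
    (7782, "CLARK", 2450), (7788, "SCOTT", 3000), (7839, "KING", 5000),
    (7844, "TURNER", 1500), (7876, "ADAMS", 1100), (7900, "JAMES", 950),
    (7902, "FORD", 3000), (7934, "MILLER", 1300),
]


def sum_over_all(rows, value):
    """Hypothetical helper emulating SUM(value) OVER ().

    With an empty OVER () (or PARTITION BY NULL) the whole result set is one
    partition, so every row gets the same grand total appended.
    """
    total = sum(value(row) for row in rows)
    return [row + (total,) for row in rows]


def percent_of_total(rows, value):
    """Emulate 100 * value / SUM(value) OVER (), the expression in the question."""
    total = sum(value(row) for row in rows)  # the SUM(...) OVER () part
    return [row + (100 * value(row) / total,) for row in rows]


def sal(row):
    return row[2]


print(sum_over_all(EMP, sal)[0])                   # (7369, 'SMITH', 800, 29025)
print(round(percent_of_total(EMP, sal)[8][3], 2))  # KING's share: 17.23
```

No row is removed: the total is computed once and repeated on each of the 14 rows, which is exactly what lets the question's query divide each field3 by the overall sum.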
paquicuba commented:
Now, if I want a separate total for each job, I PARTITION BY job. Every row is still returned, and TOTAL_SAL is the sum of SAL over the rows that share that row's JOB:

SCOTT@PROD > select job, empno, ename, sum(sal) over (partition by job) as total_sal from emp;

JOB            EMPNO ENAME       TOTAL_SAL
--------- ---------- ---------- ----------
ANALYST         7788 SCOTT            6000
ANALYST         7902 FORD             6000
CLERK           7934 MILLER           4150
CLERK           7900 JAMES            4150
CLERK           7369 SMITH            4150
CLERK           7876 ADAMS            4150
MANAGER         7698 BLAKE            8275
MANAGER         7566 JONES            8275
MANAGER         7782 CLARK            8275
PRESIDENT       7839 KING             5000
SALESMAN        7844 TURNER           5600
SALESMAN        7654 MARTIN           5600
SALESMAN        7521 WARD             5600
SALESMAN        7499 ALLEN            5600

14 rows selected.

Elapsed: 00:00:00.03
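A similar Python sketch shows what PARTITION BY job changes: totals are computed per distinct JOB value, then attached to every row of that job. It assumes the stock SCOTT.EMP salaries (not shown in the thread), and sum_over_partition is a hypothetical helper, not an Oracle API.

```python
from collections import defaultdict

# Stock SCOTT.EMP rows as (JOB, EMPNO, ENAME, SAL). The SAL values are an
# assumption; the thread only shows the per-job totals.
EMP = [
    ("CLERK", 7369, "SMITH", 800), ("SALESMAN", 7499, "ALLEN", 1600),
    ("SALESMAN", 7521, "WARD", 1250), ("MANAGER", 7566, "JONES", 2975),
    ("SALESMAN", 7654, "MARTIN", 1250), ("MANAGER", 7698, "BLAKE", 2850),
    ("MANAGER", 7782, "CLARK", 2450), ("ANALYST", 7788, "SCOTT", 3000),
    ("PRESIDENT", 7839, "KING", 5000), ("SALESMAN", 7844, "TURNER", 1500),
    ("CLERK", 7876, "ADAMS", 1100), ("CLERK", 7900, "JAMES", 950),
    ("ANALYST", 7902, "FORD", 3000), ("CLERK", 7934, "MILLER", 1300),
]


def sum_over_partition(rows, key, value):
    """Hypothetical helper emulating SUM(value) OVER (PARTITION BY key).

    Pass 1 totals each partition; pass 2 appends that partition's total to
    every row, so no row is dropped. Assumes the values contain no NULLs.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += value(row)
    return [row + (totals[key(row)],) for row in rows]


def job(row):
    return row[0]


def sal(row):
    return row[3]


result = sum_over_partition(EMP, key=job, value=sal)
for job_name, empno, ename, _sal, total_sal in result:
    print(f"{job_name:<10}{empno:>6} {ename:<8}{total_sal:>6}")
```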
joekeri (author) commented:
So, from what you are saying, OVER (PARTITION BY ...) is similar to GROUP BY. Is that correct?
paquicuba commented:
Kind of, but no.

In the example below, grouping by JOB gives TOTAL_SAL for each job, but I have to limit the select list to JOB plus aggregates, and the output collapses to one row per job:

SCOTT@PROD > select job, sum(sal) as total_sal from emp group by job order by 1;

JOB        TOTAL_SAL
--------- ----------
ANALYST         6000
CLERK           4150
MANAGER         8275
PRESIDENT       5000
SALESMAN        5600


On the other hand, PARTITION BY lets me display all columns and all rows and still obtain TOTAL_SAL for each job:

SCOTT@PROD > select job, empno, ename, sum(sal) over (partition by job) as total_sal from emp;

JOB            EMPNO ENAME       TOTAL_SAL
--------- ---------- ---------- ----------
ANALYST         7788 SCOTT            6000
ANALYST         7902 FORD             6000
CLERK           7934 MILLER           4150
CLERK           7900 JAMES            4150
CLERK           7369 SMITH            4150
CLERK           7876 ADAMS            4150
MANAGER         7698 BLAKE            8275
MANAGER         7566 JONES            8275
MANAGER         7782 CLARK            8275
PRESIDENT       7839 KING             5000
SALESMAN        7844 TURNER           5600
SALESMAN        7654 MARTIN           5600
SALESMAN        7521 WARD             5600
SALESMAN        7499 ALLEN            5600


I can add as many columns as I want without affecting TOTAL_SAL:

SCOTT@PROD > select job, empno, ename, sum(sal) over (partition by job) as total_sal, deptno from emp;

JOB            EMPNO ENAME       TOTAL_SAL     DEPTNO
--------- ---------- ---------- ---------- ----------
ANALYST         7788 SCOTT            6000         20
ANALYST         7902 FORD             6000         20
CLERK           7934 MILLER           4150         10
CLERK           7900 JAMES            4150         30
CLERK           7369 SMITH            4150         20
CLERK           7876 ADAMS            4150         20
MANAGER         7698 BLAKE            8275         30
MANAGER         7566 JONES            8275         20
MANAGER         7782 CLARK            8275         10
PRESIDENT       7839 KING             5000         10
SALESMAN        7844 TURNER           5600         30
SALESMAN        7654 MARTIN           5600         30
SALESMAN        7521 WARD             5600         30
SALESMAN        7499 ALLEN            5600         30

14 rows selected.

Elapsed: 00:00:00.00
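The GROUP BY versus PARTITION BY difference can be sketched side by side in Python. The helpers group_by_sum and sum_over_job are hypothetical, and the SAL values are assumed to be the stock SCOTT.EMP ones; JOB, ENAME and DEPTNO match the output above. GROUP BY collapses each job to one output row, while the analytic version keeps all 14 rows and any extra column such as DEPTNO.

```python
from collections import defaultdict

# (JOB, ENAME, SAL, DEPTNO); SAL values assumed from the stock SCOTT.EMP table.
EMP = [
    ("CLERK", "SMITH", 800, 20), ("SALESMAN", "ALLEN", 1600, 30),
    ("SALESMAN", "WARD", 1250, 30), ("MANAGER", "JONES", 2975, 20),
    ("SALESMAN", "MARTIN", 1250, 30), ("MANAGER", "BLAKE", 2850, 30),
    ("MANAGER", "CLARK", 2450, 10), ("ANALYST", "SCOTT", 3000, 20),
    ("PRESIDENT", "KING", 5000, 10), ("SALESMAN", "TURNER", 1500, 30),
    ("CLERK", "ADAMS", 1100, 20), ("CLERK", "JAMES", 950, 30),
    ("ANALYST", "FORD", 3000, 20), ("CLERK", "MILLER", 1300, 10),
]


def group_by_sum(rows):
    """SELECT job, SUM(sal) FROM emp GROUP BY job ORDER BY 1: one row per job."""
    totals = defaultdict(int)
    for job, _ename, sal, _deptno in rows:
        totals[job] += sal
    return sorted(totals.items())  # sorted like ORDER BY 1


def sum_over_job(rows):
    """SELECT job, ename, SUM(sal) OVER (PARTITION BY job), deptno: one row per employee."""
    totals = dict(group_by_sum(rows))  # the same per-job totals...
    # ...joined back onto every original row, so nothing is collapsed
    return [(job, ename, totals[job], deptno) for job, ename, _sal, deptno in rows]


grouped = group_by_sum(EMP)
windowed = sum_over_job(EMP)
print(len(grouped), len(windowed))  # 5 14
```

The second helper mirrors the way this result is often written without analytic functions: join EMP to a GROUP BY subquery on JOB. The analytic form expresses the same thing without the extra join.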
RCorfman commented:
This is the syntax for Oracle analytic functions.

Basically, the normal query is run and the database engine retrieves the results internally; then the analytic functions are applied to that result set and the analytic columns are computed. (Analytic functions are evaluated after the joins, WHERE, GROUP BY and HAVING, and before the final ORDER BY.)
There are several such functions: SUM, MIN, MAX, RANK, DENSE_RANK, COUNT, etc.

They sound similar to the aggregate functions, but aggregate functions are applied either to every record in the result (without a GROUP BY clause) or to groups of records. In either case, a normal aggregate function reduces the number of rows returned.

With analytic functions, the number of rows returned is not reduced. You can tell it is an analytic function, not a GROUP BY aggregate, by the OVER keyword. OVER is followed by the analytic clause, which is what goes inside the ( ); Oracle's documentation reserves the term "windowing clause" for its optional ROWS/RANGE part. The PARTITION BY portion works similarly to GROUP BY: it determines which portion of the result set each analytic function is applied to. Some analytic functions, such as RANK, also require an ORDER BY inside the parentheses.

Here are a couple of examples:
SQL> select * from udttest;

IP           DEST             LINENO
------------ ------------ ----------
AAA          BBB                   1
AAA          CCC                   2
AAA          DDD                   3
AAA          EEE                   4
AAA          DDD                  -1
AAA          DDD                  -3
AAA          HHH

7 rows selected.

SQL> -- normal aggregate function
SQL> select DEST,count(*) from udttest group by DEST;

DEST           COUNT(*)
------------ ----------
BBB                   1
CCC                   1
DDD                   3
EEE                   1
HHH                   1

SQL> --- analytic count function - notice all rows are returned still
SQL> select DEST,count(*) over (partition by DEST) from udttest;

DEST         COUNT(*)OVER(PARTITIONBYDEST)
------------ -----------------------------
BBB                                      1
CCC                                      1
DDD                                      3
DDD                                      3
DDD                                      3
EEE                                      1
HHH                                      1

7 rows selected.
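The analytic COUNT above can be sketched in Python using the UDTTEST rows listed at the start of this session, with None standing for NULL. The helper count_over_partition is hypothetical. COUNT(*) counts rows, so the HHH row counts even though its LINENO is NULL.

```python
from collections import Counter

# UDTTEST rows (IP, DEST, LINENO) as listed above; None stands for NULL.
UDTTEST = [
    ("AAA", "BBB", 1), ("AAA", "CCC", 2), ("AAA", "DDD", 3), ("AAA", "EEE", 4),
    ("AAA", "DDD", -1), ("AAA", "DDD", -3), ("AAA", "HHH", None),
]


def count_over_partition(rows, key):
    """Emulate COUNT(*) OVER (PARTITION BY key): each row gets its partition's size."""
    sizes = Counter(key(row) for row in rows)
    return [row + (sizes[key(row)],) for row in rows]


# GROUP BY dest: one output row per DEST value
grouped = sorted(Counter(dest for _ip, dest, _lineno in UDTTEST).items())
# COUNT(*) OVER (PARTITION BY dest): all 7 rows survive
windowed = count_over_partition(UDTTEST, key=lambda r: r[1])
print(grouped)
print(windowed)
```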

SQL> -- analytic sum function
SQL> select ip,dest,lineno,sum(lineno) over (partition by dest) sum_line,
  2                        sum(lineno) over (partition by ip) sum_ip
  3    from udttest;

IP           DEST             LINENO   SUM_LINE     SUM_IP
------------ ------------ ---------- ---------- ----------
AAA          BBB                   1          1          6
AAA          CCC                   2          2          6
AAA          DDD                   3         -1          6
AAA          DDD                  -3         -1          6
AAA          DDD                  -1         -1          6
AAA          EEE                   4          4          6
AAA          HHH                                         6

7 rows selected.
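Two details in this output are worth spelling out: SUM skips NULLs, and a partition whose values are all NULL (HHH) sums to NULL rather than 0, which is why SUM_LINE is blank on that row. A minimal Python sketch of that rule, with hypothetical helper names and None standing for NULL:

```python
from operator import itemgetter

# UDTTEST rows (IP, DEST, LINENO) as listed above; None stands for NULL.
UDTTEST = [
    ("AAA", "BBB", 1), ("AAA", "CCC", 2), ("AAA", "DDD", 3), ("AAA", "EEE", 4),
    ("AAA", "DDD", -1), ("AAA", "DDD", -3), ("AAA", "HHH", None),
]


def sql_sum(values):
    """SQL SUM semantics: ignore NULLs; if no non-NULL value remains, return NULL."""
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sum_over_partition(rows, key, value):
    """Emulate SUM(value) OVER (PARTITION BY key), with SQL NULL handling."""
    groups = {}
    for row in rows:
        groups.setdefault(key(row), []).append(value(row))
    sums = {k: sql_sum(vals) for k, vals in groups.items()}
    return [sums[key(row)] for row in rows]


lineno = itemgetter(2)
sum_line = sum_over_partition(UDTTEST, key=itemgetter(1), value=lineno)  # PARTITION BY dest
sum_ip = sum_over_partition(UDTTEST, key=itemgetter(0), value=lineno)    # PARTITION BY ip
for row, s_line, s_ip in zip(UDTTEST, sum_line, sum_ip):
    print(*row, s_line, s_ip)
```

Each analytic column is computed independently over its own partitioning, which is why one query can carry both a per-DEST and a per-IP sum.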

SQL> -- another example is using rank
SQL> select ip, dest,lineno,
  2     rank() over (partition by dest order by lineno) dest_rank_line
  3*   from udttest;

IP           DEST             LINENO DEST_RANK_LINE
------------ ------------ ---------- --------------
AAA          BBB                   1              1
AAA          CCC                   2              1
AAA          DDD                  -3              1
AAA          DDD                  -1              2
AAA          DDD                   3              3
AAA          EEE                   4              1
AAA          HHH                                  1

7 rows selected.
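RANK needs the ORDER BY inside OVER because it numbers rows within each partition. A Python sketch of the rule follows; rank_over and asc_nulls_last are hypothetical helpers, and the sketch assumes Oracle's default of NULLS LAST for an ascending sort. A row's rank is 1 plus the number of rows in its partition that sort strictly before it, so ties share a rank and the following rank is skipped.

```python
from operator import itemgetter

# UDTTEST rows (IP, DEST, LINENO) as listed above; None stands for NULL.
UDTTEST = [
    ("AAA", "BBB", 1), ("AAA", "CCC", 2), ("AAA", "DDD", 3), ("AAA", "EEE", 4),
    ("AAA", "DDD", -1), ("AAA", "DDD", -3), ("AAA", "HHH", None),
]


def asc_nulls_last(value):
    """Sort key for ORDER BY value ASC NULLS LAST (Oracle's default for ASC)."""
    return (value is None, value)


def rank_over(rows, partition_key, sort_key):
    """Hypothetical helper emulating RANK() OVER (PARTITION BY ... ORDER BY ...).

    A row's rank is 1 + the number of rows in its partition that sort strictly
    before it. Quadratic, which is fine for an illustration.
    """
    ranks = []
    for row in rows:
        peers = [other for other in rows if partition_key(other) == partition_key(row)]
        ahead = sum(1 for other in peers if sort_key(other) < sort_key(row))
        ranks.append(ahead + 1)
    return ranks


dest_rank_line = rank_over(
    UDTTEST,
    partition_key=itemgetter(1),              # PARTITION BY dest
    sort_key=lambda r: asc_nulls_last(r[2]),  # ORDER BY lineno
)
for row, rank in zip(UDTTEST, dest_rank_line):
    print(*row, rank)
```

Because the ranks are computed before any final ORDER BY, re-sorting the rows afterwards, as the next query does, only changes their order, not the rank values.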

SQL> -- and the same, but we will order by lineno...
SQL> ---   notice the column values don't change, just the order as expected
SQL> select ip, dest,lineno,
  2     rank() over (partition by dest order by lineno) dest_rank_line
  3    from udttest order by lineno;

IP           DEST             LINENO DEST_RANK_LINE
------------ ------------ ---------- --------------
AAA          DDD                  -3              1
AAA          DDD                  -1              2
AAA          BBB                   1              1
AAA          CCC                   2              1
AAA          DDD                   3              3
AAA          EEE                   4              1
AAA          HHH                                  1

7 rows selected.

SQL>
SQL> -- this can be good for 'top N' queries
SQL> -- For instance, to get the top 3 records by lineno, we use a nested query
SQL> select * from (
  2    select ip,dest,lineno,
  3       rank() over (partition by ip order by lineno desc nulls last) rank
  4      from udttest
  5*  ) where rank <= 3;

IP           DEST             LINENO       RANK
------------ ------------ ---------- ----------
AAA          EEE                   4          1
AAA          DDD                   3          2
AAA          CCC                   2          3

SQL>
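The top-N query needs the nesting because an analytic function cannot be referenced in the WHERE clause of the query level that computes it (analytics are evaluated after WHERE). So the rank is computed in the inner query and filtered in the outer one. Note also that the outer query has no ORDER BY, so Oracle does not guarantee the row order shown; add ORDER BY rank if the order matters. A Python sketch of the same two steps, with hypothetical helpers (desc_nulls_last, rank_over, top_n) and None standing for NULL:

```python
from operator import itemgetter

# UDTTEST rows (IP, DEST, LINENO) as listed above; None stands for NULL.
UDTTEST = [
    ("AAA", "BBB", 1), ("AAA", "CCC", 2), ("AAA", "DDD", 3), ("AAA", "EEE", 4),
    ("AAA", "DDD", -1), ("AAA", "DDD", -3), ("AAA", "HHH", None),
]


def desc_nulls_last(value):
    """Sort key for ORDER BY value DESC NULLS LAST: biggest first, NULLs at the end."""
    return (value is None, 0 if value is None else -value)


def rank_over(rows, partition_key, sort_key):
    """Emulate RANK(): 1 + number of partition peers that sort strictly earlier."""
    ranks = []
    for row in rows:
        peers = [other for other in rows if partition_key(other) == partition_key(row)]
        ranks.append(1 + sum(1 for other in peers if sort_key(other) < sort_key(row)))
    return ranks


def top_n(rows, n):
    """Inner query: rank per IP by LINENO DESC NULLS LAST; outer query: keep rank <= n."""
    ranks = rank_over(rows, partition_key=itemgetter(0),  # PARTITION BY ip
                      sort_key=lambda r: desc_nulls_last(r[2]))
    kept = [row + (rank,) for row, rank in zip(rows, ranks) if rank <= n]  # WHERE rank <= n
    return sorted(kept, key=itemgetter(3))  # explicit ORDER BY rank


for row in top_n(UDTTEST, 3):
    print(*row)
```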
joekeri (author) commented:
Thanks for the information. It clarified it for me.
RCorfman commented:
paquicuba, sorry, we cross-posted to some extent. I had typed my explanation and was running scripts to show the example. I didn't see that you'd already covered some of what I did by the time I actually posted...