DENSE_RANK and KEEP

shacho
shacho used Ask the Experts™
on
I apologize for the newbishness of this question - it's been a long time since I used SQL.

I am picking apart some SQL and trying to understand its logic.  The core of the SQL is essentially this:

SELECT
  PROD_ID
  MAX(PROD_SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE_ID) AS PROD_SIZE
FROM PRODUCTS
GROUP BY
  PROD_ID

I need to understand in plain English what's happening here.  I can't get my mind around what KEEP does, and how and why DENSE_RANK is here.  Can someone help me work through this?
Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Steve WalesSenior Database Administrator

Commented:
RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions141.htm#SQLRF00690) and DENSE RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions052.htm#SQLRF00633) are analytic functions that show you the rank of a value within a list of values.

See the examples in the doco.

For RANK, if you had 5 people with salaries of 4200, 4800, 4800, 6000 & 9000 they would be ranked as so:

RANK: 1,2,2,4,5
DENSE RANK: 1,2,2,3,4

DENSE_RANK returns the row's rank within an ordered group.  If there are items with the same value, then no numbers are skipped in the ranking.  RANK does the same thing but will skip rank values in the event of ties within the list of values.

KEEP FIRST (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF00641)  will return the FIRST (or LAST with KEEP LAST) row of a sorted group of rows.

So take a query like this:

Select deptid, min(hiredate) earliest, max(hiredate) latest,
   Count(salary) keep (dense_rank first order by hiredate) first,
   Count(salary) keep (dense_rank last order by hiredate) last
From employees
Group by deptid

Returns for each department, earliest and most recent hire date and the number of people hired on each of those dates

So, your query looks like it's going to return, by product, the largest prod_size for each store id.  (Not knowing your data, I'm not 100% sure what that means, but hopefully it means something for you).
awking00Information Technology Specialist
Commented:
>>So, your query looks like it's going to return, by product, the largest prod_size for each store id. <<
Actually, because of the DENSE_RANK FIRST ORDER BY STORE_ID statement, I think it's going to return, by product, the largest prod_size for the store that has the lowest store_id. Why that's wanted, I don't know.

Author

Commented:
> I think it's going to return, by product, the largest prod_size for the store that has the lowest store_id

This kind of makes sense.  As I understand it Store IDs are numeric, and the Store ID this guy would want to be working with is in fact the lowest.

Steve - is that your assessment as well?  You indicated that it would return data for all the stores, but would you agree that actually it will only return data for one store?
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Steve WalesSenior Database Administrator

Commented:
Yes, thinking about it, I believe awking00 is correct.

Author

Commented:
OK - I guess I understand.

Assuming the smallest STORE_ID is 1, would this query deliver different the same output?

SELECT
  PROD_ID,
  MAX(PROD_SIZE) AS PROD_SIZE
FROM PRODUCTS
GROUP BY  PROD_ID
WHERE STORE_ID = 1

If so, why do it the other way?  I am certain the author knows that STORE_ID is 1.
Steve WalesSenior Database Administrator

Commented:
The other way gives you the RANK of the values as well.

What you propose there just gives the PROD_ID and the maximum size, and doesn't show the rank

Run the two queries against your database - the difference in output should be pretty immediately obvious :)

Author

Commented:
The original SQL also does not include rank as an output field.  I don't see the difference.  The only difference I can imagine is that using keep dense_rank will also yield some output fro records there there are products that don't exist in the store whose ID is one.

So maybe something like this?  Is this correct?

SIZE	STORE	ID
1	1	10
2	1	10
1	2	10
3	1	20
7	2	20
4	1	30
4	3	40
		
SELECT		
 ID, STORE,		
 MAX(SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE	
FROM FRUIT		
GROUP BY ID		
		
ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4
40	3	4

SELECT		
 ID, STORE, MAX(SIZE) AS FRUIT_SIZE	
FROM FRUIT	
WHERE STORE = 1
GROUP BY ID		
		
ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4

Open in new window

Steve WalesSenior Database Administrator

Commented:
Oops!  You're right.

The rank does an internal sort and keeps the first row it finds.

It also shows all stores where as your query shows the first store.

It's a bit of an odd query all things considered.

Going back to the documentation here: http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF51412

The KEEP example in their query (shown here):

SELECT department_id,
       MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) "Worst",
       MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) "Best"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Says:  returns, within each department of the sample table hr.employees, the minimum salary among the employees who make the lowest commission and the maximum salary among the employees who make the highest commission:

So applying that same logic, it appears that the rank query you have is returning, within each store, the max size among those fruits in the lowest numbered stores.

That makes very little sense to me, quite frankly - but again, I don't understand your data like you hopefully do.

Author

Commented:
Ha ha - I wish I understood the data...  This is forensic work.

I don't have access to the original data, nor do I have access to an SQL DB to test this.  I'm sure it can be done with SQLFiddle, but I also don't know the syntax for creating a schema.  Can somebody test the queries I just created to see if the results are what I'm expecting (see my last post)?

Mike
Steve WalesSenior Database Administrator

Commented:
Willing to try but can you give a more complete dataset of all the rows, rather than just the result set ?  Or at least a sample of the data and the results you get from it?

Author

Commented:
It's all above in the code snippet.  The original table and my predictions for the output for two queries, as well as the queries.
Steve WalesSenior Database Administrator

Commented:
So it is.  Sorry, missed that.  Will have a look tomorrow, getting late.
Steve WalesSenior Database Administrator

Commented:
OK so decided to look tonight anyway

Setup (Size is a reserved word, didn't want to deal with it, so changed it to XSIZE for testing):

SQL> create table fruit (xsize integer, store integer, id integer);

Table created.

SQL> insert into fruit values (1,1,10);

1 row created.

SQL> insert into fruit values (2,1,10);

1 row created.

SQL> insert into fruit values (1,2,10);

1 row created.

SQL> insert into fruit values (3,1,20);

1 row created.

SQL> insert into fruit values (7,2,20);

1 row created.

SQL> insert into fruit values (4,1,30);

1 row created.

SQL> insert into fruit values (4,3,40);

1 row created.

SQL> commit;

Commit complete.

Open in new window


Checking the data is all there:

SQL> select id, store, xsize from fruit order by store, id, xsize;

        ID      STORE      XSIZE
---------- ---------- ----------
        10          1          1
        10          1          2
        20          1          3
        30          1          4
        10          2          1
        20          2          7
        40          3          4

7 rows selected.

Open in new window


Original Query:

SQL> SELECT
  2  ID, STORE,
  3  MAX(XSIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE
  4  FROM FRUIT
  5  GROUP BY ID, STORE
  6  ;

        ID      STORE FRUIT_SIZE
---------- ---------- ----------
        10          1          2
        10          2          1
        20          1          3
        20          2          7
        30          1          4
        40          3          4

6 rows selected.

Open in new window


This appears to be selecting the MAX FRUIT_SIZE for each combination of ID & Store

Ran your second query without the WHERE:

  1  SELECT
  2   ID, STORE, MAX(XSIZE) AS FRUIT_SIZE
  3  FROM FRUIT
  4  GROUP BY ID, STORE
  5* order by 1,2
SQL> /

        ID      STORE FRUIT_SIZE
---------- ---------- ----------
        10          1          2
        10          2          1
        20          1          3
        20          2          7
        30          1          4
        40          3          4

6 rows selected.

Open in new window


Same result set.

It would appear that for this particular sample of data, the second query does the same thing.

I still don't understand what this is attempting to give you other than the larges fruit size per ID / Store - and if that's really the case, the second query seems a whole lot easier to grasp to me.

Author

Commented:
Thanks for that!  OK - I'm still in the woods.  Gonna research a little more and pick this up tomorrow.

Mike

Author

Commented:
Are there any good resources to look to to understand how Oracle interprets then executes SQL?  I just can't get my mind around the sequence of events that take place in the execution of this statement.
Senior Database Administrator
Commented:
Have you looked at the execution plan ?

You can either, from sqlplus, SET AUTOTRACE ON to have it generated then, or put EXPLAIN PLAN FOR in front of your query and then select from PLAN_TABLE to see it.

The former is probably easier - since it will execute and then show you the plan of what it did.

Author

Commented:
Great!  That should do it!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial