# DENSE_RANK and KEEP

I apologize for the newbishness of this question - it's been a long time since I used SQL.

I am picking apart some SQL and trying to understand its logic.  The core of the SQL is essentially this:

SELECT
PROD_ID
MAX(PROD_SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE_ID) AS PROD_SIZE
FROM PRODUCTS
GROUP BY
PROD_ID

I need to understand in plain English what's happening here.  I can't get my mind around what KEEP does, and how and why DENSE_RANK is here.  Can someone help me work through this?
Steve Wales

RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions141.htm#SQLRF00690) and DENSE RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions052.htm#SQLRF00633) are analytic functions that show you the rank of a value within a list of values.

See the examples in the doco.

For RANK, if you had 5 people with salaries of 4200, 4800, 4800, 6000 & 9000 they would be ranked as so:

RANK: 1,2,2,4,5
DENSE RANK: 1,2,2,3,4

DENSE_RANK returns the row's rank within an ordered group.  If there are items with the same value, then no numbers are skipped in the ranking.  RANK does the same thing but will skip rank values in the event of ties within the list of values.

KEEP FIRST (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF00641)  will return the FIRST (or LAST with KEEP LAST) row of a sorted group of rows.

So take a query like this:

Select deptid, min(hiredate) earliest, max(hiredate) latest,
Count(salary) keep (dense_rank first order by hiredate) first,
Count(salary) keep (dense_rank last order by hiredate) last
From employees
Group by deptid

Returns for each department, earliest and most recent hire date and the number of people hired on each of those dates

So, your query looks like it's going to return, by product, the largest prod_size for each store id.  (Not knowing your data, I'm not 100% sure what that means, but hopefully it means something for you).
SOLUTION
awking00

membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
shacho

> I think it's going to return, by product, the largest prod_size for the store that has the lowest store_id

This kind of makes sense.  As I understand it Store IDs are numeric, and the Store ID this guy would want to be working with is in fact the lowest.

Steve - is that your assessment as well?  You indicated that it would return data for all the stores, but would you agree that actually it will only return data for one store?
Yes, thinking about it, I believe awking00 is correct.

OK - I guess I understand.

Assuming the smallest STORE_ID is 1, would this query deliver different the same output?

SELECT
PROD_ID,
MAX(PROD_SIZE) AS PROD_SIZE
FROM PRODUCTS
GROUP BY  PROD_ID
WHERE STORE_ID = 1

If so, why do it the other way?  I am certain the author knows that STORE_ID is 1.
The other way gives you the RANK of the values as well.

What you propose there just gives the PROD_ID and the maximum size, and doesn't show the rank

Run the two queries against your database - the difference in output should be pretty immediately obvious :)

The original SQL also does not include rank as an output field.  I don't see the difference.  The only difference I can imagine is that using keep dense_rank will also yield some output fro records there there are products that don't exist in the store whose ID is one.

So maybe something like this?  Is this correct?

``````SIZE	STORE	ID
1	1	10
2	1	10
1	2	10
3	1	20
7	2	20
4	1	30
4	3	40

SELECT
ID, STORE,
MAX(SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE
FROM FRUIT
GROUP BY ID

ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4
40	3	4

SELECT
ID, STORE, MAX(SIZE) AS FRUIT_SIZE
FROM FRUIT
WHERE STORE = 1
GROUP BY ID

ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4``````
Oops!  You're right.

The rank does an internal sort and keeps the first row it finds.

It also shows all stores where as your query shows the first store.

It's a bit of an odd query all things considered.

Going back to the documentation here: http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF51412

The KEEP example in their query (shown here):

SELECT department_id,
MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) "Worst",
MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) "Best"
FROM employees
GROUP BY department_id
ORDER BY department_id;

Says:  returns, within each department of the sample table hr.employees, the minimum salary among the employees who make the lowest commission and the maximum salary among the employees who make the highest commission:

So applying that same logic, it appears that the rank query you have is returning, within each store, the max size among those fruits in the lowest numbered stores.

That makes very little sense to me, quite frankly - but again, I don't understand your data like you hopefully do.

Ha ha - I wish I understood the data...  This is forensic work.

I don't have access to the original data, nor do I have access to an SQL DB to test this.  I'm sure it can be done with SQLFiddle, but I also don't know the syntax for creating a schema.  Can somebody test the queries I just created to see if the results are what I'm expecting (see my last post)?

Mike
Willing to try but can you give a more complete dataset of all the rows, rather than just the result set ?  Or at least a sample of the data and the results you get from it?

It's all above in the code snippet.  The original table and my predictions for the output for two queries, as well as the queries.
So it is.  Sorry, missed that.  Will have a look tomorrow, getting late.
OK so decided to look tonight anyway

Setup (Size is a reserved word, didn't want to deal with it, so changed it to XSIZE for testing):

``````SQL> create table fruit (xsize integer, store integer, id integer);

Table created.

SQL> insert into fruit values (1,1,10);

1 row created.

SQL> insert into fruit values (2,1,10);

1 row created.

SQL> insert into fruit values (1,2,10);

1 row created.

SQL> insert into fruit values (3,1,20);

1 row created.

SQL> insert into fruit values (7,2,20);

1 row created.

SQL> insert into fruit values (4,1,30);

1 row created.

SQL> insert into fruit values (4,3,40);

1 row created.

SQL> commit;

Commit complete.``````

Checking the data is all there:

``````SQL> select id, store, xsize from fruit order by store, id, xsize;

ID      STORE      XSIZE
---------- ---------- ----------
10          1          1
10          1          2
20          1          3
30          1          4
10          2          1
20          2          7
40          3          4

7 rows selected.``````

Original Query:

``````SQL> SELECT
2  ID, STORE,
3  MAX(XSIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE
4  FROM FRUIT
5  GROUP BY ID, STORE
6  ;

ID      STORE FRUIT_SIZE
---------- ---------- ----------
10          1          2
10          2          1
20          1          3
20          2          7
30          1          4
40          3          4

6 rows selected.``````

This appears to be selecting the MAX FRUIT_SIZE for each combination of ID & Store

Ran your second query without the WHERE:

``````  1  SELECT
2   ID, STORE, MAX(XSIZE) AS FRUIT_SIZE
3  FROM FRUIT
4  GROUP BY ID, STORE
5* order by 1,2
SQL> /

ID      STORE FRUIT_SIZE
---------- ---------- ----------
10          1          2
10          2          1
20          1          3
20          2          7
30          1          4
40          3          4

6 rows selected.``````

Same result set.

It would appear that for this particular sample of data, the second query does the same thing.

I still don't understand what this is attempting to give you other than the larges fruit size per ID / Store - and if that's really the case, the second query seems a whole lot easier to grasp to me.

Thanks for that!  OK - I'm still in the woods.  Gonna research a little more and pick this up tomorrow.

Mike