Avatar of shacho
shacho
 asked on

DENSE_RANK and KEEP

I apologize for the newbishness of this question - it's been a long time since I used SQL.

I am picking apart some SQL and trying to understand its logic.  The core of the SQL is essentially this:

SELECT
  PROD_ID
  MAX(PROD_SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE_ID) AS PROD_SIZE
FROM PRODUCTS
GROUP BY
  PROD_ID

I need to understand in plain English what's happening here.  I can't get my mind around what KEEP does, and how and why DENSE_RANK is here.  Can someone help me work through this?
Oracle DatabaseSQL

Avatar of undefined
Last Comment
shacho

8/22/2022 - Mon
Steve Wales

RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions141.htm#SQLRF00690) and DENSE RANK (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions052.htm#SQLRF00633) are analytic functions that show you the rank of a value within a list of values.

See the examples in the doco.

For RANK, if you had 5 people with salaries of 4200, 4800, 4800, 6000 & 9000 they would be ranked as so:

RANK: 1,2,2,4,5
DENSE RANK: 1,2,2,3,4

DENSE_RANK returns the row's rank within an ordered group.  If there are items with the same value, then no numbers are skipped in the ranking.  RANK does the same thing but will skip rank values in the event of ties within the list of values.

KEEP FIRST (http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF00641)  will return the FIRST (or LAST with KEEP LAST) row of a sorted group of rows.

So take a query like this:

Select deptid, min(hiredate) earliest, max(hiredate) latest,
   Count(salary) keep (dense_rank first order by hiredate) first,
   Count(salary) keep (dense_rank last order by hiredate) last
From employees
Group by deptid

Returns for each department, earliest and most recent hire date and the number of people hired on each of those dates

So, your query looks like it's going to return, by product, the largest prod_size for each store id.  (Not knowing your data, I'm not 100% sure what that means, but hopefully it means something for you).
SOLUTION
awking00

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
shacho

ASKER
> I think it's going to return, by product, the largest prod_size for the store that has the lowest store_id

This kind of makes sense.  As I understand it Store IDs are numeric, and the Store ID this guy would want to be working with is in fact the lowest.

Steve - is that your assessment as well?  You indicated that it would return data for all the stores, but would you agree that actually it will only return data for one store?
Steve Wales

Yes, thinking about it, I believe awking00 is correct.
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
shacho

ASKER
OK - I guess I understand.

Assuming the smallest STORE_ID is 1, would this query deliver different the same output?

SELECT
  PROD_ID,
  MAX(PROD_SIZE) AS PROD_SIZE
FROM PRODUCTS
GROUP BY  PROD_ID
WHERE STORE_ID = 1

If so, why do it the other way?  I am certain the author knows that STORE_ID is 1.
Steve Wales

The other way gives you the RANK of the values as well.

What you propose there just gives the PROD_ID and the maximum size, and doesn't show the rank

Run the two queries against your database - the difference in output should be pretty immediately obvious :)
shacho

ASKER
The original SQL also does not include rank as an output field.  I don't see the difference.  The only difference I can imagine is that using keep dense_rank will also yield some output fro records there there are products that don't exist in the store whose ID is one.

So maybe something like this?  Is this correct?

SIZE	STORE	ID
1	1	10
2	1	10
1	2	10
3	1	20
7	2	20
4	1	30
4	3	40
		
SELECT		
 ID, STORE,		
 MAX(SIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE	
FROM FRUIT		
GROUP BY ID		
		
ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4
40	3	4

SELECT		
 ID, STORE, MAX(SIZE) AS FRUIT_SIZE	
FROM FRUIT	
WHERE STORE = 1
GROUP BY ID		
		
ID	STORE	FRUIT_SIZE
10	1	2
20	1	3
30	1	4

Open in new window

Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
Steve Wales

Oops!  You're right.

The rank does an internal sort and keeps the first row it finds.

It also shows all stores where as your query shows the first store.

It's a bit of an odd query all things considered.

Going back to the documentation here: http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions065.htm#SQLRF51412

The KEEP example in their query (shown here):

SELECT department_id,
       MIN(salary) KEEP (DENSE_RANK FIRST ORDER BY commission_pct) "Worst",
       MAX(salary) KEEP (DENSE_RANK LAST ORDER BY commission_pct) "Best"
  FROM employees
  GROUP BY department_id
  ORDER BY department_id;

Says:  returns, within each department of the sample table hr.employees, the minimum salary among the employees who make the lowest commission and the maximum salary among the employees who make the highest commission:

So applying that same logic, it appears that the rank query you have is returning, within each store, the max size among those fruits in the lowest numbered stores.

That makes very little sense to me, quite frankly - but again, I don't understand your data like you hopefully do.
shacho

ASKER
Ha ha - I wish I understood the data...  This is forensic work.

I don't have access to the original data, nor do I have access to an SQL DB to test this.  I'm sure it can be done with SQLFiddle, but I also don't know the syntax for creating a schema.  Can somebody test the queries I just created to see if the results are what I'm expecting (see my last post)?

Mike
Steve Wales

Willing to try but can you give a more complete dataset of all the rows, rather than just the result set ?  Or at least a sample of the data and the results you get from it?
Your help has saved me hundreds of hours of internet surfing.
fblack61
shacho

ASKER
It's all above in the code snippet.  The original table and my predictions for the output for two queries, as well as the queries.
Steve Wales

So it is.  Sorry, missed that.  Will have a look tomorrow, getting late.
Steve Wales

OK so decided to look tonight anyway

Setup (Size is a reserved word, didn't want to deal with it, so changed it to XSIZE for testing):

SQL> create table fruit (xsize integer, store integer, id integer);

Table created.

SQL> insert into fruit values (1,1,10);

1 row created.

SQL> insert into fruit values (2,1,10);

1 row created.

SQL> insert into fruit values (1,2,10);

1 row created.

SQL> insert into fruit values (3,1,20);

1 row created.

SQL> insert into fruit values (7,2,20);

1 row created.

SQL> insert into fruit values (4,1,30);

1 row created.

SQL> insert into fruit values (4,3,40);

1 row created.

SQL> commit;

Commit complete.

Open in new window


Checking the data is all there:

SQL> select id, store, xsize from fruit order by store, id, xsize;

        ID      STORE      XSIZE
---------- ---------- ----------
        10          1          1
        10          1          2
        20          1          3
        30          1          4
        10          2          1
        20          2          7
        40          3          4

7 rows selected.

Open in new window


Original Query:

SQL> SELECT
  2  ID, STORE,
  3  MAX(XSIZE) KEEP (DENSE_RANK FIRST ORDER BY STORE) AS FRUIT_SIZE
  4  FROM FRUIT
  5  GROUP BY ID, STORE
  6  ;

        ID      STORE FRUIT_SIZE
---------- ---------- ----------
        10          1          2
        10          2          1
        20          1          3
        20          2          7
        30          1          4
        40          3          4

6 rows selected.

Open in new window


This appears to be selecting the MAX FRUIT_SIZE for each combination of ID & Store

Ran your second query without the WHERE:

  1  SELECT
  2   ID, STORE, MAX(XSIZE) AS FRUIT_SIZE
  3  FROM FRUIT
  4  GROUP BY ID, STORE
  5* order by 1,2
SQL> /

        ID      STORE FRUIT_SIZE
---------- ---------- ----------
        10          1          2
        10          2          1
        20          1          3
        20          2          7
        30          1          4
        40          3          4

6 rows selected.

Open in new window


Same result set.

It would appear that for this particular sample of data, the second query does the same thing.

I still don't understand what this is attempting to give you other than the larges fruit size per ID / Store - and if that's really the case, the second query seems a whole lot easier to grasp to me.
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
shacho

ASKER
Thanks for that!  OK - I'm still in the woods.  Gonna research a little more and pick this up tomorrow.

Mike
shacho

ASKER
Are there any good resources to look to to understand how Oracle interprets then executes SQL?  I just can't get my mind around the sequence of events that take place in the execution of this statement.
ASKER CERTIFIED SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
shacho

ASKER
Great!  That should do it!
This is the best money I have ever spent. I cannot not tell you how many times these folks have saved my bacon. I learn so much from the contributors.
rwheeler23