<

Go Premium for a chance to win a PS4. Enter to Win

x

Analytical SQL : Where do you rank?

Published on
45,850 Points
15,750 Views
21 Endorsements
Last Modified:
Awarded
Kevin Cross
Father, husband and general problem solver who loves coding SQL, C#, Salesforce Apex or whatever.
"Top 10 customers by salesperson" sound familiar?  Yes!  I would expect so, and am sure there are a number of other great business analyses with a similar principle now flowing through your mind.

So, how do you query this data in your SQL system?  

No answer; no worries!  This article aims to answer that question for you.  As the title suggests, this comes down to knowing a customer's rank with respect to other customers for a given salesperson and then selecting just those customers with a rank no greater than 10.

Presuming you already have SQL knowledge, we will go through the different methods to getting this rank and subsequently selecting our data by it in various database systems.  To this end, you will find the article organized as follows to ease the navigation to the content most pertinent to your environment.

The Data.
Everyone Ranks The Same!
MS SQL and Oracle Move Up The Ranks!
MS Access Dense to Ranking.
MySQL Emulates The Best of Them.


1. The Test Data.


The following attachments are a set of SQL scripts to create the table structure and data we will be working with throughout this article.
MS SQL:
rankdata.sql.txt
MySQL:
rnkmysql.sql.txt
Oracle:
rankora.sql.txt
And to include our MS Access friends:
rankmdb.zip
For those comfortable in just running the script (appropriate to your environment) and understand the structure to be used in this article's queries, you can grab the file you need and move on to the next section, otherwise, I will go through briefly the table design and sample of data.

Important: If you already have a table in your system called "bookings", please change ALL references to the name "bookings" in the following SQL statements.

1. MySQL DDL to create table.
create table if not exists bookings (
   bk_id integer auto_increment primary key ,
   bk_date datetime ,
   bk_quantity integer ,
   bk_amount decimal(19,4) ,
   bk_salesperson varchar(50) ,
   bk_customer varchar(10)
)
;

Open in new window


2. Sampling of data.
insert into bookings( bk_date, bk_quantity, bk_amount, bk_salesperson, bk_customer )
select '20090928', 3, 56.0960, 'mark_wills', 'C543423'
union all select '20091003', 9, 220.8000, 'mwvisa1', 'C1234318'
union all select '20091008', 14, 364.4800, 'chapmandew', 'C89086'
union all select '20090928', 18, 627.6896, 'chapmandew', 'C101160'
union all select '20091002', 23, 790.2400, 'aneeshattingal', 'C123796'
union all select '20090930', 120, 4187.6480, 'angeliii', 'C123419'
union all select '20091010', 11, 276.0000, 'mwvisa1', 'C1234595'
union all select '20091007', 23, 784.6120, 'chapmandew', 'C1011923'
union all select '20091007', 8, 187.6480, 'angeliii', 'C123275'
union all select '20090910', 36, 1234.5000, 'aneeshattingal', 'C1234465'
union all select '20090912', 353, 12345.0000, 'angeliii', 'C1234300'
union all select '20090917', 163, 5690.5000, 'chapmandew', 'C543497'
union all select '20090909', 4, 87.6500, 'mark_wills', 'C543937'
union all select '20090914', 9, 234.7500, 'aneeshattingal', 'C123206'
union all select '20090908', 3, 69.5000, 'chapmandew', 'C890141'
union all select '20090913', 5, 123.4563, 'mark_wills', 'C101132'
union all select '20090917', 84, 2931.2500, 'angeliii', 'C1234754'
union all select '20090914', 5, 109.5625, 'angeliii', 'C543382'
union all select '20090915', 21, 711.8750, 'chapmandew', 'C890979'
union all select '20090923', 12, 300.1600, 'aneeshattingal', 'C12319'
union all select '20090923', 12, 300.1600, 'aneeshattingal', 'C123481'
union all select '20090924', 56, 1929.0000, 'angeliii', 'C104742'
union all select '20090916', 240, 8375.2960, 'angeliii', 'C123646'
;

Open in new window


We'll see more of the data in action soon enough, so let's move on shall we...

2. Everyone Ranks The Same!


We will be using a simple example of top 10 customers by salesperson.  This will be determined by highest sales volume ("bk_amount") first, highest quantity booked ("bk_quantity") second, and, lastly, the newest order date ("bk_date") if duplicate sales dollars and units are encountered.

MS Access, MS SQL, MySQL, and Oracle:
select bks.*
from (
   select bk_salesperson as salesperson, 
      bk_customer as customer, 
      bk_amount as sales, 
      bk_quantity as units,
      (select count(*) from bookings b2
       where b2.bk_salesperson = b1.bk_salesperson
         and (b2.bk_amount > b1.bk_amount
         or (b2.bk_amount = b1.bk_amount
            and (b2.bk_quantity > b1.bk_quantity
               or (b2.bk_quantity = b1.bk_quantity
			      and b2.bk_date > b1.bk_date)
            )
         ))
      ) + 1 as ranking
   from bookings b1
) bks
where ranking <= 10
order by salesperson, ranking
;

Open in new window

All execute the above SQL the same and have results that look similar to the below captured from MySQL Query Browser.
Rank by Derived Table ResultsResults are exactly what we needed in this case, but one thing to note is that this method is more of a dense ranking: if we had two customers with the same amount of dollars and units booked on the same exact date, then they would both get the same rank.

Furthermore, we have to use a derived table which then uses a sub query to get ranking per record.  I think you can see how the maintainability can get out of hand.

Let's keep moving...

3. MS SQL and Oracle Move Up The Ranks!


Being the above is very inefficient in complex cases, it is fortunate that MS SQL 2005+ and Oracle both offer great tools for handling ranking.

The following four functions all utilize the OVER() clause which takes the following form:
(MS SQL shown, please see Oracle link for differences)
OVER ( [ PARTITION BY value_expression , ... [ n ] ]
           <ORDER BY_Clause> )

Open in new window

Where PARTITION BY establishes the grouping over which the ranking window functions will count; the ORDER BY, the order of the count.

1. Row_Number() function.
Per my note above regarding our previous solution being a "dense rank", you will see that what works best for top n selections by some grouping is the row_number() function which will give a distinct 1-n ordering of records.

Our new code:
with bks_ranked
as
(
   select bk_salesperson as salesperson, 
      bk_customer as customer, 
      bk_amount as sales, 
      bk_quantity as units,
      row_number()
         over (partition by bk_salesperson
               order by bk_amount desc, 
                        bk_quantity desc, 
                        bk_date desc) ranking
   from bookings
)
select *
from bks_ranked
where ranking <= 10
order by salesperson, ranking
;

Open in new window

Runs the same in both MS SQL and Oracle, but look at the results from Oracle.
Rank with Analytical Functions ResultsView as spreadsheet :
rank-funcs-results.xls

So we replaced the derived table with a WITH clause here, so no real change there; however, notice we no longer have a sub query with a long where filter.  The over() analytical clause that accompanies row_number() function takes care of our logic very straight-forward: we partition (group by) a salesperson; get our highest amounts first; get our highest quantities first within matching amounts; get newest date within matching quantities.  Hopefully that sounds very familiar as that was our requirement.

For your reference, the other three functions, including ntile() not yet mentioned, also work in both MS SQL and Oracle and look something like this...

2. Rank() function.
rank()
         over (partition by bk_salesperson
               order by bk_amount desc, 
                        bk_quantity desc, 
                        bk_customer) ranking

Open in new window


3. Dense_Rank() function.
dense_rank()
         over (partition by bk_salesperson
               order by bk_amount desc, 
                        bk_quantity desc, 
                        bk_customer) ranking

Open in new window


Both of the above will give you duplicate ranks for duplicate values.  The key difference between the two is that rank() keeps track of the number of rows that are duplicated while dense_rank() does not.  In other words, rank() will return five rows with rank of 1 and then a row or rows with rank of 6; however, dense_rank() will return the 6th and subsequent matching rows with a rank of 2, meaning second highest ranking.  

4. NTile(x) function.
(where x is numerical value representing the number of groups the data is split into)
ntile(10)
         over (partition by bk_salesperson
               order by bk_amount desc, 
                        bk_quantity desc, 
                        bk_customer) ranking

Open in new window

This would be nice for top 10% of customers by then selecting only customers with ranking = 1.  Just for a comparison / visual, take a look at the results.
NTile Results (Top 10 %)
As discussed in the Flexibility at a Price article, we shouldn't try to force the use of row_number(), rank(), or dense_rank() for all cases, but we truly have a good fit scenario here and I would highly recommend this approach when applicable.

For more details on these analytical or ranking functions, please see the references below specific to your SQL platform.

Now, what about MS Access and MySQL?

4. MS Access Dense to Ranking.


MS Access readers, sorry I don't have a lot more for you.  The "universal" approach shown in the "Everyone Ranks The Same!" section is pretty much the pure SQL approach available in Access to my knowledge.  

Using DCount, however, we can eliminate at least the need for the sub query, but introduces a little VB syntax to our SQL.
SELECT bks.*
FROM (
   SELECT bk_salesperson AS salesperson, 
                bk_customer AS customer, 
                bk_amount AS sales, 
                bk_quantity AS units, 
                dcount("*","bookings",
                                   "bk_salesperson = """ &
                                   b1.bk_salesperson & """ and bk_amount > " & 
                                   b1.bk_amount)+1 AS ranking 
   FROM bookings AS b1
)  AS bks
WHERE ranking<=10
ORDER BY salesperson, ranking
;

Open in new window

As you can see how this could get cumbersome with just the one criteria of our row numbering included and a quick look through the results will show what "dense ranking" means to you.
Rank with Dcount Results
For MS Access I would recommend you stay with the sub query approach or go to full VBA for better control (maintainability) of complex ranking filters similar to what we started with.  However, I would consider the DCount() methodology very useful in simple cases such as shown above with one partitioning filter (by salesperson) and one ranking one (by amount).

And last but not least...

5. MySQL Emulates The Best of Them.


For MySQL, the following options provide close likenesses to the row_number() function.

1. Group_Concat() and Find_In_Set(): a great combination.
Group_Concat(), roughly, handles the same duties as the over() clause.  To best illustrate this point, take a look at the syntax for Group_Concat() to partition by a salesperson and order the associated customers by amount, quantity, and date as we did above.
select bk_salesperson as salesperson,
       group_concat(bk_customer
          order by bk_amount desc,
                   bk_quantity desc,
                   bk_date desc) as customers
from bookings
group by bk_salesperson
;

Open in new window

And visually resulting in:
Group_Concat() ResultsTechnically, what you are seeing is a BLOB (en.wikipedia.org/wiki/Binary_large_object) containing all the customers grouped under a given salesperson.  Therefore, this is very powerful, but, potentially, inefficient if improperly used.

For our ranking needs, let's move forward with the understanding the benefit outweighs the consequences here...

Find_In_Set(), as Angel Eyes discusses in "delimited list as parameter, what are the options?", works well for parsing a comma delimited list and so when coupled with our Group_Concat() results acts as the row_number() function.

So putting the code together:
select bk_salesperson as salesperson,
       bk_customer as customer,
       bk_amount as sales,
       bk_quantity as units,
       find_in_set(b.bk_customer, customers) as ranking
from bookings b
inner join (
   select bk_salesperson as salesperson,
          group_concat(bk_customer
             order by bk_amount desc,
                      bk_quantity desc,
                      bk_date desc) as customers
   from bookings
   group by bk_salesperson
) p on p.salesperson = b.bk_salesperson
where find_in_set(b.bk_customer, customers) <= 10
order by salesperson, ranking
;

Open in new window

Voilà !

Any questions, please refer to the reference "Emulating Analytic (AKA Ranking) Functions with MySQL".  If you still would like to learn more, please leave me a comment below, but onward we shall march...

2. RowNum Adaptation.
Another interesting method I will demonstrate below was adapted from my reading of "Displaying Row Number (rownum) in MySQL" which was just meant to be a brief blog on displaying row number on a result set.  

Since the row_number() portion is covered for us already, the trick is simply to compose a good over() equivalent.  My approach, is to add another variable for the "partition by" portion of over() which I called "lastsalesperson".  
(select @rownum:=0, @lastsalesperson:=null) r

Open in new window

The idea is exactly as discussed in the link, we have to order our records exactly as needed and then number each sequentially, resetting the count when the new salesperson value is not equal to the last.

Clear ?  

Let's look at some code that should help:
select salesperson, customer, sales, units, ranking
from (

select bk_customer as customer,
       bk_amount as sales,
       bk_quantity as units,

       /* if salesperson is same as row above,
          increment row number; else, start at 1. */
       if(bk_salesperson = @lastsalesperson,
          @rownum:=@rownum+1,
          @rownum:=1) as ranking,

       /* since using @lastsalesperson in @rownum,
          have to set @lastsalesperson 2nd;
          otherwise, current and last would always equal.
        */
       @lastsalesperson:=bk_salesperson as salesperson
from bookings b,
(select @rownum:=0, @lastsalesperson:=null) r
order by bk_salesperson,
         bk_amount desc,
         bk_quantity desc,
         bk_date desc

) bks
where ranking <= 10
order by salesperson, ranking
;

Open in new window

Not too difficult and code should speak for itself, but please leave me comments below for more questions.

Personally I have found this more efficient (or at least faster) in a number of cases, but each scenario as we have been saying throughout should be considered careful before just implementing one blanket approach across the board.

That as they say is all folks!


Hopefully you now know where your SQL ranks in the world of analytical functions and know when to pull these handy statements out of the toolbox.  Additionally, if nothing else, please take away that for the best blend of power and performance, make sure to utilize the tools available to you such as EXPLAIN to determine the query and function set with which you will get the best execution plan and/or where indexing may need to be tweaked to aid you in your task(s) of keeping your data in rank.

Lastly, with MS Access 2007, MS SQL 2005, MySQL 5.0.x, and Oracle 10g installed and used in testing, I would love to hear your feedback if I missed something from your environment that I will be sure to update accordingly.

Thanks for reading!

Happy coding!

Best regards,

Kevin (aka MWVisa1)


Related Resources / References:

(Referred to for FIND_IN_SET(); however, this is a general database resource)
delimited list as parameter, what are the options?
http://www.experts-exchange.com/A_1536.html

MS Access
Rank (Sub Query Methodology)
ACC2.0-97 > http://support.microsoft.com/kb/120608/en-us
ACC2000+ > http://support.microsoft.com/kb/208946

MS SQL
Ranking Functions in MS SQL
http://msdn.microsoft.com/en-us/library/ms189798(SQL.90).aspx

ROW_NUMBER(), RANK(), and DENSE_RANK()   Flexibility at a Price
http://thehobt.blogspot.com/2009/02/rownumber-rank-and-denserank.html

MySQL
Emulating Analytic (AKA Ranking) Functions with MySQL
http://onlamp.com/pub/a/mysql/2007/03/29/emulating-analytic-aka-ranking-functions-with-mysql.html

Displaying Row Number (rownum) in MySQL
http://jimlife.wordpress.com/2008/09/09/displaying-row-number-rownum-in-mysql/

Database Developers' Quick-Reference to MySQL
http://www.experts-exchange.com/A_659.html

Database Developers' Quick-Reference to MySQL DDL
http://www.experts-exchange.com/A_1522.html

Oracle
Analytical (Ranking) Functions in Oracle
http://download.oracle.com/docs/cd/A87860_01/doc/server.817/a76994/analysis.htm#13275

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
21
Comment
Author:Kevin Cross
14 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
This articles goes straight into the "series" I am about to do, comparing "everydays" sql problems across databases... GOOD stuff
0
 
LVL 32

Expert Comment

by:Daniel Wilson
Nice work, Kevin!  And thanks for the links.
0
 
LVL 60

Author Comment

by:Kevin Cross
Thanks, Angel Eyes and Daniel.  It was my pleasure to link to articles from you two.

And Angel Eyes, I am glad I did this one justice, but eagerly await your other articles in the series.  Glad I spread my wings beyond MS SQL and MySQL on this one.

Thanks again!
0
Get free NFR key for Veeam Availability Suite 9.5

Veeam is happy to provide a free NFR license (1 year, 2 sockets) to all certified IT Pros. The license allows for the non-production use of Veeam Availability Suite v9.5 in your home lab, without any feature limitations. It works for both VMware and Hyper-V environments

 
LVL 60

Author Comment

by:Kevin Cross
MySQL Readers :

The MySQL Group_Concat() approach as written above will emulate the row_number() or rank() functions in MS SQL as discussed, depending on how your data is.  If you want to emulate dense_rank(), which we learned is different from rank() in that it does not care how many rows are ranked 1 -- next row will be 2 and so on, you can add DISTINCT keyword to the group_concat().

GROUP_CONCAT([DISTINCT] expr [,expr ...]
             [ORDER BY {unsigned_integer | col_name | expr}
                 [ASC | DESC] [,col_name ...]]
             [SEPARATOR str_val])

Therefore, if you have data that has duplicate values, using this approach since you are making the values DISTINCT then each request for a specific value will result in the same index / place within FIND_IN_SET() say 1st element and then the next element is 2nd and so on just as is the case with dense_rank().

Thanks again for reading!

Best regards and happy coding,

Kevin
0
 
LVL 58

Expert Comment

by:tigermatt
Thanks Kevin - great article. Voted Yes above.
0
 
LVL 60

Author Comment

by:Kevin Cross
You are welcome.  Thank you for the vote!


As a reminder for Oracle readers, you have the magical pseudo column RowNum that can come in handy in certain ranking situations.

http://www.oracle.com/technology/oramag/oracle/06-sep/o56asktom.html 

This was the inspiration behind the MySQL RowNum Adaptation, so figured it would be good to include reference above for those curious.  Below is an example to get the top 10 booking records.


Thanks again for reading!


select b.*, rownum as ranking
from (
   select bk_salesperson as salesperson, 
      bk_customer as customer, 
      bk_amount as sales, 
      bk_quantity as units
   from bookings
   order by bk_amount desc, 
            bk_quantity desc, 
            bk_date desc
) b
where rownum <= 10
order by ranking
;

Open in new window

0
 
LVL 75

Expert Comment

by:Aneesh Retnakaran
I learned something new today, thanks Kevin,i think i can answer few more queries from MySQL and MSAccess, Great article Kevin..
0
 
LVL 60

Author Comment

by:Kevin Cross
Thanks for the kind words and vote!
Glad you found this helpful, Aneesh.

Regards,
Kevin
0
 
LVL 10

Expert Comment

by:lof
Hi

I've used the article few times already. Thanks mwvisa1!

Regards,
lof
0
 
LVL 60

Author Comment

by:Kevin Cross
You are most welcome, lof!
Thank you for the kind words and the vote.

Respectfully yours,

Kevin
0
 
LVL 60

Author Comment

by:Kevin Cross
For the Microsoft Access readers, if you wanted to delve more into Dlookup(), these article are a great read:

Dlookup and the Domain Functions
http://www.experts-exchange.com/A_12.html

Access Techniques: Fast Table Lookup Functions
http://www.experts-exchange.com/A_1921.html

Access & VB's Missing Domain Lookup Functions
http://www.experts-exchange.com/A_2011.html

Additionally, to further stimulate your brain cells and see Dlookup() in action, see:

Computing row-wise aggregations in Access
http://www.experts-exchange.com/A_1775.html

Probably some other very good ones about you can find, just wanted to add these for your convenience.

Thanks again for reading and for the support from those who have/will vote this helpful.

Respectfully yours,

Kevin
0
 
LVL 111

Expert Comment

by:Ray Paseur
Very nicely done!  Glad you posted a link to this where I could find it! ~Ray
0
 
LVL 60

Author Comment

by:Kevin Cross
Thanks, Ray! Very much appreciated. --Kevin
0
 

Expert Comment

by:jwfollette
Kevin, great article and a very valuable approach (as you have shown me).  Thanks again for the help!
0

Featured Post

Vote for the Most Valuable Expert

It’s time to recognize experts that go above and beyond with helpful solutions and engagement on site. Choose from the top experts in the Hall of Fame or on the right rail of your favorite topic page. Look for the blue “Nominate” button on their profile to vote.

Join & Write a Comment

With Microsoft Access, learn how to specify relationships between tables and set various options on the relationship. Add the tables: Create the relationship: Decide if you’re going to set referential integrity: Decide if you want cascade upda…
Have you created a query with information for a calendar? ... and then, abra-cadabra, the calendar is done?! I am going to show you how to make that happen. Visualize your data!  ... really see it To use the code to create a calendar from a q…

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month