"Top 10 customers by salesperson" sound familiar? Yes! I would expect so, and am sure there are a number of other great business analyses with a similar principle now flowing through your mind.
So, how do you query this data in your SQL system?
No answer; no worries! This article aims to answer that question for you. As the title suggests, this comes down to knowing a customer's rank with respect to other customers for a given salesperson and then selecting just those customers with a rank no greater than 10.
Presuming you already have SQL knowledge, we will go through the different methods to getting this rank and subsequently selecting our data by it in various database systems. To this end, you will find the article organized as follows to ease the navigation to the content most pertinent to your environment.
The Data.
Everyone Ranks The Same!
MS SQL and Oracle Move Up The Ranks!
MS Access Dense to Ranking.
MySQL Emulates The Best of Them.
1. The Test Data.
The following attachments are a set of SQL scripts to create the table structure and data we will be working with throughout this article.
MS SQL: rankdata.sql.txt
MySQL: rnkmysql.sql.txt
Oracle: rankora.sql.txt
And to include our MS Access friends: rankmdb.zip
For those comfortable in just running the script (appropriate to your environment) and understand the structure to be used in this article's queries, you can grab the file you need and move on to the next section, otherwise, I will go through briefly the table design and sample of data.
Important: If you already have a table in your system called "bookings", please change ALL references to the name "bookings" in the following SQL statements.
We'll see more of the data in action soon enough, so let's move on shall we...
2. Everyone Ranks The Same!
We will be using a simple example of top 10 customers by salesperson. This will be determined by highest sales volume ("bk_amount") first, highest quantity booked ("bk_quantity") second, and, lastly, the newest order date ("bk_date") if duplicate sales dollars and units are encountered.
MS Access, MS SQL, MySQL, and Oracle:
select bks.*from ( select bk_salesperson as salesperson, bk_customer as customer, bk_amount as sales, bk_quantity as units, (select count(*) from bookings b2 where b2.bk_salesperson = b1.bk_salesperson and (b2.bk_amount > b1.bk_amount or (b2.bk_amount = b1.bk_amount and (b2.bk_quantity > b1.bk_quantity or (b2.bk_quantity = b1.bk_quantity and b2.bk_date > b1.bk_date) ) )) ) + 1 as ranking from bookings b1) bkswhere ranking <= 10order by salesperson, ranking;
All execute the above SQL the same and have results that look similar to the below captured from MySQL Query Browser. Results are exactly what we needed in this case, but one thing to note is that this method is more of a dense ranking: if we had two customers with the same amount of dollars and units booked on the same exact date, then they would both get the same rank.
Furthermore, we have to use a derived table which then uses a sub query to get ranking per record. I think you can see how the maintainability can get out of hand.
Let's keep moving...
3. MS SQL and Oracle Move Up The Ranks!
Being the above is very inefficient in complex cases, it is fortunate that MS SQL 2005+ and Oracle both offer great tools for handling ranking.
The following four functions all utilize the OVER() clause which takes the following form:
(MS SQL shown, please see Oracle link for differences)
OVER ( [ PARTITION BY value_expression , ... [ n ] ] <ORDER BY_Clause> )
Where PARTITION BY establishes the grouping over which the ranking window functions will count; the ORDER BY, the order of the count.
1. Row_Number() function.
Per my note above regarding our previous solution being a "dense rank", you will see that what works best for top n selections by some grouping is the row_number() function which will give a distinct 1-n ordering of records.
Our new code:
with bks_rankedas( select bk_salesperson as salesperson, bk_customer as customer, bk_amount as sales, bk_quantity as units, row_number() over (partition by bk_salesperson order by bk_amount desc, bk_quantity desc, bk_date desc) ranking from bookings)select *from bks_rankedwhere ranking <= 10order by salesperson, ranking;
Runs the same in both MS SQL and Oracle, but look at the results from Oracle. View as spreadsheet : rank-funcs-results.xls
So we replaced the derived table with a WITH clause here, so no real change there; however, notice we no longer have a sub query with a long where filter. The over() analytical clause that accompanies row_number() function takes care of our logic very straight-forward: we partition (group by) a salesperson; get our highest amounts first; get our highest quantities first within matching amounts; get newest date within matching quantities. Hopefully that sounds very familiar as that was our requirement.
For your reference, the other three functions, including ntile() not yet mentioned, also work in both MS SQL and Oracle and look something like this...
2. Rank() function.
rank() over (partition by bk_salesperson order by bk_amount desc, bk_quantity desc, bk_customer) ranking
Both of the above will give you duplicate ranks for duplicate values. The key difference between the two is that rank() keeps track of the number of rows that are duplicated while dense_rank() does not. In other words, rank() will return five rows with rank of 1 and then a row or rows with rank of 6; however, dense_rank() will return the 6th and subsequent matching rows with a rank of 2, meaning second highest ranking.
4. NTile(x) function.
(where x is numerical value representing the number of groups the data is split into)
ntile(10) over (partition by bk_salesperson order by bk_amount desc, bk_quantity desc, bk_customer) ranking
This would be nice for top 10% of customers by then selecting only customers with ranking = 1. Just for a comparison / visual, take a look at the results.
As discussed in the Flexibility at a Price article, we shouldn't try to force the use of row_number(), rank(), or dense_rank() for all cases, but we truly have a good fit scenario here and I would highly recommend this approach when applicable.
For more details on these analytical or ranking functions, please see the references below specific to your SQL platform.
Now, what about MS Access and MySQL?
4. MS Access Dense to Ranking.
MS Access readers, sorry I don't have a lot more for you. The "universal" approach shown in the "Everyone Ranks The Same!" section is pretty much the pure SQL approach available in Access to my knowledge.
Using DCount, however, we can eliminate at least the need for the sub query, but introduces a little VB syntax to our SQL.
SELECT bks.*FROM ( SELECT bk_salesperson AS salesperson, bk_customer AS customer, bk_amount AS sales, bk_quantity AS units, dcount("*","bookings", "bk_salesperson = """ & b1.bk_salesperson & """ and bk_amount > " & b1.bk_amount)+1 AS ranking FROM bookings AS b1) AS bksWHERE ranking<=10ORDER BY salesperson, ranking;
As you can see how this could get cumbersome with just the one criteria of our row numbering included and a quick look through the results will show what "dense ranking" means to you.
For MS Access I would recommend you stay with the sub query approach or go to full VBA for better control (maintainability) of complex ranking filters similar to what we started with. However, I would consider the DCount() methodology very useful in simple cases such as shown above with one partitioning filter (by salesperson) and one ranking one (by amount).
And last but not least...
5. MySQL Emulates The Best of Them.
For MySQL, the following options provide close likenesses to the row_number() function.
1. Group_Concat() and Find_In_Set(): a great combination.
Group_Concat(), roughly, handles the same duties as the over() clause. To best illustrate this point, take a look at the syntax for Group_Concat() to partition by a salesperson and order the associated customers by amount, quantity, and date as we did above.
select bk_salesperson as salesperson, group_concat(bk_customer order by bk_amount desc, bk_quantity desc, bk_date desc) as customersfrom bookingsgroup by bk_salesperson;
And visually resulting in: Technically, what you are seeing is a BLOB (en.wikipedia.org/wiki/Binary_large_object) containing all the customers grouped under a given salesperson. Therefore, this is very powerful, but, potentially, inefficient if improperly used.
For our ranking needs, let's move forward with the understanding the benefit outweighs the consequences here...
Find_In_Set(), as Angel Eyes discusses in "delimited list as parameter, what are the options?", works well for parsing a comma delimited list and so when coupled with our Group_Concat() results acts as the row_number() function.
So putting the code together:
select bk_salesperson as salesperson, bk_customer as customer, bk_amount as sales, bk_quantity as units, find_in_set(b.bk_customer, customers) as rankingfrom bookings binner join ( select bk_salesperson as salesperson, group_concat(bk_customer order by bk_amount desc, bk_quantity desc, bk_date desc) as customers from bookings group by bk_salesperson) p on p.salesperson = b.bk_salespersonwhere find_in_set(b.bk_customer, customers) <= 10order by salesperson, ranking;
2. RowNum Adaptation.
Another interesting method I will demonstrate below was adapted from my reading of "Displaying Row Number (rownum) in MySQL" which was just meant to be a brief blog on displaying row number on a result set.
Since the row_number() portion is covered for us already, the trick is simply to compose a good over() equivalent. My approach, is to add another variable for the "partition by" portion of over() which I called "lastsalesperson".
The idea is exactly as discussed in the link, we have to order our records exactly as needed and then number each sequentially, resetting the count when the new salesperson value is not equal to the last.
Clear ?
Let's look at some code that should help:
select salesperson, customer, sales, units, rankingfrom (select bk_customer as customer, bk_amount as sales, bk_quantity as units, /* if salesperson is same as row above, increment row number; else, start at 1. */ if(bk_salesperson = @lastsalesperson, @rownum:=@rownum+1, @rownum:=1) as ranking, /* since using @lastsalesperson in @rownum, have to set @lastsalesperson 2nd; otherwise, current and last would always equal. */ @lastsalesperson:=bk_salesperson as salespersonfrom bookings b,(select @rownum:=0, @lastsalesperson:=null) rorder by bk_salesperson, bk_amount desc, bk_quantity desc, bk_date desc) bkswhere ranking <= 10order by salesperson, ranking;
Not too difficult and code should speak for itself, but please leave me comments below for more questions.
Personally I have found this more efficient (or at least faster) in a number of cases, but each scenario as we have been saying throughout should be considered careful before just implementing one blanket approach across the board.
That as they say is all folks!
Hopefully you now know where your SQL ranks in the world of analytical functions and know when to pull these handy statements out of the toolbox. Additionally, if nothing else, please take away that for the best blend of power and performance, make sure to utilize the tools available to you such as EXPLAIN to determine the query and function set with which you will get the best execution plan and/or where indexing may need to be tweaked to aid you in your task(s) of keeping your data in rank.
Lastly, with MS Access 2007, MS SQL 2005, MySQL 5.0.x, and Oracle 10g installed and used in testing, I would love to hear your feedback if I missed something from your environment that I will be sure to update accordingly.
Thanks for reading!
Happy coding!
Best regards,
Kevin (aka MWVisa1)
Related Resources / References:
(Referred to for FIND_IN_SET(); however, this is a general database resource)
delimited list as parameter, what are the options? http://www.experts-exchange.com/A_1536.html
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.
If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.
Thanks, Angel Eyes and Daniel. It was my pleasure to link to articles from you two.
And Angel Eyes, I am glad I did this one justice, but eagerly await your other articles in the series. Glad I spread my wings beyond MS SQL and MySQL on this one.
Thanks again!
0
While there were several headline-grabbing ransomware attacks during in 2017, another big threat started appearing at the same time that didn’t get the same coverage – illicit cryptomining.
The MySQL Group_Concat() approach as written above will emulate the row_number() or rank() functions in MS SQL as discussed, depending on how your data is. If you want to emulate dense_rank(), which we learned is different from rank() in that it does not care how many rows are ranked 1 -- next row will be 2 and so on, you can add DISTINCT keyword to the group_concat().
Therefore, if you have data that has duplicate values, using this approach since you are making the values DISTINCT then each request for a specific value will result in the same index / place within FIND_IN_SET() say 1st element and then the next element is 2nd and so on just as is the case with dense_rank().
This was the inspiration behind the MySQL RowNum Adaptation, so figured it would be good to include reference above for those curious. Below is an example to get the top 10 booking records.
Thanks again for reading!
select b.*, rownum as rankingfrom ( select bk_salesperson as salesperson, bk_customer as customer, bk_amount as sales, bk_quantity as units from bookings order by bk_amount desc, bk_quantity desc, bk_date desc) bwhere rownum <= 10order by ranking;
Kevin, great article and a very valuable approach (as you have shown me). Thanks again for the help!
0
Featured Post
Learn about longtime user Rowby Goren and his great contributions to the site. We explore his method for posing questions that are likely to yield a solution, and take a look at how his career transformed from a Hollywood writer to a website entrepreneur.
Visualize your data even better in Access queries. Given a date and a value, this lesson shows how to compare that value with the previous value, calculate the difference, and display a circle if the value is the same, an up triangle if it increased…
Make it easier to see the current record on your Microsoft Access forms! To highlight the current record with a yellow background color, use Conditional Formatting, a control to keep track of the primary key value, a control to change color, and a l…