Grouping information based on highest repeats

Posted on 2007-07-30
Medium Priority
Last Modified: 2010-03-19
table1 contains a list of all the SKUs that a customer has purchased from us over the lifetime of their ordering. Each SKU is associated with 1 of 5 different categories.
I need a query that will select the 1 category that appears the most per customer ID. Please help.

Note: the total # of unique SKUs is irrelevant.
Also, the category must be chosen based on the categories found across all orders placed by the customer.
See the sample data below, including the intended result.

intended result:
custID   Category
10          Kitchen (was in 4 different rows)
11          Kitchen (was in 3 different rows)


pk   custID   OrderNum   SKU       Category
1    10       1          JET123    Kitchen
2    10       1          CD3       Office
3    10       1          ABU7834   Kitchen
4    10       2          JET123    Kitchen
5    10       2          RBT134    Janitorial
6    10       2          JET123    Kitchen
7    10       3          CD3       Office
8    11       4          ET12      Kitchen
9    11       4          D3        Office
10   11       4          AU834     Kitchen
11   11       5          JE13      Kitchen
12   11       5          RT34      Janitorial
13   11       5          eT12      Disposables
14   11       6          CD        Office
Question by:restockit

Assisted Solution

k_rasuri earned 400 total points
ID: 19596778
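-- Count rows per customer and category, then take the highest count per customer.
-- This returns the top count only; join back to the counts to get the category name.
select custid, max(cnt) as max_cnt from
(select custid, category, count(*) as cnt from yourtable
group by custid, category) s1
group by custid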

Accepted Solution

kenhaley earned 1600 total points
ID: 19597649
This solution uses a temp table to keep things simple:

select custID, Category, ct = count(*)
    into #temp from table1
    group by custID, Category
select custID, Category from #temp t1
    where ct = (select max(ct) from #temp where custID=t1.custID)
drop table #temp
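
To try the query above against the sample rows from the question, a setup script along these lines could be used. This is only a sketch: the column types are assumptions (the question does not give the schema), and the UNION ALL form is used so the insert also runs on older SQL Server versions.

```sql
-- Assumed schema: the question does not state the column types.
create table table1 (
    pk       int primary key,
    custID   int,
    OrderNum int,
    SKU      varchar(20),
    Category varchar(20)
)

-- Sample rows copied from the question.
insert into table1 (pk, custID, OrderNum, SKU, Category)
select  1, 10, 1, 'JET123',  'Kitchen'     union all
select  2, 10, 1, 'CD3',     'Office'      union all
select  3, 10, 1, 'ABU7834', 'Kitchen'     union all
select  4, 10, 2, 'JET123',  'Kitchen'     union all
select  5, 10, 2, 'RBT134',  'Janitorial'  union all
select  6, 10, 2, 'JET123',  'Kitchen'     union all
select  7, 10, 3, 'CD3',     'Office'      union all
select  8, 11, 4, 'ET12',    'Kitchen'     union all
select  9, 11, 4, 'D3',      'Office'      union all
select 10, 11, 4, 'AU834',   'Kitchen'     union all
select 11, 11, 5, 'JE13',    'Kitchen'     union all
select 12, 11, 5, 'RT34',    'Janitorial'  union all
select 13, 11, 5, 'eT12',    'Disposables' union all
select 14, 11, 6, 'CD',      'Office'
```

With these rows, Kitchen has the highest count for both customers (4 rows for cust 10, 3 rows for cust 11), which matches the intended result in the question.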

Expert Comment

ID: 19597668
Additional comments:
Note that this solution may return more than one row per customer if two or more categories tie for the highest row count for that customer.
E.g., if cust 11 had ordered another Office item instead of the Janitorial one, the results would be
11 Kitchen
11 Office
10 Kitchen
Please say if the query should just select one of these.

By the way, restockit, I meant to say "thank you" for a very clearly and simply posed question, complete with sample data and expected results.  Makes it easy to create a test case for developing a solution.
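
If exactly one row per customer is wanted regardless of ties, one possible alternative to the temp-table query (not the approach posted above) is to rank the categories with ROW_NUMBER(), available from SQL Server 2005 onward. In this sketch the final tie-breaker, alphabetical order of Category, is an arbitrary assumption and should be replaced with whatever rule actually applies.

```sql
-- Sketch: keep exactly one category per customer, even when counts tie.
-- Ordering by Category as the last tie-breaker is an arbitrary assumption.
select custID, Category
from (
    select custID, Category,
           row_number() over (partition by custID
                              order by count(*) desc, Category) as rn
    from table1
    group by custID, Category
) ranked
where rn = 1
```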

Author Comment

ID: 19600592
Hey kenhaley,

thank you kindly for the compliment.


With respect to your comment, that was exactly what I was going to ask as a follow up to my initial question. The rule should be as follows:

If there is a tie, then select the category that has the highest cumulative revenue.
OrderedProductPrice is a field in the db that holds (quantity purchased x product price) for each SKU. I need to select the category whose cumulative OrderedProductPrice revenue, across all SKUs in the category, is the greatest.


With regard to your great solution using the temp table, there are only 5 different categories that the result set can display (Office, Janitorial, Kitchen, Tabletop, or Disposables). Table1, however, contains other categories (Admiral Craft, Libbey, and Rubbermaid) as well. Every SKU that is associated with one of the following categories needs to have its category changed (before any other calculations begin):

Admiral Kraft --> Kitchen
Libbey --> Kitchen
Rubbermaid --> Janitorial


Author Comment

ID: 19600762
Disregard part 2. thanks

Expert Comment

ID: 19601595
Part 1 answer:
     That makes it interesting. (Actually, it was already interesting!)  I would create an artificial field called, say, "score", which is computed as (ct * 1,000,000) + revenue.  So if someone has 3 Kitchen items with a total revenue of $20,000, that would create a score of 3,020,000, and 3 Disposables items with a total revenue of $25,000 would create a score of 3,025,000.  Then record score in the temp table instead of ct, and do the same thing that we did with ct before.  (The number you multiply by in the formula, 1,000,000 in this case, has to be larger than the highest possible total revenue for any one customer and category; otherwise a lower count with very high revenue could outscore a higher count.)

Good luck with it.
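
The score idea described above could be sketched as follows, reusing the earlier temp-table query. It assumes that OrderedProductPrice is a column of table1 (the thread does not say where it lives) and that no customer's total revenue in a single category reaches 1,000,000.

```sql
-- Sketch of the "score" approach: count dominates, revenue breaks ties.
-- Assumptions: OrderedProductPrice is a column of table1, and revenue per
-- customer and category always stays below the 1,000,000 multiplier.
-- The decimal literal 1000000.0 avoids integer overflow for large counts.
select custID, Category,
       score = (count(*) * 1000000.0) + sum(OrderedProductPrice)
    into #temp from table1
    group by custID, Category
select custID, Category from #temp t1
    where score = (select max(score) from #temp where custID = t1.custID)
drop table #temp
```

Two categories with both the same count and the same total revenue would still come back as two rows for that customer.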

Author Comment

ID: 19631371
thanks guys!
