Grouping information based on highest repeats

Posted on 2007-07-30
Last Modified: 2010-03-19
table1 contains a list of all the skus that a customer has purchased from us over their entire ordering history. Each sku is associated with 1 of 5 different categories.
I need a query that will select the 1 category that appears most often for each customer ID. Please help.

Note: total # of unique skus is irrelevant.
Also, the category must be chosen based on the categories located in all orders placed by customer.
See sample of data below including intended result

intended result:
custID   Category
10          Kitchen (was in 4 different rows)
11          Kitchen (was in 3 different rows)


pk   custID   OrderNum   SKU       Category
1    10       1          JET123    Kitchen
2    10       1          CD3       Office
3    10       1          ABU7834   Kitchen
4    10       2          JET123    Kitchen
5    10       2          RBT134    Janitorial
6    10       2          JET123    Kitchen
7    10       3          CD3       Office
8    11       4          ET12      Kitchen
9    11       4          D3        Office
10   11       4          AU834     Kitchen
11   11       5          JE13      Kitchen
12   11       5          RT34      Janitorial
13   11       5          eT12      Disposables
14   11       6          CD        Office
Question by:restockit

    Assisted Solution

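    -- Count rows per customer and category, then take the highest count per customer.
    -- This returns the count only; join back to the inner query to get the category name.
    select custid, max(cnt) as max_cnt from
    (select custid, Category, count(*) as cnt from table1
    group by custid, Category) s1
    group by custid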

    Accepted Solution

    This solution uses a temp table to keep things simple:

    select custID, Category, ct = count(*)
        into #temp from table1
        group by custID, Category
    select custID, Category from #temp t1
        where ct = (select max(ct) from #temp where custID=t1.custID)
    drop table #temp
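
To try the temp-table query against the sample rows from the question, a setup script along these lines should work. This is a minimal sketch: the column types are assumptions (the thread never shows the schema), and the multi-row `values` list assumes SQL Server 2008 or later.

```sql
-- Hypothetical schema for table1; column types are assumed, not taken from the thread.
create table table1 (
    pk       int primary key,
    custID   int,
    OrderNum int,
    SKU      varchar(20),
    Category varchar(20)
);

-- The 14 sample rows exactly as posted in the question.
insert into table1 (pk, custID, OrderNum, SKU, Category) values
    (1,  10, 1, 'JET123',  'Kitchen'),
    (2,  10, 1, 'CD3',     'Office'),
    (3,  10, 1, 'ABU7834', 'Kitchen'),
    (4,  10, 2, 'JET123',  'Kitchen'),
    (5,  10, 2, 'RBT134',  'Janitorial'),
    (6,  10, 2, 'JET123',  'Kitchen'),
    (7,  10, 3, 'CD3',     'Office'),
    (8,  11, 4, 'ET12',    'Kitchen'),
    (9,  11, 4, 'D3',      'Office'),
    (10, 11, 4, 'AU834',   'Kitchen'),
    (11, 11, 5, 'JE13',    'Kitchen'),
    (12, 11, 5, 'RT34',    'Janitorial'),
    (13, 11, 5, 'eT12',    'Disposables'),
    (14, 11, 6, 'CD',      'Office');
```

With these rows loaded, Kitchen has the highest row count for both customers (4 rows for custID 10, 3 rows for custID 11), which matches the intended result in the question.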

    Expert Comment

    Additional comments:
    Note that this solution may return more than one row per customer when two or more categories tie for that customer's highest row count.
    E.g, if cust 11 ordered another office item instead of Janitorial, the results would be
    11 Kitchen
    11 Office
    10 Kitchen
    Please say if the query should just select one of these.
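
If exactly one row per customer is wanted even when there is a tie, one option is to rank the categories with `row_number()` and keep only the top-ranked one. This is a different technique from the temp-table solution, sketched here under two assumptions: SQL Server 2005 or later, and an arbitrary tie-break on category name, which the thread does not specify at this point.

```sql
-- Rank each customer's categories by row count and keep the top one.
-- Assumption: ties are broken alphabetically by Category, purely for illustration.
select custID, Category
from (
    select custID, Category,
           row_number() over (
               partition by custID
               order by count(*) desc, Category
           ) as rn
    from table1
    group by custID, Category
) ranked
where rn = 1;
```

Because `row_number()` never assigns the same number twice within a partition, each customer gets a single row. Swapping the second `order by` term for a different expression changes which tied category wins.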

    By the way, restockit, I meant to say "thank you" for a very clearly and simply posed question, complete with sample data and expected results.  Makes it easy to create a test case for developing a solution.

    Author Comment

    Hey kenhaly,

    thank you kindly for the compliment.

    PART 1:

    With respect to your comment, that was exactly what I was going to ask as a follow up to my initial question. The rule should be as follows:

    If there is a tie, then select the category that has the highest cumulative revenue.
    OrderedProductPrice is a field in the db that holds (quantity purchased x product price) for each sku. I need to select the category whose skus have the greatest cumulative OrderedProductPrice revenue.

    PART 2:

    Regarding your great solution using the temp table, there are only 5 different categories that the result set can display (Office, Janitorial, Kitchen, Tabletop, or Disposables). Table1, however, contains other categories (Admiral Craft, Libbey, and Rubbermaid) as well. Every sku associated with one of the following categories needs to be remapped (before any other calculations begin):

    Admiral Kraft --> Kitchen
    Libbey --> Kitchen
    Rubbermaid --> Janitorial
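
A remapping like the one listed above could be done with a `case` expression in a derived table, so that the counting step only ever sees the five display categories. This is a sketch only: the post spells the first category both "Admiral Craft" and "Admiral Kraft", so the code accepts either, and which value is actually stored is unknown.

```sql
-- Remap the extra categories first, then count rows per customer and category.
-- Assumption: both spellings of the first category are accepted because the
-- post uses both and does not say which one the table stores.
select custID, Category, count(*) as ct
from (
    select custID,
           case
               when Category in ('Admiral Craft', 'Admiral Kraft') then 'Kitchen'
               when Category = 'Libbey' then 'Kitchen'
               when Category = 'Rubbermaid' then 'Janitorial'
               else Category
           end as Category
    from table1
) mapped
group by custID, Category;
```

Doing the mapping in the inner query avoids repeating the `case` expression in the `group by` clause.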


    Author Comment

    Disregard part 2. thanks

    Expert Comment

    Part 1 answer:
    That makes it interesting. (Actually, it was already interesting!) I would create an artificial field called, say, "score", which is computed as (ct * 1,000,000) + revenue. So if someone has 3 Kitchen items with a total revenue of $20,000, that would create a score of 3,020,000, and 3 Disposable items with a total revenue of $25,000 would create a score of 3,025,000. Then record score in the temp table instead of ct, and do the same thing that we did with ct before. (The number you multiply by in the formula, 1,000,000 in this case, has to be larger than the highest possible total revenue for any one customer and category.)
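
The score idea described above might look like this when applied to the earlier temp-table query. It is a sketch under two assumptions: that OrderedProductPrice is a column of table1 (the thread only says it is "a field in the db"), and that no customer's revenue in a single category reaches 1,000,000.

```sql
-- score = row count * 1,000,000 + total revenue, so the count dominates
-- and revenue only decides between categories with equal counts.
-- Assumption: OrderedProductPrice lives in table1 and per-category totals stay below 1,000,000.
select custID, Category,
       score = count(*) * 1000000.0 + sum(OrderedProductPrice)
    into #temp from table1
    group by custID, Category;

-- Keep the category (or categories) with the highest score per customer.
select custID, Category from #temp t1
    where score = (select max(score) from #temp where custID = t1.custID);

drop table #temp;
```

Two categories with the same count and the same total revenue would still both be returned, so a further tie-break is needed if that case can occur.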

    Good luck with it.

    Author Comment

    thanks guys!
