restockit

Grouping information based on highest repeats

table1 contains every SKU a customer has purchased from us over their entire ordering history. Each SKU is associated with one of 5 different categories.
I need a query that will select the one category that appears most often for each customer ID. Please help.

Note: the total number of unique SKUs is irrelevant; every row counts.
Also, the category must be chosen based on the categories found across all orders placed by the customer.
See the sample data below, including the intended result.

Intended result:
custID  Category
10      Kitchen (was in 4 different rows)
11      Kitchen (was in 3 different rows)

Data:

pk  custID  OrderNum  SKU      Category
1   10      1         JET123   Kitchen
2   10      1         CD3      Office
3   10      1         ABU7834  Kitchen
4   10      2         JET123   Kitchen
5   10      2         RBT134   Janitorial
6   10      2         JET123   Kitchen
7   10      3         CD3      Office
8   11      4         ET12     Kitchen
9   11      4         D3       Office
10  11      4         AU834    Kitchen
11  11      5         JE13     Kitchen
12  11      5         RT34     Janitorial
13  11      5         eT12     Disposables
14  11      6         CD       Office
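The posted solutions are not visible in this thread, so what follows is only a minimal sketch of one common way to get the intended result, not the accepted answer. It assumes the table and column names from the sample (table1, custID, Category) and uses Python's built-in sqlite3 module with an in-memory database so it runs as-is; the thread does not say which database engine is involved. It counts every row per customer and category (not distinct SKUs), then keeps the category whose count equals that customer's maximum.

```python
import sqlite3

# Sample rows from the question: (pk, custID, OrderNum, SKU, Category)
ROWS = [
    (1, 10, 1, "JET123", "Kitchen"),
    (2, 10, 1, "CD3", "Office"),
    (3, 10, 1, "ABU7834", "Kitchen"),
    (4, 10, 2, "JET123", "Kitchen"),
    (5, 10, 2, "RBT134", "Janitorial"),
    (6, 10, 2, "JET123", "Kitchen"),
    (7, 10, 3, "CD3", "Office"),
    (8, 11, 4, "ET12", "Kitchen"),
    (9, 11, 4, "D3", "Office"),
    (10, 11, 4, "AU834", "Kitchen"),
    (11, 11, 5, "JE13", "Kitchen"),
    (12, 11, 5, "RT34", "Janitorial"),
    (13, 11, 5, "eT12", "Disposables"),
    (14, 11, 6, "CD", "Office"),
]

# Count every row (not distinct SKUs) per customer and category, then keep
# the category whose count equals that customer's maximum count.
TOP_CATEGORY_SQL = """
WITH counts AS (
    SELECT custID, Category, COUNT(*) AS ct
    FROM table1
    GROUP BY custID, Category
)
SELECT c.custID, c.Category, c.ct
FROM counts AS c
WHERE c.ct = (SELECT MAX(c2.ct) FROM counts AS c2 WHERE c2.custID = c.custID)
ORDER BY c.custID, c.Category
"""


def top_categories(rows):
    """Load rows into an in-memory table1 and run the top-category query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (pk INTEGER PRIMARY KEY, custID INTEGER, "
        "OrderNum INTEGER, SKU TEXT, Category TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(TOP_CATEGORY_SQL).fetchall()
    conn.close()
    return result


for cust_id, category, ct in top_categories(ROWS):
    print(cust_id, category, ct)
```

On the sample data this prints 10 Kitchen 4 and 11 Kitchen 3. The CTE plus correlated subquery is ordinary SQL, but other engines may need small adjustments.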
SOLUTION
k_rasuri

(Solution visible to Experts Exchange members only.)
ASKER CERTIFIED SOLUTION
(Solution visible to Experts Exchange members only.)
kenhaley

Additional comments:
Note that this solution may return more than one row per customer if two or more categories tie for the maximum number of rows for that customer.
E.g., if customer 11 had ordered another Office item instead of the Janitorial one, the results would be
11 Kitchen
11 Office
10 Kitchen
Please say whether the query should select just one of these.
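A minimal sketch of the tie described above, again assuming Python's sqlite3 with an in-memory table1 and keeping only the custID and Category columns for brevity. Customer 11's Janitorial row (pk 12) is switched to Office, which gives Kitchen and Office 3 rows each, so a "count equals the customer's maximum" filter returns both.

```python
import sqlite3

# Category of each row in pk order. Customer 11 is the hypothetical variant
# from the comment: its Janitorial row (pk 12) is changed to Office.
CATEGORIES = {
    10: ["Kitchen", "Office", "Kitchen", "Kitchen", "Janitorial", "Kitchen", "Office"],
    11: ["Kitchen", "Office", "Kitchen", "Kitchen", "Office", "Disposables", "Office"],
}

# Keep every category whose row count equals the customer's maximum,
# so a tie produces one output row per tied category.
TIES_SQL = """
WITH counts AS (
    SELECT custID, Category, COUNT(*) AS ct
    FROM table1
    GROUP BY custID, Category
)
SELECT c.custID, c.Category, c.ct
FROM counts AS c
WHERE c.ct = (SELECT MAX(c2.ct) FROM counts AS c2 WHERE c2.custID = c.custID)
ORDER BY c.custID, c.Category
"""


def top_categories_with_ties(categories):
    """Flatten the per-customer lists into rows and run the max-count query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (custID INTEGER, Category TEXT)")
    conn.executemany(
        "INSERT INTO table1 VALUES (?, ?)",
        [(cust, cat) for cust, cats in categories.items() for cat in cats],
    )
    result = conn.execute(TIES_SQL).fetchall()
    conn.close()
    return result


for cust_id, category, ct in top_categories_with_ties(CATEGORIES):
    print(cust_id, category, ct)
```

This prints 10 Kitchen 4, 11 Kitchen 3 and 11 Office 3, i.e. the same three rows as in the comment, ordered by customer.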

By the way, restockit, I meant to say "thank you" for a very clearly and simply posed question, complete with sample data and expected results. It makes it easy to create a test case for developing a solution.
restockit (ASKER)

Hey kenhaley,

Thank you kindly for the compliment.

PART 1:

Regarding your comment, that was exactly what I was going to ask as a follow-up to my initial question. The rule should be as follows:

If there is a tie, select the category that has the highest cumulative revenue.
OrderedProductPrice is a field in the database that holds (quantity purchased × product price) for each SKU. I need to select the category whose cumulative OrderedProductPrice, across all of the customer's SKUs in that category, is the greatest.
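One way to express this rule, sketched under assumptions: an in-memory SQLite table1 with hypothetical OrderedProductPrice values (the question gives none), aggregated with COUNT and SUM per customer and category, then a Python comparison of (count, revenue) tuples. Tuples compare element by element, so revenue only matters when the counts are equal. This two-key comparison is an alternative technique, not something proposed in the thread.

```python
import sqlite3

# Hypothetical rows: (custID, Category, OrderedProductPrice).
# The prices are invented for illustration; they are not from the question.
ROWS = [
    (10, "Kitchen", 1.00),
    (10, "Kitchen", 1.00),
    (10, "Office", 100.00),
    (11, "Kitchen", 10.00),
    (11, "Kitchen", 5.00),
    (11, "Kitchen", 7.50),
    (11, "Office", 20.00),
    (11, "Office", 4.00),
    (11, "Office", 1.00),
    (11, "Disposables", 3.00),
]


def pick_categories(rows):
    """Return {custID: category}, ranked by row count, then by total revenue."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (custID INTEGER, Category TEXT, OrderedProductPrice REAL)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    # Row count and cumulative revenue per customer and category.
    totals = conn.execute(
        "SELECT custID, Category, COUNT(*), SUM(OrderedProductPrice) "
        "FROM table1 GROUP BY custID, Category"
    ).fetchall()
    conn.close()

    best = {}  # custID -> (category, ct, revenue)
    for cust_id, category, ct, revenue in totals:
        # Higher count wins; revenue is compared only when counts are equal.
        if cust_id not in best or (ct, revenue) > best[cust_id][1:]:
            best[cust_id] = (category, ct, revenue)
    return {cust_id: entry[0] for cust_id, entry in sorted(best.items())}


print(pick_categories(ROWS))
```

Customer 10 gets Kitchen (more rows, despite less revenue) and customer 11 gets Office (tied on count with Kitchen, higher revenue). If two categories also tie on revenue, this sketch keeps whichever one the GROUP BY returns first, which is not guaranteed.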

PART 2:

Regarding your great solution using the temp table: there are only 5 different categories that the result set can display (Office, Janitorial, Kitchen, Tabletop, or Disposables). Table1, however, also contains other categories (Admiral Craft, Libbey, and Rubbermaid). For every SKU associated with one of the following categories, I need the category changed as shown (before any other calculations begin):

Admiral Craft --> Kitchen
Libbey --> Kitchen
Rubbermaid --> Janitorial
Thanks
Disregard part 2. Thanks.
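Although part 2 was withdrawn, the remapping it describes is typically done with a CASE expression applied before counting. A minimal sketch, assuming the raw category names are stored exactly as written in part 2 and using an in-memory SQLite table1 with a hypothetical customer 12; any category not listed passes through unchanged.

```python
import sqlite3

# Map the extra categories onto the reporting categories first,
# then count rows per customer and mapped category.
MAPPED_COUNTS_SQL = """
WITH mapped AS (
    SELECT custID,
           CASE Category
               WHEN 'Admiral Craft' THEN 'Kitchen'
               WHEN 'Libbey' THEN 'Kitchen'
               WHEN 'Rubbermaid' THEN 'Janitorial'
               ELSE Category
           END AS Category
    FROM table1
)
SELECT custID, Category, COUNT(*) AS ct
FROM mapped
GROUP BY custID, Category
ORDER BY custID, Category
"""


def mapped_counts(rows):
    """Load (custID, Category) rows and count them after remapping."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (custID INTEGER, Category TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = conn.execute(MAPPED_COUNTS_SQL).fetchall()
    conn.close()
    return result


# Hypothetical customer 12 with one row in each of four raw categories.
SAMPLE = [(12, "Libbey"), (12, "Admiral Craft"), (12, "Office"), (12, "Rubbermaid")]
print(mapped_counts(SAMPLE))
```

Libbey and Admiral Craft both become Kitchen, so Kitchen is counted twice; the mapped rows can then go through the same per-customer maximum logic.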
Part 1 answer:
That makes it interesting. (Actually, it was already interesting!) I would create an artificial field called, say, "score", computed as (ct * 1,000,000) + revenue. So if someone has 3 Kitchen items with a total revenue of $20,000, that creates a score of 3,020,000, and 3 Disposables items with a total revenue of $25,000 create a score of 3,025,000, so Disposables wins the tie. Then record score in the temp table instead of ct (the row count), and do the same thing we did with ct before. (The number you multiply by in the formula, 1,000,000 in this case, has to be larger than the highest possible revenue.)
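The score formula can be checked with the numbers from the example above, and the same "keep the per-customer maximum" filter can then run on score instead of ct. A minimal sketch, assuming an in-memory SQLite table1 with an OrderedProductPrice column and a hypothetical customer 20 whose individual item prices are invented to match the $20,000 and $25,000 totals.

```python
import sqlite3

MULTIPLIER = 1_000_000  # must be larger than any category's total revenue


def score(ct, revenue):
    """Row count dominates; revenue only separates equal counts."""
    return ct * MULTIPLIER + revenue


# Same "keep the maximum per customer" filter, applied to score instead of ct.
# MULTIPLIER is a fixed integer constant, so formatting it into the SQL is safe.
SCORE_SQL = f"""
WITH scores AS (
    SELECT custID, Category,
           COUNT(*) * {MULTIPLIER} + SUM(OrderedProductPrice) AS score
    FROM table1
    GROUP BY custID, Category
)
SELECT s.custID, s.Category, s.score
FROM scores AS s
WHERE s.score = (SELECT MAX(s2.score) FROM scores AS s2 WHERE s2.custID = s.custID)
ORDER BY s.custID, s.Category
"""


def top_by_score(rows):
    """Load (custID, Category, OrderedProductPrice) rows and pick by score."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (custID INTEGER, Category TEXT, OrderedProductPrice REAL)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(SCORE_SQL).fetchall()
    conn.close()
    return result


# Hypothetical customer 20: 3 Kitchen items totalling $20,000 and
# 3 Disposables items totalling $25,000.
ROWS = [
    (20, "Kitchen", 5000.0), (20, "Kitchen", 7000.0), (20, "Kitchen", 8000.0),
    (20, "Disposables", 10000.0), (20, "Disposables", 10000.0), (20, "Disposables", 5000.0),
]

print(score(3, 20_000), score(3, 25_000))  # 3020000 3025000
print(top_by_score(ROWS))  # [(20, 'Disposables', 3025000.0)]
```

The score comes back as a float because OrderedProductPrice is stored as REAL here. The multiplier only works while every category's total revenue stays below it; comparing count and revenue as two separate sort keys avoids that limit.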

Good luck with it.
Thanks, guys!