restockit
asked on
Grouping information based on highest repeats
table1 lists every SKU that a customer has purchased from us over their entire ordering history. Each SKU is associated with 1 of 5 different categories.
I need a query that selects the one category that appears most often for each customer ID. Please help.
Note: the total number of unique SKUs is irrelevant.
Also, the category must be chosen based on the categories found across all orders placed by the customer.
See the sample data and intended result below.
intended result:
custID Category
10 Kitchen (was in 4 different rows)
11 Kitchen (was in 3 different rows)
DATA:
pk custID OrderNum SKU Category
1 10 1 JET123 Kitchen
2 10 1 CD3 Office
3 10 1 ABU7834 Kitchen
4 10 2 JET123 Kitchen
5 10 2 RBT134 Janitorial
6 10 2 JET123 Kitchen
7 10 3 CD3 Office
8 11 4 ET12 Kitchen
9 11 4 D3 Office
10 11 4 AU834 Kitchen
11 11 5 JE13 Kitchen
12 11 5 RT34 Janitorial
13 11 5 eT12 Disposables
14 11 6 CD Office
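For readers without access to the members-only answers, here is a minimal sketch of one way to get the intended result. It is not the accepted solution from this thread. It loads the sample rows into an in-memory SQLite database through Python's sqlite3 module (the thread does not name the database engine, so SQLite is an assumption), counts rows per customer and category, and keeps the category whose count equals that customer's maximum. The function name top_categories is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (pk, custID, OrderNum, SKU, Category).
ROWS = [
    (1, 10, 1, "JET123", "Kitchen"), (2, 10, 1, "CD3", "Office"),
    (3, 10, 1, "ABU7834", "Kitchen"), (4, 10, 2, "JET123", "Kitchen"),
    (5, 10, 2, "RBT134", "Janitorial"), (6, 10, 2, "JET123", "Kitchen"),
    (7, 10, 3, "CD3", "Office"), (8, 11, 4, "ET12", "Kitchen"),
    (9, 11, 4, "D3", "Office"), (10, 11, 4, "AU834", "Kitchen"),
    (11, 11, 5, "JE13", "Kitchen"), (12, 11, 5, "RT34", "Janitorial"),
    (13, 11, 5, "eT12", "Disposables"), (14, 11, 6, "CD", "Office"),
]

QUERY = """
WITH counts AS (              -- rows per customer and category
    SELECT custID, Category, COUNT(*) AS ct
    FROM table1
    GROUP BY custID, Category
),
best AS (                     -- highest row count per customer
    SELECT custID, MAX(ct) AS max_ct
    FROM counts
    GROUP BY custID
)
SELECT c.custID, c.Category, c.ct
FROM counts AS c
JOIN best AS b ON b.custID = c.custID AND b.max_ct = c.ct
ORDER BY c.custID, c.Category
"""


def top_categories(rows):
    """Return (custID, Category, row count) for each customer's top category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (pk INTEGER PRIMARY KEY, custID INTEGER, "
        "OrderNum INTEGER, SKU TEXT, Category TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_categories(ROWS))  # [(10, 'Kitchen', 4), (11, 'Kitchen', 3)]
```

The WITH ... GROUP BY ... JOIN shape should carry over to most SQL engines with little or no change, but that has not been checked against the asker's database.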
ASKER
Hey kenhaly,
thank you kindly for the compliment.
PART 1:
With respect to your comment, that was exactly what I was going to ask as a follow-up to my initial question. The rule should be as follows:
If there is a tie, then select the category that has the highest cumulative revenue.
OrderedProductPrice is a field in the db that holds (quantity purchased X product price) for each SKU. I would need to select the category whose SKUs have the greatest cumulative OrderedProductPrice revenue.
PART 2:
With regard to your great solution using the temp table, there are only 5 different categories that the result set can display (Office, Janitorial, Kitchen, Tabletop, or Disposables). Table1, however, contains other categories (Admiral Craft, Libbey, and Rubbermaid) as well. Every SKU associated with one of the following categories needs to be changed (before any other calculations begin):
Admiral Craft --> Kitchen
Libbey --> Kitchen
Rubbermaid --> Janitorial
Thanks
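A minimal sketch of the remapping described in Part 2, using a CASE expression so that the brand-style categories are folded into the five reportable ones before any counting happens. The rows below are invented for illustration, SQLite (via Python's sqlite3) is an assumed stand-in for the real database, and the literal 'Admiral Craft' must match whatever spelling is actually stored in table1.

```python
import sqlite3

# Hypothetical rows (pk, Category); not taken from the asker's data.
SAMPLE = [
    (1, "Admiral Craft"),
    (2, "Libbey"),
    (3, "Rubbermaid"),
    (4, "Office"),
]

# Fold the three extra categories into the reportable ones; leave the rest alone.
REMAP_QUERY = """
SELECT pk,
       CASE Category
           WHEN 'Admiral Craft' THEN 'Kitchen'
           WHEN 'Libbey'        THEN 'Kitchen'
           WHEN 'Rubbermaid'    THEN 'Janitorial'
           ELSE Category
       END AS Category
FROM table1
ORDER BY pk
"""


def remapped(rows):
    """Return (pk, remapped Category) for every row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (pk INTEGER PRIMARY KEY, Category TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    result = conn.execute(REMAP_QUERY).fetchall()
    conn.close()
    return result


print(remapped(SAMPLE))
# [(1, 'Kitchen'), (2, 'Kitchen'), (3, 'Janitorial'), (4, 'Office')]
```

The same CASE expression could be placed in a subquery or view that feeds the counting step, so the stored data does not have to be updated.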
ASKER
Disregard part 2. thanks
Part 1 answer:
That makes it interesting. (Actually, it was already interesting!) I would create an artificial field called, say, "score", which is computed as (ct * 1,000,000) + revenue. So if someone has 3 Kitchen items with a total revenue of $20,000, that would create a score of 3,020,000, and 3 Disposable items with a total revenue of $25,000 would create a score of 3,025,000. Then record score in the temp table instead of ct, and do the same thing that we did with ct before. (The number you multiply by in the formula -- 1,000,000 in this case -- has to be larger than the highest possible revenue, so that the count still decides the ranking and revenue only breaks ties.)
Good luck with it.
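The score idea above can be sketched as follows. This is an illustration under assumptions, not the temp-table code from the hidden solution: it uses SQLite through Python's sqlite3, invented rows that reproduce the 3 Kitchen / $20,000 versus 3 Disposables / $25,000 example, and whole-dollar OrderedProductPrice values. MULTIPLIER is the 1,000,000 from the post and must exceed the largest revenue any single customer and category can reach.

```python
import sqlite3

MULTIPLIER = 1_000_000  # must exceed the highest possible per-category revenue

# Hypothetical rows: (custID, Category, OrderedProductPrice in whole dollars).
ROWS = [
    (10, "Kitchen", 5000), (10, "Kitchen", 7000), (10, "Kitchen", 8000),
    (10, "Disposables", 10000), (10, "Disposables", 9000),
    (10, "Disposables", 6000),
    (10, "Office", 100),
]

QUERY = """
WITH scored AS (            -- score = row count * multiplier + revenue
    SELECT custID, Category,
           COUNT(*) * ? + SUM(OrderedProductPrice) AS score
    FROM table1
    GROUP BY custID, Category
),
best AS (                   -- highest score per customer
    SELECT custID, MAX(score) AS max_score
    FROM scored
    GROUP BY custID
)
SELECT s.custID, s.Category, s.score
FROM scored AS s
JOIN best AS b ON b.custID = s.custID AND b.max_score = s.score
ORDER BY s.custID
"""


def best_by_score(rows, multiplier=MULTIPLIER):
    """Return (custID, Category, score) for each customer's winning category."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (custID INTEGER, Category TEXT, "
        "OrderedProductPrice INTEGER)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY, (multiplier,)).fetchall()
    conn.close()
    return result


print(best_by_score(ROWS))  # [(10, 'Disposables', 3025000)]
```

Kitchen scores 3,020,000 and Disposables 3,025,000, so the tie on row count is broken by revenue. Two categories with the same count and exactly the same revenue would still tie.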
ASKER
thanks guys!
Note that this solution may return more than one row per customer when two or more categories tie for that customer's highest row count.
E.g., if cust 11 had ordered another Office item instead of the Janitorial one, the results would be
11 Kitchen
11 Office
10 Kitchen
Please say if the query should just select one of these.
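To make the tie concrete, here is a small sketch that reproduces it. It assumes a "join on the maximum count" query, which may or may not match the members-only solution, and runs it in SQLite through Python's sqlite3 on the sample data with pk 12 changed from Janitorial to Office. The second query picks one row per customer by taking the alphabetically first tied category. That rule is only a placeholder assumption to show the mechanics.

```python
import sqlite3

# Sample data with pk 12 switched to Office, so cust 11 has 3 Kitchen and 3 Office.
ROWS = [
    (1, 10, 1, "JET123", "Kitchen"), (2, 10, 1, "CD3", "Office"),
    (3, 10, 1, "ABU7834", "Kitchen"), (4, 10, 2, "JET123", "Kitchen"),
    (5, 10, 2, "RBT134", "Janitorial"), (6, 10, 2, "JET123", "Kitchen"),
    (7, 10, 3, "CD3", "Office"), (8, 11, 4, "ET12", "Kitchen"),
    (9, 11, 4, "D3", "Office"), (10, 11, 4, "AU834", "Kitchen"),
    (11, 11, 5, "JE13", "Kitchen"), (12, 11, 5, "RT34", "Office"),
    (13, 11, 5, "eT12", "Disposables"), (14, 11, 6, "CD", "Office"),
]

# Shared prefix: per-category counts and the per-customer maximum.
CTES = """
WITH counts AS (
    SELECT custID, Category, COUNT(*) AS ct
    FROM table1 GROUP BY custID, Category
),
best AS (
    SELECT custID, MAX(ct) AS max_ct FROM counts GROUP BY custID
)
"""

# Returns every category that reaches the maximum, so ties yield extra rows.
WITH_TIES = CTES + """
SELECT c.custID, c.Category, c.ct
FROM counts AS c JOIN best AS b ON b.custID = c.custID AND b.max_ct = c.ct
ORDER BY c.custID, c.Category
"""

# Collapses ties to one row per customer (alphabetically first category).
ONE_PER_CUSTOMER = CTES + """
SELECT c.custID, MIN(c.Category) AS Category
FROM counts AS c JOIN best AS b ON b.custID = c.custID AND b.max_ct = c.ct
GROUP BY c.custID
ORDER BY c.custID
"""


def run(query, rows):
    """Load rows into a fresh in-memory table1 and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (pk INTEGER PRIMARY KEY, custID INTEGER, "
        "OrderNum INTEGER, SKU TEXT, Category TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(WITH_TIES, ROWS))
# [(10, 'Kitchen', 4), (11, 'Kitchen', 3), (11, 'Office', 3)]
print(run(ONE_PER_CUSTOMER, ROWS))
# [(10, 'Kitchen'), (11, 'Kitchen')]
```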
By the way, restockit, I meant to say "thank you" for a very clearly and simply posed question, complete with sample data and expected results. Makes it easy to create a test case for developing a solution.