Solved

TOP n records pr group?

Posted on 2011-09-22
22
236 Views
Last Modified: 2012-05-12
Hello EE!

I've tried following a guide at
http://allenbrowne.com/subquery-01.html#TopN

in order to give me the TOP n records by group, but I cannot seem to get it to work.

I got a database containing categories (RVG), buyers (LNR), and amounts (KP)

What I'm interested in getting is the sum of the amounts (KP) of the top 5 buyers pr category. So far I've got this:


SELECT RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR
FROM RAWDATA
WHERE RAWDATA.LNR IN
   (SELECT TOP 5 LNR                            
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
ORDER BY RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR;

Open in new window



I'm getting the error that RVG is not part of the aggregate function?

Secondly, is it possible to also do a sub query that returns the total spend for that category?

So that I would get:

RVG, LNR, Sum of KP (top 5), Sum of KP (All)?

Thanks in advance!
0
Comment
Question by:ThomasFoege
  • 11
  • 11
22 Comments
 
LVL 61

Expert Comment

by:mbizup
ID: 36579663
You need to use a GROUP BY clause:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR
FROM RAWDATA
WHERE RAWDATA.LNR IN
   (SELECT TOP 5 LNR                            
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG,  RAWDATA.LNR;
0
 

Author Comment

by:ThomasFoege
ID: 36579671
Hey! Thnks for the reply

Doing that i get LNR is not part of the aggregate function?
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579676
That will give you a query that sums amount per category and buyer.  However if you need to sum all of the buyers' amounts per group (ie: total amount per group_, you would need to drop the RAWDATA.LNR from your outer query:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
   (SELECT TOP 5 LNR                            
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG;
0
 

Author Comment

by:ThomasFoege
ID: 36579682
I'm trying to get the sum of the amounts of the top 5 buyers pr category :)
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579691
Actually... The SUM should not be in the inner query.  Try this:


SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
   (SELECT TOP 5 LNR                            
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Dupe.KP DESC, Dupe.LNR DESC
GROUP BY RAWDATA.RVG;
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579698
Missed the close parenthesis on the order by.  It should be this:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
   (SELECT TOP 5 LNR                            
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Dupe.KP DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG;
0
 

Author Comment

by:ThomasFoege
ID: 36579702
It's running, calculating is taking forever ;)
0
 

Author Comment

by:ThomasFoege
ID: 36579711
Theres about 20 different RVG and maybe 60.000 lines across 50 different LNR. Does it sound probable that it's still running query after 10 minutes?
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579716
I think you are also confusing Amount (KP) with Customers (LNR).

For a sum of the top 5 Per group based on amount (KP):

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.KP IN
   (SELECT TOP 5 KP                          
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Dupe.KP DESC)
GROUP BY RAWDATA.RVG;
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579734
<< Does it sound probable that it's still running query after 10 minutes? >>

It *seems* unlikely.  I think the query is still off (see above).  Try my last post and see if that helps.
0
 

Author Comment

by:ThomasFoege
ID: 36579738
RVG is a Category, such as Books
LNR is a buyer such as Person A
KP is the amount columns


LNR can buy books maybe 20 or 100 times in the dataset


What I'm hoping to get is an output like this:
Category, Top5 Spend, All Spend
Books, 10$, 50$
Food, 5$, 5$

Then i know that of the top 5 buyers, they buy 10$ out of 50$ and in food all spend is within top 5

I might have confused them in my query
0
Free Gift Card with Acronis Backup Purchase!

Backup any data in any location: local and remote systems, physical and virtual servers, private and public clouds, Macs and PCs, tablets and mobile devices, & more! For limited time only, buy any Acronis backup products and get a FREE Amazon/Best Buy gift card worth up to $200!

 

Author Comment

by:ThomasFoege
ID: 36579770
SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.KP IN
   (SELECT TOP 5 KP                          
   FROM RAWDATA AS Dupe                              
   WHERE Dupe.RVG = RAWDATA.RVG        
   ORDER BY Dupe.KP DESC)
GROUP BY RAWDATA.RVG; 

Open in new window


Keeps running forever to :/

I really hope this is possible to do within a single "query" so I do not have to create pivots etc.
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579774
Give that last query a shot on a sample set of data and see if it gives you the expected results.

If that doesn't work, it might help to post a sample database (no sensitive information, please) and specify what your expected results are.
0
 
LVL 61

Accepted Solution

by:
mbizup earned 500 total points
ID: 36579784
<< Keeps running forever to :/  >>

I'm fairly confident about the syntax on that one.

Maybe try it on a smaller set of data to verify the results...
0
 

Author Comment

by:ThomasFoege
ID: 36579812
It did work with a smaller dataset, however does that query not only return the spend of the top 5 buyers divided out pr category?

What I'm hoping to get is for each category, what is the spend within that category of the top 5 suppliers within that category. So there will be a top 5 for each category (RVG)

Or did i miss something
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36579869
Okay I may have misunderstood your requirements-

It sounds like you might need to run a couple of queries... and this will likely be a very time consuming query...

1.  You need to first obtain a SUM (total) of all of the consumers' spending in each group to give you these results (one row per customer per group).

2.  Based on those results, you need to obtain a top 5 "Total" amounts per category

Does that sound right?

Can you post a sample database with just the RAWDATA table containing enough rows of data to illustrate your needs?  (and also post what your expected results are)
0
 

Author Comment

by:ThomasFoege
ID: 36579970
Time consuming is fine, currently its way more time consuming doing alot of pivots and vlookups ;)

Thats correct! Some consumers may be in several categories, and could in theory be in top 5 in more categories, but their spend is only counted toward that top 5 spot within the category

An example of data is included. It's made up, but represents exactly the same kind of data i got

The desired output is as follows:

RVG - Spend within RVG made up of the Top 5 LNR within that RVG - Total spend within that RVG


So if the dataset was only
ID, RVG, LNR, Amount
1, Books, Person A, 10
2, Books, Person B, 100
3, Books, Person C, 30
4, Books, Person D, 50
5, Books, Person E, 200
6, Books, Person F, 200

7, Food, Person G, 1000

The output would be:
RVG - Spend in Top 5 - All Spend
Books - 580 - 590
Food  1000 - 1000


Thank you again for using your time to help me!

RAWDATA-Example.xlsx
0
 

Author Comment

by:ThomasFoege
ID: 36580151
Hey, i played around with your code

SELECT RAWDATA2.RAA_VARE_GRUPPE, Sum(RAWDATA2.KOEBSPRIS) AS SumOfKOEBSPRIS
FROM RAWDATA2
WHERE (((RAWDATA2.LEVERANDOERNR) In (SELECT TOP 5 LEVERANDOERNR                          
   FROM RAWDATA2 AS Dupe                              
   WHERE Dupe.RAA_VARE_GRUPPE = RAWDATA2.RAA_VARE_GRUPPE
   GROUP BY LEVERANDOERNR        
   ORDER BY Sum(Dupe.KOEBSPRIS) DESC)))
GROUP BY RAWDATA2.RAA_VARE_GRUPPE;

Open in new window


seems to work! awesome!
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36580247
Hi - Just got back...

Are you all set then?
0
 

Author Comment

by:ThomasFoege
ID: 36580274
It's very very slow though. Would it make sense to make a tempoary table or something like that to lower processing time?
0
 
LVL 61

Expert Comment

by:mbizup
ID: 36580367
<< Would it make sense to make a tempoary table or something like that to lower processing time >>

I have used a two-step approach like that effectively to improve performance on some large queries.

I believe th IN clause with the SUM in it might be what is causing you grief....

You *might* be able to improve speed by creating a query that sums all of the amounts per customer, per group, and turning that into a Make Table query.  The run a SELECT 5 by group on the resulting table - which would be a much smaller dataset...
0
 

Author Closing Comment

by:ThomasFoege
ID: 36930263
Worked perfectly!
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

When you are entering numbers in a speadsheet, and don't remember what 6×7 is, you just type “=6*7" instead. It works in every cell! This is not so in Access. To enter the elusive 42 in a text box, you have to find a calculator, and then copy the re…
In a multiple monitor setup, if you don't want to use AutoCenter to position your popup forms, you have a problem: where will they appear?  Sometimes you may have an additional problem: where the devil did they go?  If you last had a popup form open…
Familiarize people with the process of utilizing SQL Server stored procedures from within Microsoft Access. Microsoft Access is a very powerful client/server development tool. One of the SQL Server objects that you can interact with from within Micr…
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now