Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
Solved

TOP n records pr group?

Posted on 2011-09-22
Medium Priority
245 Views
Hello EE!

I've tried following a guide at
http://allenbrowne.com/subquery-01.html#TopN

in order to give me the TOP n records by group, but I cannot seem to get it to work.

I got a database containing categories (RVG), buyers (LNR), and amounts (KP)

What I'm interested in getting is the sum of the amounts (KP) of the top 5 buyers pr category. So far I've got this:

``````SELECT RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR
FROM RAWDATA
WHERE RAWDATA.LNR IN
(SELECT TOP 5 LNR
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
ORDER BY RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR;
``````

I'm getting the error that RVG is not part of the aggregate function?

Secondly, is it possible to also do a sub query that returns the total spend for that category?

So that I would get:

RVG, LNR, Sum of KP (top 5), Sum of KP (All)?

0
Question by:ThomasFoege
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

• Help others & share knowledge
• Earn cash & points
• 11
• 11

LVL 61

Expert Comment

ID: 36579663
You need to use a GROUP BY clause:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP), RAWDATA.LNR
FROM RAWDATA
WHERE RAWDATA.LNR IN
(SELECT TOP 5 LNR
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG,  RAWDATA.LNR;
0

Author Comment

ID: 36579671

Doing that i get LNR is not part of the aggregate function?
0

LVL 61

Expert Comment

ID: 36579676
That will give you a query that sums amount per category and buyer.  However if you need to sum all of the buyers' amounts per group (ie: total amount per group_, you would need to drop the RAWDATA.LNR from your outer query:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
(SELECT TOP 5 LNR
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Sum(Dupe.KP) DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG;
0

Author Comment

ID: 36579682
I'm trying to get the sum of the amounts of the top 5 buyers pr category :)
0

LVL 61

Expert Comment

ID: 36579691
Actually... The SUM should not be in the inner query.  Try this:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
(SELECT TOP 5 LNR
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Dupe.KP DESC, Dupe.LNR DESC
GROUP BY RAWDATA.RVG;
0

LVL 61

Expert Comment

ID: 36579698
Missed the close parenthesis on the order by.  It should be this:

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.LNR IN
(SELECT TOP 5 LNR
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Dupe.KP DESC, Dupe.LNR DESC)
GROUP BY RAWDATA.RVG;
0

Author Comment

ID: 36579702
It's running, calculating is taking forever ;)
0

Author Comment

ID: 36579711
Theres about 20 different RVG and maybe 60.000 lines across 50 different LNR. Does it sound probable that it's still running query after 10 minutes?
0

LVL 61

Expert Comment

ID: 36579716
I think you are also confusing Amount (KP) with Customers (LNR).

For a sum of the top 5 Per group based on amount (KP):

SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.KP IN
(SELECT TOP 5 KP
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Dupe.KP DESC)
GROUP BY RAWDATA.RVG;
0

LVL 61

Expert Comment

ID: 36579734
<< Does it sound probable that it's still running query after 10 minutes? >>

It *seems* unlikely.  I think the query is still off (see above).  Try my last post and see if that helps.
0

Author Comment

ID: 36579738
RVG is a Category, such as Books
LNR is a buyer such as Person A
KP is the amount columns

LNR can buy books maybe 20 or 100 times in the dataset

What I'm hoping to get is an output like this:
Category, Top5 Spend, All Spend
Books, 10\$, 50\$
Food, 5\$, 5\$

Then i know that of the top 5 buyers, they buy 10\$ out of 50\$ and in food all spend is within top 5

I might have confused them in my query
0

Author Comment

ID: 36579770
``````SELECT RAWDATA.RVG, Sum(RAWDATA.KP)
FROM RAWDATA
WHERE RAWDATA.KP IN
(SELECT TOP 5 KP
FROM RAWDATA AS Dupe
WHERE Dupe.RVG = RAWDATA.RVG
ORDER BY Dupe.KP DESC)
GROUP BY RAWDATA.RVG;
``````

Keeps running forever to :/

I really hope this is possible to do within a single "query" so I do not have to create pivots etc.
0

LVL 61

Expert Comment

ID: 36579774
Give that last query a shot on a sample set of data and see if it gives you the expected results.

If that doesn't work, it might help to post a sample database (no sensitive information, please) and specify what your expected results are.
0

LVL 61

Accepted Solution

mbizup earned 2000 total points
ID: 36579784
<< Keeps running forever to :/  >>

I'm fairly confident about the syntax on that one.

Maybe try it on a smaller set of data to verify the results...
0

Author Comment

ID: 36579812
It did work with a smaller dataset, however does that query not only return the spend of the top 5 buyers divided out pr category?

What I'm hoping to get is for each category, what is the spend within that category of the top 5 suppliers within that category. So there will be a top 5 for each category (RVG)

Or did i miss something
0

LVL 61

Expert Comment

ID: 36579869
Okay I may have misunderstood your requirements-

It sounds like you might need to run a couple of queries... and this will likely be a very time consuming query...

1.  You need to first obtain a SUM (total) of all of the consumers' spending in each group to give you these results (one row per customer per group).

2.  Based on those results, you need to obtain a top 5 "Total" amounts per category

Does that sound right?

Can you post a sample database with just the RAWDATA table containing enough rows of data to illustrate your needs?  (and also post what your expected results are)
0

Author Comment

ID: 36579970
Time consuming is fine, currently its way more time consuming doing alot of pivots and vlookups ;)

Thats correct! Some consumers may be in several categories, and could in theory be in top 5 in more categories, but their spend is only counted toward that top 5 spot within the category

An example of data is included. It's made up, but represents exactly the same kind of data i got

The desired output is as follows:

RVG - Spend within RVG made up of the Top 5 LNR within that RVG - Total spend within that RVG

So if the dataset was only
ID, RVG, LNR, Amount
1, Books, Person A, 10
2, Books, Person B, 100
3, Books, Person C, 30
4, Books, Person D, 50
5, Books, Person E, 200
6, Books, Person F, 200

7, Food, Person G, 1000

The output would be:
RVG - Spend in Top 5 - All Spend
Books - 580 - 590
Food  1000 - 1000

Thank you again for using your time to help me!

RAWDATA-Example.xlsx
0

Author Comment

ID: 36580151
Hey, i played around with your code

``````SELECT RAWDATA2.RAA_VARE_GRUPPE, Sum(RAWDATA2.KOEBSPRIS) AS SumOfKOEBSPRIS
FROM RAWDATA2
WHERE (((RAWDATA2.LEVERANDOERNR) In (SELECT TOP 5 LEVERANDOERNR
FROM RAWDATA2 AS Dupe
WHERE Dupe.RAA_VARE_GRUPPE = RAWDATA2.RAA_VARE_GRUPPE
GROUP BY LEVERANDOERNR
ORDER BY Sum(Dupe.KOEBSPRIS) DESC)))
GROUP BY RAWDATA2.RAA_VARE_GRUPPE;
``````

seems to work! awesome!
0

LVL 61

Expert Comment

ID: 36580247
Hi - Just got back...

Are you all set then?
0

Author Comment

ID: 36580274
It's very very slow though. Would it make sense to make a tempoary table or something like that to lower processing time?
0

LVL 61

Expert Comment

ID: 36580367
<< Would it make sense to make a tempoary table or something like that to lower processing time >>

I have used a two-step approach like that effectively to improve performance on some large queries.

I believe th IN clause with the SUM in it might be what is causing you grief....

You *might* be able to improve speed by creating a query that sums all of the amounts per customer, per group, and turning that into a Make Table query.  The run a SELECT 5 by group on the resulting table - which would be a much smaller dataset...
0

Author Closing Comment

ID: 36930263
Worked perfectly!
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

You need to know the location of the Office templates folder, so that when you create new templates, they are saved to that location, and thus are available for selection when creating new documents.  The steps to find the Templates folder path are â€¦
We live in a world of interfaces like the one in the title picture. VBA also allows to use interfaces which offers a lot of possibilities. This article describes how to use interfaces in VBA and how to work around their bugs.
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final pâ€¦
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filledâ€¦
Suggested Courses
Course of the Month4 days, 10 hours left to enroll