[2 days left] What’s wrong with your cloud strategy? Learn why multicloud solutions matter with Nimble Storage.Register Now

x
?
Solved

top N (aggregate) for each group by

Posted on 2004-04-30
9
Medium Priority
?
559 Views
Last Modified: 2008-03-03
I've seen the question how to limit results on an inner group by (similar to mysql's limit n), however this problem is bit different please read on..

select group1, group2, count(*) cnt
from stuff
group by group1, group2
order by group1, group2, cnt desc

I want to limit the results to only the top 2 in group2 (according to the aggregate).
(group1,group2,...)
a,a
a,a
a,a
a,b
a,b
a,b
a,b
a,c
a,c
a,c
a,c
a,c
b,e
b,e
b,e
b,f
b,f
b,g
b,g
b,g
b,g
b,g

results should be:
a,c,5
a,b,4
b,g,5
b,e,3

0
Comment
Question by:gfody
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
9 Comments
 
LVL 13

Expert Comment

by:MikeWalsh
ID: 10967112
I am a little confused about your request, can you post a little more information.. Top 2 by number of occurences or something else?
0
 
LVL 2

Author Comment

by:gfody
ID: 10969020
I'm making a report from a log table.. here's a better example

time, deploymentid, domain, actionid
2/2/2004 1:23am, 1, aol.com, 1
2/2/2004 4:37am, 2, aol.com, 1
2/2/2004 5:11am, 3, aol.com, 2
2/3/2004 12:03pm, 3, yahoo.com, 1
2/3/2004 2:43pm, 3, msn.com, 1
2/3/2004 2:50pm, 2, aol.com, 2

I like to give the option to group by time, deployment, and domain. for instance.. deployment by domain would be this query: (with some example aggregates I may use)

select
deploymentid,
domain,
sum(case actionid when in (1) then 1 else 0 end) a1
group by deploymentid,domain
order by a1 desc

the same report, grouped by domain and then deployment would be:
select
domain,
deploymentid,
sum(case actionid when in (1) then 1 else 0 end) a1
group by domain,deploymentid
order by a1 desc

in either case, the domain column contains upwards of 30,000 unique values.. so the report gets cluttered with tons of insignificant data. what the customer is really interested in is the top N domains (top depends on the aggregate).. in the example above the top 2 domains would be the domains with the most rows with where actionid=1
0
 
LVL 2

Author Comment

by:gfody
ID: 10969025
oops, that is:
sum(case when actionid in (1) then 1 else 0 end)
0
Will your db performance match your db growth?

In Percona’s white paper “Performance at Scale: Keeping Your Database on Its Toes,” we take a high-level approach to what you need to think about when planning for database scalability.

 
LVL 1

Expert Comment

by:iyerbhuvanac
ID: 10975729
Paste the following in your query analyser with appropriate fieldnames and tablenames and execute:




select group1, group2, count(*) cnt
into temp2 from stuff
group by group1, group2
order by group1, group2,cnt  asc

select a.group1, max(a.cnt) cnt into temp3
from temp2 a, temp2 b
where a.cnt < (Select max(b.cnt) from temp2 b)
group by a.group1

UNION

select a.group1, max(a.cnt) cnt
from temp2 a
inner join temp2 b
on a.group1 = b.group1
group by a.group1


select a.group1,a.group2,a.cnt from temp2 a inner join temp3 b
on a.group1 = b.group1 and a.cnt = b.cnt

drop table temp2
drop table temp3
0
 
LVL 2

Author Comment

by:gfody
ID: 10979434
thanks for the reply iyerbhuvanac, your solution does return only the top 2, however I do need to limit to an arbitrary number of results per group. top 2 was just for my example.. the actual options are top 5, 20, 50, 100.
0
 
LVL 1

Expert Comment

by:iyerbhuvanac
ID: 10983642
HERE is a proc which is going to help you. You just have to pass the parameter ie the top number of rows you want to retrieve.say,5 or 10 or 15 or 20,etc

create this in your database  and execute with the required params.

This is the base version. I shall try to optimize this and post it as and when time permits
 

create procedure Getmax @param int
AS
declare
@count int
begin
set @count = 1

select group1, group2, count(*) cnt
into temp2 from stuff
group by group1, group2
order by group1, group2,cnt  asc


while @count <= @param
begin

select a.group1, max(a.cnt) cnt into temp3
from temp2 a
inner join temp2 b
on a.group1 = b.group1
group by a.group1


select a.group1,a.group2,a.cnt cnt into temp5 from temp2 a inner join temp3 b
on a.group1 = b.group1 and a.cnt = b.cnt

select * from temp5

delete from temp2   where temp2.group1 in (select group1 from temp5)
and temp2.group2 in (select group2 from temp5)
and temp2.cnt in (select cnt from temp5)

set @count = @count + 1


drop table temp3
drop table temp5
end

drop table temp2

end


0
 
LVL 2

Author Comment

by:gfody
ID: 11025119
I appreciate your effort, but I really needed something less hackerish (read scalable)

I came up with this method using a temp table..

select g1,g2,c1,c2
group by g1,g2
into #temp

create clustered index on #temp (g1,c1 desc) -- important!
select g1,g2,c1,c2 from temp t
where g2 in (select top 10 g2 from #temp where g1=t.g1 order by c1 desc)

thanks for everyones help
0
 
LVL 5

Accepted Solution

by:
Netminder earned 0 total points
ID: 11589887
Closed, 500 points refunded.
Netminder
Site Admin
0

Featured Post

Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

What if you have to shut down the entire Citrix infrastructure for hardware maintenance, software upgrades or "the unknown"? I developed this plan for "the unknown" and hope that it helps you as well. This article explains how to properly shut down …
This month, Experts Exchange sat down with resident SQL expert, Jim Horn, for an in-depth look into the makings of a successful career in SQL.
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.

649 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question