SQL Query - Ranking / Distribution


SQL Server 2008 R2.  I have a table with 3 relevant columns:

[FileName], [FileSize], [Group]

I'd like break up the recordset up into groups based on the even distribution of [FileSize];

If I decided to break the recordset into 5 groups I would set [Group] to either '1','2','3','4' or '5' and I would expect the SUM([FileSize]) for each group to be similar.

I can think of some round-about ways of accomplishing this but want to know if there is an elegant way to do so.

Who is Participating?
appariConnect With a Mentor Commented:
try this
not efficient code but seems working
declare @mytab table([FileName] varchar(20),[FileSize] int,[Group] int)
insert into @mytab 
Select 'mypicture.jpg', 200, NULL
union Select 'mypicture2.jpg', 300, NULL
union Select 'mydoc.doc',500, NULL
union Select 'myexcelsheet.xls', 1000, NULL
union Select 'mytextfile.txt',600, NULL
union Select 'myslide.ppt',400, NULL
union Select 'myaudio.mp3', 1100, NULL
union Select 'myfile.txt', 100, NULL
union Select 'mysite.html', 400, NULL
union Select 'mypicture3', 550, NULL

declare @avgSize int
Select @avgSize = sum(filesize)/5 from @mytab
Select @avgSize 
declare @var decimal(4,2)
select @var=0
while exists(Select 1 from @mytab where [group] is null)
declare @firstGroup int
select @firstGroup =0
;with a as (Select [group], sum(filesize) sumSize from @mytab
where  [group] is not null
group by [group] having sum(filesize) >= @avgsize )
select @firstGroup = isnull(max([group]),0) from a

;with dat as (select row_number() over(order by filesize desc)+@firstGroup rownum,
* from @mytab where [group] is null )
update dat
set [group]= rownum
where rownum<=5
and (filesize + isnull((Select sum(filesize) from @mytab where [group]=rownum),0)<(@avgsize + @avgsize *@var)
or filesize>=@avgsize)
if @@rowcount =0 
select @var= @var+0.1
--select * from @mytab order by 3
--select * from @mytab order by 3
select [Group], sum(filesize) from @mytab 
group by [Group]

Open in new window

can you post sample data and the output you are looking for?
StrangerDangerAuthor Commented:
Sure... to re-iterate... I essentially I want to assign a group number to each row.  When you sum the filesizes for each group, I'd like the totals to be relatively similar.


mypicture.jpg, 200, NULL
mypicture2.jpg, 300, NULL
mydoc.doc,500, NULL
myexcelsheet.xls, 1000, NULL
mytextfile.txt,600, NULL
myslide.ppt,400, NULL
myaudio.mp3, 1100, NULL
myfile.txt, 100, NULL
mysite.html, 400, NULL
mypicture3, 550, NULL


mypicture.jpg, 200, 1
mypicture2.jpg, 300,1
myexcelsheet.xls, 1000, 2
myaudio.mp3, 1100,4
myfile.txt, 100, 5
mysite.html, 400, 5
mypicture3, 550, 5

Example aggregate:
[Group], [SumOfFileSize]
1, 1000
2, 1000
3, 1000
4, 1100
5, 950
StrangerDangerAuthor Commented:
@appari - This helped.  Thanks mate.
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.