Hi,

I've done a bunch of web searching on this question, and among others, found the solution below. I want to covert the approach described below into a more versatile UDF or stored procedure for calculating percentile ranks in lots of different tables and views in a database quickly and easily:

http://www.sqlteam.com/article/computing-percentiles-in-sql-server
Basically, in this approach you define a UDF for linear interpolation called LERP; syntax in this application would be LERP(given_value,min_value

,max_value,output_min,output_high).

The function is defined as:

_______________________________________

declare @pp float

set @pp = .5

declare @k int, @d float, @ax float, @bx float

declare @values table (i int identity(1,1), x float)

insert @values select Score from @TestScores order by Score

select @k=floor(kf), @d=kf-floor(kf)

from (select 1+@pp*(count(*)-1) as kf from @values) as x1

select @ax=x from @values where i=@k

select @bx=x from @values where i=@k+1

select @pp as factor,

dbo.LERP(@d, 0.0, 1.0, @ax, @bx) as percentile}

_______________________________________

They then apply the function over this generated data set:

_______________________________________

Declare @TestScores table (StudentID int identity(1,1), Score int)

insert @TestScores (Score) Values (20)

insert @TestScores (Score) Values (03)

insert @TestScores (Score) Values (40)

insert @TestScores (Score) Values (45)

insert @TestScores (Score) Values (50)

insert @TestScores (Score) Values (20)

insert @TestScores (Score) Values (90)

insert @TestScores (Score) Values (20)

insert @TestScores (Score) Values (11)

insert @TestScores (Score) Values (30)

while @@ROWCOUNT > 0

insert @TestScores select t.Score from @TestScores t

cross join

(select max (StudentID) MaxRowNum from @TestScores) x

where

t.StudentID <= 100 - x.MaxRowNum

_______________________________________

In my application, I have a large list of people and scores. I basically want what I have in excel-- a relatively easy way to apply this function automatically (perhaps using dynamic SQL?) to a given SELECT statement. For example, suppose I had this data in a table called Results:

Person Score

Bill 45

Fred 112

Jim 23

Rick 200

Bob 76

Andy 23

Roger 39

Ideally I would like to be able to do something like the following:

SELECT r.Person, r.Score, PERCENTILERANK(r.Score) AS 'Percentile Rank'

FROM Results r

WHERE r.Person not like 'R%';

This would produce:

Person Score Percent

Bill 45 0.500

Fred 112 1.000

Jim 23 0.000

Bob 76 0.750

Andy 23 0.000

As I said, this would be ideal. If this is simply not technically feasible in SQL Server 2008, then I think the next best thing would be a stored procedure that could be very quickly modified each time to compute the percentile ranking for a given table, perhaps populating a new column using INTO or something. I have read just about everything on the web about this, and I can't help but think there is a better way out there for this incredibly common and useful task. I keep thinking there is a way to use NTILE and PARTITION to do what I want, but I can't figure it out. I am bewildered why this functionality is not embedded in T-SQL in 2008.

I suspect there is also a clever way to do this using a data cube in Analysis Services, but Analysis Services is a big complicated piece of software and I'd rather stick to vanilla T-SQL.

I leave it to the wizards out there to show the way. Thanks for your help!

Open in new window