Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
• Status: Solved
• Priority: Medium
• Security: Public
• Views: 4157

# Best Implementation of Excel's PERCENTRANK function in SQL Server 2008

Hi,

I've done a bunch of web searching on this question, and among others, found the solution below. I want to covert the approach described below into a more versatile UDF or stored procedure for calculating percentile ranks in lots of different tables and views in a database quickly and easily:

http://www.sqlteam.com/article/computing-percentiles-in-sql-server

Basically, in this approach you define a UDF for linear interpolation called LERP; syntax in this application would be LERP(given_value,min_value,max_value,output_min,output_high).

The function is defined as:
_______________________________________

declare @pp float
set @pp = .5

declare @k int, @d float, @ax float, @bx float
declare @values table (i int identity(1,1), x float)

insert @values select Score from @TestScores order by Score

select @k=floor(kf), @d=kf-floor(kf)
from (select 1+@pp*(count(*)-1) as kf from @values) as x1

select @ax=x from @values where i=@k
select @bx=x from @values where i=@k+1

select @pp as factor,
dbo.LERP(@d, 0.0, 1.0, @ax, @bx) as percentile}
_______________________________________

They then apply the function over this generated data set:
_______________________________________

Declare @TestScores table (StudentID int identity(1,1), Score int)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (03)
insert @TestScores (Score) Values (40)
insert @TestScores (Score) Values (45)
insert @TestScores (Score) Values (50)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (90)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (11)
insert @TestScores (Score) Values (30)
while @@ROWCOUNT > 0
insert @TestScores select t.Score from @TestScores t
cross join
(select max (StudentID) MaxRowNum from @TestScores) x
where
t.StudentID <= 100 - x.MaxRowNum
_______________________________________

In my application, I have a large list of people and scores. I basically want what I have in excel-- a relatively easy way to apply this function automatically (perhaps using dynamic SQL?) to a given SELECT statement. For example, suppose I had this data in a table called Results:

Person      Score
Bill                 45
Fred                 112
Jim                 23
Rick                 200
Bob                 76
Andy            23
Roger         39

Ideally I would like to be able to do something like the following:

SELECT r.Person, r.Score, PERCENTILERANK(r.Score) AS 'Percentile Rank'
FROM Results r
WHERE r.Person not like 'R%';

This would produce:

Person           Score            Percent
Bill                    45            0.500
Fred                      112            1.000
Jim                      23            0.000
Bob                      76            0.750
Andy                      23            0.000

As I said, this would be ideal. If this is simply not technically feasible in SQL Server 2008, then I think the next best thing would be a stored procedure that could be very quickly modified each time to compute the percentile ranking for a given table, perhaps populating a new column using INTO or something. I have read just about everything on the web about this, and I can't help but think there is a better way out there for this incredibly common and useful task. I keep thinking there is a way to use NTILE and PARTITION to do what I want, but I can't figure it out. I am bewildered why this functionality is not embedded in T-SQL in 2008.

I suspect there is also a clever way to do this using a data cube in Analysis Services, but Analysis Services is a big complicated piece of software and I'd rather stick to vanilla T-SQL.

I leave it to the wizards out there to show the way. Thanks for your help!

0
Dicklestein
• 9
• 9
• 5
2 Solutions

Billing EngineerCommented:

declare @t table (person varchar(10), score int )
set nocount on
insert into @t values('Bill         ',        45)
insert into @t values('Fred          ',       112)
insert into @t values('Jim          ',       23)
insert into @t values('Rick          ',       200)
insert into @t values('Bob         ',        76)
insert into @t values('Andy      ',      23)
insert into @t values('Roger     ',    39)
;with data as (
select t.*
, dense_rank() over (order by score asc) dr
from @t t
where person not like 'R%'
)
select data.*
, cast(cast(dr as decimal(10,2)) / ( select max(dr) from data ) as decimal(10,2)) percentile_rank
from data

0

Author Commented:
AngelIII,

Thanks very much for your prompt reply. Your solution is very nice and fast. But is there any way to wrap up the whole thing into a UDF that I can then use APPLY on? What I'm really looking for is a more modularized solution that is easy to re-use, and which can be easily put into a view. Thanks
0

Billing EngineerCommented:
eventually it would be possible with a .Net CLR function...
however, performance being a key factor, I prefer to rewrite/copy/paste some nice sql code over having to fiddle with .net code (although I like writing .net code, actually) :)
0

Author Commented:
OK, that makes sense. If you don't mind (not sure what the protocol is for this-- please correct me if I'm wrong), I'd like to keep the question open in hopes that someone will post a such a CLR function or other means of accomplishing my goal.

<email removed to avoid spamming>

Thanks, and I'll share any points with you and anyone who contributes more.
0

Billing EngineerCommented:

please check my profile, it is listed there.
angel eyes
0

The dense_rank is the best way as AngelIII has already shown (is he ever wrong ?),

but not sure why you want a CLR or even a UDF. Simply use a view :

create view vw_percentiles as
with data
as
( select t.*, dense_rank() over (order by score asc) dr
from ee_Scores t
where person not like 'R%'
)
select data.*
, cast(cast(dr as decimal(10,2)) / ( select max(dr) from data ) as decimal(10,2)) percentile_rank
from data
;

then you can do a:

select * from  mytable inner join vw_percentiles on mytable.person = vw_percentiles.person

or is there someother reason you want to use the cross apply ?

in which case could then use a function, but really depends on "why a function" and "why not a view"...

create function uGetPercentiles()
returns @tbl table (person varchar(10), score int, dr int, percentile decimal(10,5))
as
begin

insert @tbl(person,score,dr,percentile)
select * from vw_percentiles

return

end

All crisp clean SQL, no CLR, possibly no SP, just a humble (and often overlooked) view...
0

Author Commented:
It's really more of a matter of ease of use. I'd like to be able to just call it like you would an excel function, but it appears that SQL Server just doesn't work like that that. With the solution you guys give, you would have to re-type all of that stuff every time you ever want to calculate a percentage rank (which is for me constantly).

Anyway, since it can't be done easily the way I'm talking about, I will close the question and split the points. In fairness, AngelIII answered first with substantially the same response as mark, so he will get more of the points.

Thanks again for the help.
0

Thanks for the points, and just a tip, if you are unhappy with the answer, or doesn't seem to be what you want, or feel a need to query us, then please let us know... Only too happy to spend time to resolve any hanging issues...

Not sure why it has to be retyped every time. Once it is part of a view (for example), it is dynamically available - the view is already part of the database, and cannot think of anything easier than select * from vw_percentiles - certainly do not have to re-create. Or, was there something else that you think might have to be regenerated ?

Even though you have awarded points, would not mind finishing off the discussion...
0

Billing EngineerCommented:
mark,

the "problem" is that if you have several columns, not only score, and you want to apply the same ranking method, you have to copy/paste the function....

unless you want to implement an .net clr aggregation function :)
0

Author Commented:
Well, I understand that you could easily select from a view of percentile ranks calculated for a given quantity (say, Score in my example above). That's just a matter of creating a view as you describe in your post.

That's not what I'm talking about-- I want to be able to apply the percent rank function to dozens of different quantities, and I don't want to have to create a separate view for each of them, or even to have one view with a bunch of identical SQL statements, each of which re-specifies the way to calculate a percent rank using the DENSE RANK function, etc. etc.

For example, if you go back to my original sample data, suppose that instead of one set of scores, we had 20 of them-- score1 is a score on a written part of a test, score2 is the oral part, score3 is for the extra credit section, etc. So every student has a set of 20 scores. I want an easy way to calculate, in one view, the percentile rank for each student for each set of scores. That's why I think a CLR function would be super useful-- you could just write something like:

SELECT r.Person
, PERCENTILERANK(r.Score1) AS ' Score1Rank'
, PERCENTILERANK(r.Score2) AS ' Score2Rank'
, PERCENTILERANK(r.Score3) AS ' Score3Rank'
FROM Results r

From what I could find online, this functionality is built directly into Oracle. I think it's really a glaring omission from SQL Server.
0

Aaahhh, missed that, and then again, still cannot see that requirement above. But yes quite right (yet again - do you ever get tired), if there were several different columns, or in fact a want to nominate which "measure" should be used to rank then it would add an extra layer of complexity... Dynamic SQL would be handy,. but alas not in a function or view - a stored procedure however could go part way (if not all the way).

Cheers,
mark
0

Hi Dicklestein, small timing problem, I was typing the same time you were, my comments are as a response to AngelIII's post.

The functionality is partially available, and instead of opening up the function so that some of the facilities for stored procedures are available, Microsoft has decided that the ability to incorporate CLR is far more powerful. And they are correct - if you are a programmer, with all the tools.

There is nothing to stop you from having multiple statements as you have already observed (though have not checked) :

with data
as
( select t.*, dense_rank() over (order by score_1 asc) dr1,
dense_rank() over (order by score_2 asc) dr2,
dense_rank() over (order by score_3 asc) dr3
from ee_Scores t
where person not like 'R%'
)
select data.*
, cast(cast(dr1 as decimal(10,2)) / ( select max(dr1) from data ) as decimal(10,2)) percentile_rank_1
, cast(cast(dr2 as decimal(10,2)) / ( select max(dr2) from data ) as decimal(10,2)) percentile_rank_2
, cast(cast(dr3 as decimal(10,2)) / ( select max(dr3) from data ) as decimal(10,2)) percentile_rank_3
from data
;

However a function as you suggest would be considerably more elegant.

I will leave it to you guys to sort out - I am sure e-mails have already been exchanged. And thanks for taking the time to explain.

0

Author Commented:
As I read up more on this topic, it doesn't even sound that hard. See these links:

http://msdn.microsoft.com/en-us/library/ms131063(SQL.90).aspx

http://technet.microsoft.com/en-us/library/ms182741.aspx

http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1251415,00.html

Perhaps I should cross post this on the .NET section of EE? Really, this seems like 2 minutes of work for a .NET wizard, and would be very nice for the SQL Server community to have.
0

Billing EngineerCommented:
If my comment implied somewhere that it was "hard" to do the clr.net function, I must apologize.
I justed wanted to make sure the border(s) you have to cross :)

and you are of course free to ask in the .net zone to get a .net developer to do it... now, although I do .net coding, I prefer, for the moment, to keep it out of my (prod) sql servers :)
0

Author Commented:
I realized that I could very easily use excel to dynamically generate the SQL Statements in this form:

with data1 as (select  w.*, dense_rank() over (order by t.Test1 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test1 from data1 order by 'Test1' desc

with data2 as (select  w.*, dense_rank() over (order by t.Test2 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test2 from data2 order by 'Test2' desc

with data3 as (select  w.*, dense_rank() over (order by t.Test3 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test3 from data3 order by 'Test3' desc

Each of these works fine on its own, but is there a way to "glue" all of these together using INNER JOINS or Unions or something, so we have all of the PercentRanks for each Test, and the leftmost column would be the name of the Person.

0

Yep, that is a good idea...

There is a lot of good info in the MS arena (as much as we all bag them out), it is pretty powerful stuff... Simply add EE and you have solutions (not just technology)!

Cheers,
0

I gotta get faster e-mail. My last comment (22622995) was referring to Dicklestein's .net comments...

In reference to ID:22622991 (ie "glue") then if you are going to do that, then you can do it in one query see 22622860 above.

0

Author Commented:
Yes, but my way has the advantage that the atomistic statements all fit on one line and thus can be trivially replicated with easy string manipulations in excel. See the attached file to see what I mean.
PercentileRank-SQL-Generate.xls
0

Crikey !!

When AngelIII first suggest multi-columnar difficulties, I thought - yeah 3... Not quite 56...

If you are really happy with Excell, then what about if... we simply create a procedure that can respond to your Excel ? Might be fun anyway, even if not answering your question (Told you you closed this question off far too early...). Possibly more an example that MS SQL might be at least flexibile if not as powerful as Oracle...

It is possible to open your excel spreadsheet as a linked server, attaching to that worksheet (as if it were a table) create a dynamic SQL statement from the big column, and then execute it....

Have it as a stored procedure, nominating which test you want to run, or all. Results can be appended to a table for subsequent review (or display as output).

-- create a link to that spreadsheet so we can import

'Jet 4.0',
'Microsoft.Jet.OLEDB.4.0',
'c:\ee\PercentileRank-SQL-Generate.xls',
NULL,
'Excel 5.0;'
GO

GO

--List the tables in the linked server (these are the worksheet names) - not needed, but interesting

EXEC sp_tables_ex MyExcel
GO

-- I did take the liberty of "fixing" the column names... SEQ, COMMENT, TEST, QUERY

select * from myexcel...percentrank$where TEST like 'TEST%' -- we can also write back / update that spreadsheet from here just like a table. -- now run through the blighter and execute each command... declare @sql varchar(2000) DECLARE c CURSOR FAST_FORWARD READ_ONLY FOR select query from myexcel...percentrank$ where TEST like 'TEST%'
OPEN c

FETCH NEXT FROM c INTO @sql
WHILE @@FETCH_STATUS = 0
BEGIN
print @sql  -- or insert
exec (@sql) -- really an executesql, just lazy
FETCH NEXT FROM c INTO @sql
END
CLOSE c
DEALLOCATE c

-- now clean it up...

GO

0

If you don't need to update, and just use as a datasource,

you could also do :

SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=c:\ee\PercentileRank-SQL-Generate.xls;HDR=YES', 'SELECT * FROM [percentrank$]') GO select * from OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0', 'data source="c:\ee\PercentileRank-SQL-Generate.xls";Extended properties=Excel 5.0;')...[percentrank$]
GO
0

Author Commented:
Mark,

I'm not sure I really follow how your code works, but will try to study it more. If I don't actually need to interface with excel, and can just copy and paste from excel to a SQL query pane, is there a nice way you know to glue together those queries as I suggested?

Also, I posted a question in the .NET zone about creating the percentile rank CLR User Defined Aggregate function we discussed:

http://www.experts-exchange.com/Programming/Languages/.NET/Visual_CSharp/Q_23782290.html
0

Author Commented:
Mark, never mind my last question. I figured it out myself using your approach.

I created the attached excel template which is very nice for generating this SQL in a couple minutes, even for dozens of quantities. I suppose having a collection of similar excel templates could really speed up development.
Automate-Percent-Rank-SQL-Query.xls
0