Solved

Best Implementation of Excel's PERCENTRANK function in SQL Server 2008

Posted on 2008-09-30
23
3,765 Views
Last Modified: 2012-05-05
Hi,

I've done a bunch of web searching on this question, and among others, found the solution below. I want to covert the approach described below into a more versatile UDF or stored procedure for calculating percentile ranks in lots of different tables and views in a database quickly and easily:

http://www.sqlteam.com/article/computing-percentiles-in-sql-server

Basically, in this approach you define a UDF for linear interpolation called LERP; syntax in this application would be LERP(given_value,min_value,max_value,output_min,output_high).

The function is defined as:
_______________________________________

declare @pp float
set @pp = .5

declare @k int, @d float, @ax float, @bx float
declare @values table (i int identity(1,1), x float)

insert @values select Score from @TestScores order by Score

select @k=floor(kf), @d=kf-floor(kf)
from (select 1+@pp*(count(*)-1) as kf from @values) as x1

select @ax=x from @values where i=@k
select @bx=x from @values where i=@k+1

select @pp as factor,
   dbo.LERP(@d, 0.0, 1.0, @ax, @bx) as percentile}
_______________________________________

They then apply the function over this generated data set:
_______________________________________

Declare @TestScores table (StudentID int identity(1,1), Score int)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (03)
insert @TestScores (Score) Values (40)
insert @TestScores (Score) Values (45)
insert @TestScores (Score) Values (50)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (90)
insert @TestScores (Score) Values (20)
insert @TestScores (Score) Values (11)
insert @TestScores (Score) Values (30)
while @@ROWCOUNT > 0
insert @TestScores select t.Score from @TestScores t
cross join
(select max (StudentID) MaxRowNum from @TestScores) x
where
t.StudentID <= 100 - x.MaxRowNum
_______________________________________

In my application, I have a large list of people and scores. I basically want what I have in excel-- a relatively easy way to apply this function automatically (perhaps using dynamic SQL?) to a given SELECT statement. For example, suppose I had this data in a table called Results:

Person      Score
Bill                 45
Fred                 112
Jim                 23
Rick                 200
Bob                 76
Andy            23
Roger         39

Ideally I would like to be able to do something like the following:

SELECT r.Person, r.Score, PERCENTILERANK(r.Score) AS 'Percentile Rank'
FROM Results r
WHERE r.Person not like 'R%';

This would produce:

Person           Score            Percent
Bill                    45            0.500
Fred                      112            1.000
Jim                      23            0.000
Bob                      76            0.750
Andy                      23            0.000

As I said, this would be ideal. If this is simply not technically feasible in SQL Server 2008, then I think the next best thing would be a stored procedure that could be very quickly modified each time to compute the percentile ranking for a given table, perhaps populating a new column using INTO or something. I have read just about everything on the web about this, and I can't help but think there is a better way out there for this incredibly common and useful task. I keep thinking there is a way to use NTILE and PARTITION to do what I want, but I can't figure it out. I am bewildered why this functionality is not embedded in T-SQL in 2008.

I suspect there is also a clever way to do this using a data cube in Analysis Services, but Analysis Services is a big complicated piece of software and I'd rather stick to vanilla T-SQL.

I leave it to the wizards out there to show the way. Thanks for your help!




 

0
Comment
Question by:Dicklestein
  • 9
  • 9
  • 5
23 Comments
 
LVL 142

Accepted Solution

by:
Guy Hengel [angelIII / a3] earned 300 total points
Comment Utility
please check this out:

declare @t table (person varchar(10), score int )
set nocount on 
insert into @t values('Bill         ',        45)
insert into @t values('Fred          ',       112)
insert into @t values('Jim          ',       23)
insert into @t values('Rick          ',       200)
insert into @t values('Bob         ',        76)
insert into @t values('Andy      ',      23)
insert into @t values('Roger     ',    39) 
;with data as (
select t.*
, dense_rank() over (order by score asc) dr
from @t t
where person not like 'R%'
)
select data.*
     , cast(cast(dr as decimal(10,2)) / ( select max(dr) from data ) as decimal(10,2)) percentile_rank
  from data

Open in new window

0
 

Author Comment

by:Dicklestein
Comment Utility
AngelIII,

Thanks very much for your prompt reply. Your solution is very nice and fast. But is there any way to wrap up the whole thing into a UDF that I can then use APPLY on? What I'm really looking for is a more modularized solution that is easy to re-use, and which can be easily put into a view. Thanks
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
eventually it would be possible with a .Net CLR function...
however, performance being a key factor, I prefer to rewrite/copy/paste some nice sql code over having to fiddle with .net code (although I like writing .net code, actually) :)
0
 

Author Comment

by:Dicklestein
Comment Utility
OK, that makes sense. If you don't mind (not sure what the protocol is for this-- please correct me if I'm wrong), I'd like to keep the question open in hopes that someone will post a such a CLR function or other means of accomplishing my goal.

Also, AngelIII, I'm interested in talking to you about some hourly work, and could not locate your email address. You can reach me at this one-time use email:
<email removed to avoid spamming>

Thanks, and I'll share any points with you and anyone who contributes more.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
>Also, AngelIII, I'm interested in talking to you about some hourly work, and could not locate your email address. You can reach me at this one-time use email:

please check my profile, it is listed there.
angel eyes
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 200 total points
Comment Utility
The dense_rank is the best way as AngelIII has already shown (is he ever wrong ?),

but not sure why you want a CLR or even a UDF. Simply use a view :

create view vw_percentiles as
with data
as
( select t.*, dense_rank() over (order by score asc) dr
     from ee_Scores t
     where person not like 'R%'
)
select data.*
     , cast(cast(dr as decimal(10,2)) / ( select max(dr) from data ) as decimal(10,2)) percentile_rank
  from data
;


then you can do a:

select * from  mytable inner join vw_percentiles on mytable.person = vw_percentiles.person

or is there someother reason you want to use the cross apply ?

in which case could then use a function, but really depends on "why a function" and "why not a view"...

create function uGetPercentiles()
returns @tbl table (person varchar(10), score int, dr int, percentile decimal(10,5))
as
begin

insert @tbl(person,score,dr,percentile)
select * from vw_percentiles

return

end


All crisp clean SQL, no CLR, possibly no SP, just a humble (and often overlooked) view...
0
 

Author Comment

by:Dicklestein
Comment Utility
It's really more of a matter of ease of use. I'd like to be able to just call it like you would an excel function, but it appears that SQL Server just doesn't work like that that. With the solution you guys give, you would have to re-type all of that stuff every time you ever want to calculate a percentage rank (which is for me constantly).

Anyway, since it can't be done easily the way I'm talking about, I will close the question and split the points. In fairness, AngelIII answered first with substantially the same response as mark, so he will get more of the points.

Thanks again for the help.
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Thanks for the points, and just a tip, if you are unhappy with the answer, or doesn't seem to be what you want, or feel a need to query us, then please let us know... Only too happy to spend time to resolve any hanging issues...

Not sure why it has to be retyped every time. Once it is part of a view (for example), it is dynamically available - the view is already part of the database, and cannot think of anything easier than select * from vw_percentiles - certainly do not have to re-create. Or, was there something else that you think might have to be regenerated ?

Even though you have awarded points, would not mind finishing off the discussion...
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
mark,

 the "problem" is that if you have several columns, not only score, and you want to apply the same ranking method, you have to copy/paste the function....

  unless you want to implement an .net clr aggregation function :)
0
 

Author Comment

by:Dicklestein
Comment Utility
Well, I understand that you could easily select from a view of percentile ranks calculated for a given quantity (say, Score in my example above). That's just a matter of creating a view as you describe in your post.

That's not what I'm talking about-- I want to be able to apply the percent rank function to dozens of different quantities, and I don't want to have to create a separate view for each of them, or even to have one view with a bunch of identical SQL statements, each of which re-specifies the way to calculate a percent rank using the DENSE RANK function, etc. etc.

For example, if you go back to my original sample data, suppose that instead of one set of scores, we had 20 of them-- score1 is a score on a written part of a test, score2 is the oral part, score3 is for the extra credit section, etc. So every student has a set of 20 scores. I want an easy way to calculate, in one view, the percentile rank for each student for each set of scores. That's why I think a CLR function would be super useful-- you could just write something like:

SELECT r.Person
, PERCENTILERANK(r.Score1) AS ' Score1Rank'
, PERCENTILERANK(r.Score2) AS ' Score2Rank'
, PERCENTILERANK(r.Score3) AS ' Score3Rank'
FROM Results r

From what I could find online, this functionality is built directly into Oracle. I think it's really a glaring omission from SQL Server.
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Aaahhh, missed that, and then again, still cannot see that requirement above. But yes quite right (yet again - do you ever get tired), if there were several different columns, or in fact a want to nominate which "measure" should be used to rank then it would add an extra layer of complexity... Dynamic SQL would be handy,. but alas not in a function or view - a stored procedure however could go part way (if not all the way).

Cheers,
mark
0
Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Hi Dicklestein, small timing problem, I was typing the same time you were, my comments are as a response to AngelIII's post.

The functionality is partially available, and instead of opening up the function so that some of the facilities for stored procedures are available, Microsoft has decided that the ability to incorporate CLR is far more powerful. And they are correct - if you are a programmer, with all the tools.

There is nothing to stop you from having multiple statements as you have already observed (though have not checked) :

with data
as
( select t.*, dense_rank() over (order by score_1 asc) dr1,
                   dense_rank() over (order by score_2 asc) dr2,
                   dense_rank() over (order by score_3 asc) dr3
     from ee_Scores t
     where person not like 'R%'
)
select data.*
     , cast(cast(dr1 as decimal(10,2)) / ( select max(dr1) from data ) as decimal(10,2)) percentile_rank_1
     , cast(cast(dr2 as decimal(10,2)) / ( select max(dr2) from data ) as decimal(10,2)) percentile_rank_2
     , cast(cast(dr3 as decimal(10,2)) / ( select max(dr3) from data ) as decimal(10,2)) percentile_rank_3
  from data
;

However a function as you suggest would be considerably more elegant.

I will leave it to you guys to sort out - I am sure e-mails have already been exchanged. And thanks for taking the time to explain.

0
 

Author Comment

by:Dicklestein
Comment Utility
As I read up more on this topic, it doesn't even sound that hard. See these links:

http://msdn.microsoft.com/en-us/library/ms131063(SQL.90).aspx

http://technet.microsoft.com/en-us/library/ms182741.aspx

http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1251415,00.html

Perhaps I should cross post this on the .NET section of EE? Really, this seems like 2 minutes of work for a .NET wizard, and would be very nice for the SQL Server community to have.
0
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
Comment Utility
If my comment implied somewhere that it was "hard" to do the clr.net function, I must apologize.
I justed wanted to make sure the border(s) you have to cross :)

and you are of course free to ask in the .net zone to get a .net developer to do it... now, although I do .net coding, I prefer, for the moment, to keep it out of my (prod) sql servers :)
0
 

Author Comment

by:Dicklestein
Comment Utility
I realized that I could very easily use excel to dynamically generate the SQL Statements in this form:

with data1 as (select  w.*, dense_rank() over (order by t.Test1 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test1 from data1 order by 'Test1' desc

with data2 as (select  w.*, dense_rank() over (order by t.Test2 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test2 from data2 order by 'Test2' desc

with data3 as (select  w.*, dense_rank() over (order by t.Test3 asc) dr FROM  TestScores t) select  cast(cast(dr as decimal(10,3)) / ( select max(dr) from data1 ) as decimal(10,3)) Test3 from data3 order by 'Test3' desc

Each of these works fine on its own, but is there a way to "glue" all of these together using INNER JOINS or Unions or something, so we have all of the PercentRanks for each Test, and the leftmost column would be the name of the Person.

0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Yep, that is a good idea...

There is a lot of good info in the MS arena (as much as we all bag them out), it is pretty powerful stuff... Simply add EE and you have solutions (not just technology)!

Cheers,
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
I gotta get faster e-mail. My last comment (22622995) was referring to Dicklestein's .net comments...

In reference to ID:22622991 (ie "glue") then if you are going to do that, then you can do it in one query see 22622860 above.

0
 

Author Comment

by:Dicklestein
Comment Utility
Yes, but my way has the advantage that the atomistic statements all fit on one line and thus can be trivially replicated with easy string manipulations in excel. See the attached file to see what I mean.
PercentileRank-SQL-Generate.xls
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Crikey !!

When AngelIII first suggest multi-columnar difficulties, I thought - yeah 3... Not quite 56...

If you are really happy with Excell, then what about if... we simply create a procedure that can respond to your Excel ? Might be fun anyway, even if not answering your question (Told you you closed this question off far too early...). Possibly more an example that MS SQL might be at least flexibile if not as powerful as Oracle...

It is possible to open your excel spreadsheet as a linked server, attaching to that worksheet (as if it were a table) create a dynamic SQL statement from the big column, and then execute it....

Have it as a stored procedure, nominating which test you want to run, or all. Results can be appended to a table for subsequent review (or display as output).




-- create a link to that spreadsheet so we can import
 

EXEC sp_addlinkedserver MyExcel,

     'Jet 4.0',

     'Microsoft.Jet.OLEDB.4.0',

     'c:\ee\PercentileRank-SQL-Generate.xls', 

      NULL,

     'Excel 5.0;'

GO
 

--Set up login mappings (just ADMIN - jet wants something).
 

EXEC sp_addlinkedsrvlogin MyExcel, FALSE, NULL, Admin, NULL

GO
 

--List the tables in the linked server (these are the worksheet names) - not needed, but interesting
 

EXEC sp_tables_ex MyExcel

GO
 

-- I did take the liberty of "fixing" the column names... SEQ, COMMENT, TEST, QUERY
 

select * from myexcel...percentrank$ where TEST like 'TEST%'
 

-- we can also write back / update that spreadsheet from here just like a table.
 
 

-- now run through the blighter and execute each command...
 

declare @sql varchar(2000)

 

DECLARE c CURSOR FAST_FORWARD READ_ONLY FOR select query from myexcel...percentrank$ where TEST like 'TEST%'

OPEN c
 

FETCH NEXT FROM c INTO @sql

WHILE @@FETCH_STATUS = 0

BEGIN

      print @sql  -- or insert

      exec (@sql) -- really an executesql, just lazy

      FETCH NEXT FROM c INTO @sql

END

CLOSE c

DEALLOCATE c
 

-- now clean it up...
 

sp_dropserver 'MyExcel', 'droplogins';

GO

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
If you don't need to update, and just use as a datasource,

you could also do :

SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Excel 8.0;Database=c:\ee\PercentileRank-SQL-Generate.xls;HDR=YES', 'SELECT * FROM [percentrank$]')
GO

select * from OPENDATASOURCE('Microsoft.Jet.OLEDB.4.0', 'data source="c:\ee\PercentileRank-SQL-Generate.xls";Extended properties=Excel 5.0;')...[percentrank$]
GO
0
 

Author Comment

by:Dicklestein
Comment Utility
Mark,

I'm not sure I really follow how your code works, but will try to study it more. If I don't actually need to interface with excel, and can just copy and paste from excel to a SQL query pane, is there a nice way you know to glue together those queries as I suggested?

Also, I posted a question in the .NET zone about creating the percentile rank CLR User Defined Aggregate function we discussed:

http://www.experts-exchange.com/Programming/Languages/.NET/Visual_CSharp/Q_23782290.html
0
 

Author Comment

by:Dicklestein
Comment Utility
Mark, never mind my last question. I figured it out myself using your approach.

I created the attached excel template which is very nice for generating this SQL in a couple minutes, even for dozens of quantities. I suppose having a collection of similar excel templates could really speed up development.
Automate-Percent-Rank-SQL-Query.xls
0
 
LVL 51

Expert Comment

by:Mark Wills
Comment Utility
Looks good....
0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

by Mark Wills Attending one of Rob Farley's seminars the other day, I heard the phrase "The Accidental DBA" and fell in love with it. It got me thinking about the plight of the newcomer to SQL Server...  So if you are the accidental DBA, or, simp…
In this article we will get to know that how can we recover deleted data if it happens accidently. We really can recover deleted rows if we know the time when data is deleted by using the transaction log.
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

772 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now