Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

How do I calculate quartiles in MSSQL

Posted on 2010-09-06
10
Medium Priority
?
4,238 Views
Last Modified: 2012-05-10
How do I calculate a quartile in mssql (1st quartile, 2nd quartile, etc..)?
I have attached an excel sheet with 57 rows and showing what the 1st ,2nd 3rd and 4th quartile should be
Assume I have those 57 rows in mssql table
How do I get the same number from an mssql query?
Ideally I would run a query against teh database table and the result will be a table with 4 rows
(1st,2nd,3rd, and 4th quartile with their values)
Thank you

Book1.xls
0
Comment
Question by:xav056
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
  • 2
10 Comments
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 33610235
I presume you want this:
with min_max as ( 
  select min(yourfield) n
       , max(yourfield) x
       , ( max(yourfield) - min(yourfield)  ) / 4 q 
       , min(yourfield) + 1 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q1
       , min(yourfield) + 2 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q2
       , min(yourfield) + 3 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q3
from yourtable)
select min(case when yourfield >= q1 then yourfield) Q1
     , min(case when yourfield >= q2 then yourfield) Q2
     , min(case when yourfield >= q3 then yourfield) Q3
     , min(case when yourfield >= x then yourfield) Q4
  from yourtable

Open in new window

0
 
LVL 9

Author Comment

by:xav056
ID: 33610306
getting a syntax error
q1, q2 .. in the second block
I guess because q1 q2 are in min_max
I am not sue how to fix it
thanks
0
 
LVL 143

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 33610325
yes, indeed, sorry for that
with min_max as ( 
  select min(yourfield) n
       , max(yourfield) x
       , ( max(yourfield) - min(yourfield)  ) / 4 q 
       , min(yourfield) + 1 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q1
       , min(yourfield) + 2 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q2
       , min(yourfield) + 3 * ( ( max(yourfield) - min(yourfield)  ) / 4 ) q3
from yourtable)
select min(case when yourfield >= q1 then yourfield) Q1
     , min(case when yourfield >= q2 then yourfield) Q2
     , min(case when yourfield >= q3 then yourfield) Q3
     , min(case when yourfield >= x then yourfield) Q4
  from yourtable
  cross join min_max 

Open in new window

0
Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

 
LVL 58

Accepted Solution

by:
cyberkiwi earned 2000 total points
ID: 33610329
;with tmp as (
      select n, qt=NTILE(4) over (order by n), rn=ROW_NUMBER() over (order by n)
      from quartile_test)
select n
from tmp
where rn in (select MAX(rn) from tmp group by qt)


Quartile_test should be the name of your table, n is the column name.
0
 
LVL 9

Author Comment

by:xav056
ID: 33610349
I am getting
39.314516      59.208731        84.942085                          100.000000
instead of
39.9103139      53.6809816       92.28855721       100
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33610391
Were you referring to http:#a33610329?
With your data (Excel), it gives me

39.9103139013452
53.680981595092
92.2885572139303
100
0
 
LVL 9

Author Comment

by:xav056
ID: 33610497
cyberwiki:
I am getting slightly different numbers, for the rows below
17.32522796
17.97385621
25.45180723
31.39534884
32.86852590
32.95454545
36.97916667
37.84277879
39.31451613
40.92409241
41.59445407
49.30069930
50.59021922
52.03915171
53.68098160
57.50853242
59.20873124
59.49554896
60.35665295
64.09168081
70.95238095
84.94208494
92.28855721
92.28971963
92.45689655
93.47442681

I am getting
36.97916667                    52.03915171               64.09168081                 93.47442681
while excel is giving
37.1950697                         51.31468547                   63.15792385               93.47442681
the only extra thing I did was added a partition by colum2 before the order by for the ntile function, and the row number
as my original result set had 2 columns
where rows were seperated by groups of column 2

Any idea why that might be happening?
Thank you
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33610865
The reason is that Excel takes the quartiles at predetermined locations.
For your original 57-row query, the quartiles are at positions (57-1)/4 = 14 apart.  Or rather, the layout is

Ignoring 1, and adding 14 for each quartile, the positions are 15, 29, 43, 57

Using your new data with 26 elements, the quartiles should be (26-1)/4 = 6.25 apart, or positions

7.25, 13.5, 19.75, 26

There is no item at position 7.25, so instead that is calculated from (7th x 0.75 + 8th x 0.25)
Whether Excel's interpretation is correct or not is up for debate.
0
 
LVL 9

Author Comment

by:xav056
ID: 33612280
Although its still not crystal clear on how excel is calculating the position, is there an easy way to get exact results I am getting in excel using sql
thank you
0
 
LVL 58

Expert Comment

by:cyberkiwi
ID: 33613648

;with tmp as (
	select c=COUNT(*) over (), rn=ROW_NUMBER() over (order by n), n
	from quartile_test)
,qs as (
	select
		q1=1+(c-1.0)/4, q1a=FLOOR(1+(c-1.0)/4), q1b=CEILING(1+(c-1.0)/4),
		q2=1+(c-1.0)/2, q2a=FLOOR(1+(c-1.0)/2), q2b=CEILING(1+(c-1.0)/2),
		q3=1+(c-1.0)*3/4, q3a=FLOOR(1+(c-1.0)*3/4), q3b=CEILING(1+(c-1.0)*3/4)
	from (select top 1 c from tmp) x)
select (1-q1+q1a)*MIN(n)+(q1-q1a)*MAX(n)
from tmp,qs where rn in (q1a,q1b) group by qs.q1,qs.q1a,qs.q1b
union all
select (1-q2+q2a)*MIN(n)+(q2-q2a)*MAX(n)
from tmp,qs where rn in (q2a,q2b) group by qs.q2,qs.q2a,qs.q2b
union all
select (1-q3+q3a)*MIN(n)+(q3-q3a)*MAX(n)
from tmp,qs where rn in (q3a,q3b) group by qs.q3,qs.q3a,qs.q3b
union all
select n
from tmp,qs where rn=c

Open in new window

1

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: When running hybrid database environments, you often need to query some data from a remote db of any type, while being connected to your MS SQL Server database. Problems start when you try to combine that with some "user input" pass…
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
This tutorial will teach you the special effect of super speed similar to the fictional character Wally West aka "The Flash" After Shake : http://www.videocopilot.net/presets/after_shake/ All lightning effects with instructions : http://www.mediaf…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question