xav056

asked on

# How do I calculate quartiles in MSSQL

How do I calculate a quartile in mssql (1st quartile, 2nd quartile, etc..)?

I have attached an excel sheet with 57 rows and showing what the 1st ,2nd 3rd and 4th quartile should be

Assume I have those 57 rows in mssql table

How do I get the same number from an mssql query?

Ideally I would run a query against teh database table and the result will be a table with 4 rows

(1st,2nd,3rd, and 4th quartile with their values)

Thank you

Book1.xls

I have attached an excel sheet with 57 rows and showing what the 1st ,2nd 3rd and 4th quartile should be

Assume I have those 57 rows in mssql table

How do I get the same number from an mssql query?

Ideally I would run a query against teh database table and the result will be a table with 4 rows

(1st,2nd,3rd, and 4th quartile with their values)

Thank you

Book1.xls

ASKER

getting a syntax error

q1, q2 .. in the second block

I guess because q1 q2 are in min_max

I am not sue how to fix it

thanks

q1, q2 .. in the second block

I guess because q1 q2 are in min_max

I am not sue how to fix it

thanks

yes, indeed, sorry for that

```
with min_max as (
select min(yourfield) n
, max(yourfield) x
, ( max(yourfield) - min(yourfield) ) / 4 q
, min(yourfield) + 1 * ( ( max(yourfield) - min(yourfield) ) / 4 ) q1
, min(yourfield) + 2 * ( ( max(yourfield) - min(yourfield) ) / 4 ) q2
, min(yourfield) + 3 * ( ( max(yourfield) - min(yourfield) ) / 4 ) q3
from yourtable)
select min(case when yourfield >= q1 then yourfield) Q1
, min(case when yourfield >= q2 then yourfield) Q2
, min(case when yourfield >= q3 then yourfield) Q3
, min(case when yourfield >= x then yourfield) Q4
from yourtable
cross join min_max
```

ASKER CERTIFIED SOLUTION

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

ASKER

I am getting

39.314516 59.208731 84.942085 100.000000

instead of

39.9103139 53.6809816 92.28855721 100

39.314516 59.208731 84.942085 100.000000

instead of

39.9103139 53.6809816 92.28855721 100

Were you referring to http:#a33610329?

With your data (Excel), it gives me

39.9103139013452

53.680981595092

92.2885572139303

100

With your data (Excel), it gives me

39.9103139013452

53.680981595092

92.2885572139303

100

ASKER

cyberwiki:

I am getting slightly different numbers, for the rows below

17.32522796

17.97385621

25.45180723

31.39534884

32.86852590

32.95454545

36.97916667

37.84277879

39.31451613

40.92409241

41.59445407

49.30069930

50.59021922

52.03915171

53.68098160

57.50853242

59.20873124

59.49554896

60.35665295

64.09168081

70.95238095

84.94208494

92.28855721

92.28971963

92.45689655

93.47442681

I am getting

36.97916667 52.03915171 64.09168081 93.47442681

while excel is giving

37.1950697 51.31468547 63.15792385 93.47442681

the only extra thing I did was added a partition by colum2 before the order by for the ntile function, and the row number

as my original result set had 2 columns

where rows were seperated by groups of column 2

Any idea why that might be happening?

Thank you

I am getting slightly different numbers, for the rows below

17.32522796

17.97385621

25.45180723

31.39534884

32.86852590

32.95454545

36.97916667

37.84277879

39.31451613

40.92409241

41.59445407

49.30069930

50.59021922

52.03915171

53.68098160

57.50853242

59.20873124

59.49554896

60.35665295

64.09168081

70.95238095

84.94208494

92.28855721

92.28971963

92.45689655

93.47442681

I am getting

36.97916667 52.03915171 64.09168081 93.47442681

while excel is giving

37.1950697 51.31468547 63.15792385 93.47442681

the only extra thing I did was added a partition by colum2 before the order by for the ntile function, and the row number

as my original result set had 2 columns

where rows were seperated by groups of column 2

Any idea why that might be happening?

Thank you

The reason is that Excel takes the quartiles at predetermined locations.

For your original 57-row query, the quartiles are at positions (57-1)/4 = 14 apart. Or rather, the layout is

Ignoring 1, and adding 14 for each quartile, the positions are 15, 29, 43, 57

Using your new data with 26 elements, the quartiles should be (26-1)/4 = 6.25 apart, or positions

7.25, 13.5, 19.75, 26

There is no item at position 7.25, so instead that is calculated from (7th x 0.75 + 8th x 0.25)

Whether Excel's interpretation is correct or not is up for debate.

For your original 57-row query, the quartiles are at positions (57-1)/4 = 14 apart. Or rather, the layout is

Ignoring 1, and adding 14 for each quartile, the positions are 15, 29, 43, 57

Using your new data with 26 elements, the quartiles should be (26-1)/4 = 6.25 apart, or positions

7.25, 13.5, 19.75, 26

There is no item at position 7.25, so instead that is calculated from (7th x 0.75 + 8th x 0.25)

Whether Excel's interpretation is correct or not is up for debate.

ASKER

Although its still not crystal clear on how excel is calculating the position, is there an easy way to get exact results I am getting in excel using sql

thank you

thank you

```
;with tmp as (
select c=COUNT(*) over (), rn=ROW_NUMBER() over (order by n), n
from quartile_test)
,qs as (
select
q1=1+(c-1.0)/4, q1a=FLOOR(1+(c-1.0)/4), q1b=CEILING(1+(c-1.0)/4),
q2=1+(c-1.0)/2, q2a=FLOOR(1+(c-1.0)/2), q2b=CEILING(1+(c-1.0)/2),
q3=1+(c-1.0)*3/4, q3a=FLOOR(1+(c-1.0)*3/4), q3b=CEILING(1+(c-1.0)*3/4)
from (select top 1 c from tmp) x)
select (1-q1+q1a)*MIN(n)+(q1-q1a)*MAX(n)
from tmp,qs where rn in (q1a,q1b) group by qs.q1,qs.q1a,qs.q1b
union all
select (1-q2+q2a)*MIN(n)+(q2-q2a)*MAX(n)
from tmp,qs where rn in (q2a,q2b) group by qs.q2,qs.q2a,qs.q2b
union all
select (1-q3+q3a)*MIN(n)+(q3-q3a)*MAX(n)
from tmp,qs where rn in (q3a,q3b) group by qs.q3,qs.q3a,qs.q3b
union all
select n
from tmp,qs where rn=c
```

Open in new window