Solved

Join on Median, Quartile and Average queries

Posted on 2006-11-14
9
267 Views
Last Modified: 2012-08-13
I would like to combine my query below so that the results look like this:
Quality               Average Score         Quartile 1                  Quartile 3         Median
Promptness         10                          9                                  8                 9.5



SELECT Quality, [Average Score]
FROM
(
select  '1. Promptness'  as Quality, AVG(CASE WHEN prac_promptness NOT IN ('N/A', 'NA') THEN Convert(integer,prac_promptness) ELSE NULL END) as [Average Score]
from dbo.TBL_EDTL_PRACEVAL where prac_term = 'Fall Semester 2006'

) sub
WHERE NOT [Average Score] IS NULL
union
select '1. Promptness'  as Quality, sum(promptness)/count(*) as [Quartile 1]
from(
select top 25 Convert(integer,prac_promptness)as promptness
from dbo.TBL_EDTL_PRACEVAL
where prac_term = 'Fall Semester 2006' and prac_promptness not in ( 'N/A','NA')
order by Convert(integer,prac_promptness)
) as l

union

select '1. Promptness'  as Quality, sum(promptness)/count(*) as [Quartile3]
from(
select top 25 percent Convert(integer,prac_promptness)as promptness
from dbo.TBL_EDTL_PRACEVAL
where prac_term = 'Fall Semester 2006' and prac_promptness not in ( 'N/A','NA')
order by Convert(integer,prac_promptness)
) tt

union

declare @pos1 int <----- error messsage says Incorrect syntax near the keyword 'declare'
declare @pos2 int

declare @oneA float
declare @twoA float

select  @pos1 = (count(*)+1)/2, @pos2 = count(*)/2 + 1 from dbo.TBL_EDTL_PRACEVAL where prac_promptness not in ( 'N/A','NA')and IsNumeric(prac_comm_skills)=1
set rowcount @pos1
select @oneA = prac_promptness from dbo.TBL_EDTL_PRACEVAL where IsNumeric( prac_promptness)=1


set rowcount @pos2
select @twoA = prac_promptness from dbo.TBL_EDTL_PRACEVAL where IsNumeric( prac_promptness)=1
group by prac_promptness
set rowcount 0

select (@oneA + @twoA)/2.0 as Median

please help  thnx

0
Comment
Question by:friskee
  • 5
  • 4
9 Comments
 
LVL 32

Expert Comment

by:bhess1
ID: 17942367
First - declare any vars before starting the SELECT statements.  Moving those to the top will correct the error you are receiving.

Second - there is no reason to include 1. Promptness as a constant in all of these queries.  

Third - once you start a SELECT statement, you can only include parts of one continuous statement in the SELECT.  The MEDIAN calculation would need to be performed prior to the SELECT statement designed to return your data.

Here is one way to do this entire thing in one SELECT (including the MEDIAN call).  I believe that Quartile1 and Quartile3 need definition.  For example, if Quartile1 would be the high scores, then you need to add a DESC qualifier to the ORDER BY clause.  Similarly, if the Quartile3 value would be from the first set lower than the median, you would need to adjust the DESC qualifier so that it is on the outer query, not the inner query.

There may be some syntax errors, but I don't identify any when testing here.

SELECT '1. Promptness' AS Quality,
      (
      SELECT AVG(
            CASE
                  WHEN prac_promptness NOT IN ('N/A', 'NA')
                  THEN Convert(integer,prac_promptness)
                  ELSE NULL
            END
            )
      FROM dbo.TBL_EDTL_PRACEVAL
      WHERE prac_term = 'Fall Semester 2006'
      ) [Average Score],

      (
      SELECT SUM(promptness) / COUNT(promptness)
      FROM (
            SELECT TOP 25 PERCENT
                  Convert(integer, prac_promptness )as promptness
            FROM dbo.TBL_EDTL_PRACEVAL
            WHERE prac_term = 'Fall Semester 2006'
                  AND prac_promptness not in ( 'N/A','NA')
            ORDER BY Convert(integer, prac_promptness)
            ) AS X
      ) AS Quartile1,

      (
      SELECT SUM(promptness)/COUNT(promptness)
      FROM (
            SELECT TOP 25 PERCENT promptness
            FROM (
                  SELECT TOP 50 PERCENT Convert(integer, prac_promptness )as promptness
                  FROM dbo.TBL_EDTL_PRACEVAL
                  WHERE prac_term = 'Fall Semester 2006'
                        AND prac_promptness not in ( 'N/A','NA')
                  ORDER by Convert(integer, prac_promptness) DESC
                  ) Src
            ORDER BY promptness
            ) tt
      ) AS Quartile3,

      (SELECT
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 50 PERCENT prac_promptness
                  FROM dbo.TBL_EDTL_PRACEVAL
                  WHERE prac_promptness NOT IN ( 'N/A','NA')
                  ORDER BY Convert(integer, prac_promptness)
                  ) s1
            ) +
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 50 PERCENT prac_promptness
                  FROM dbo.TBL_EDTL_PRACEVAL
                  WHERE prac_promptness NOT IN ( 'N/A','NA')
                  ORDER BY Convert(integer, prac_promptness) DESC
                  ) s2
            )
       / 2.0) AS Media
0
 

Author Comment

by:friskee
ID: 17956140
The suggested solution thast is posted is giving me the correct results for the average, quartile 1 and 3 but for the median, the answer is not correct.  I am trying to use the Group by clause in the last select statement for the median as I've done in the original code above, but am running into an error message about line 1 not being contained in an aggregrate function or grouped by clause  
0
 
LVL 32

Expert Comment

by:bhess1
ID: 17959833
What median value do you expect?  What do you get from the above query?

Actually, I do not understand why the grouping occurs in your query.  Median has a simple definition - the middle value in a list (interpolated when there are an even number of lines in the list).  So, for this list (14 items):

1
1
1
2
2
3
4
5
6
6
7
7
8
9

The Median is (4+5)/2 = 4.5

If we add an additional '1' to the top of the list, the median is 4.  If we add a '9' at the bottom, the median is 5.

BUT:  if we use your procedure, we end up with a different result:

@Pos1 will be 7
@Pos2 will be 8

@oneA will be 4

However, @twoA will be selected from a grouped list:

1
2
3
4
5
6
7
8
9

This will return an 8, instead of a 5, resulting in a median value of (4+8)/2 = 6, not the 4.5 it should have returned.  I recommend verifying your median routine on a simplified set of data.
0
 

Author Comment

by:friskee
ID: 18010021
This is what is in the column:

prac_promptness
12
12
12
7
7
5
N/A

so based on the logic above, there should be a median of 9.5 since there are six values, then we take two values in the middle, add them and devide by 2-
0
Maximize Your Threat Intelligence Reporting

Reporting is one of the most important and least talked about aspects of a world-class threat intelligence program. Here’s how to do it right.

 

Author Comment

by:friskee
ID: 18020089
With the suggestion submitted above by bhess1, however, I get a median of 11.000 for the values in this column
0
 
LVL 32

Expert Comment

by:bhess1
ID: 18021052
Interesting.  Looking it over, it appears that I placed the division by 2 in the wrong place.  This code fragment provides the correct value for me from your data:

      0.5 * (
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 50 PERCENT prac_promptness
                  FROM dbo.TBL_EDTL_PRACEVAL
                  WHERE prac_promptness NOT IN ( 'N/A','NA')
                  ORDER BY Convert(integer, prac_promptness)
                  ) s1
            ) +
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 50 PERCENT prac_promptness
                  FROM dbo.TBL_EDTL_PRACEVAL
                  WHERE prac_promptness NOT IN ( 'N/A','NA')
                  ORDER BY Convert(integer, prac_promptness) DESC
                  ) s2
            )
      ) AS Media

Note that I chose to multiply by 0.5 instead of dividing by 2.0 for two reasons:
    1)  Multiplication, even of fractional values, is faster than dividing.
    2)  Multiplying by 0.5 returns a result like 8.5, whereas dividing by 2.0 returns something like 8.500000

0
 

Author Comment

by:friskee
ID: 18021727
Using the online statistical calculator, I get a median of 9.5 for the values entered above, but SQL may be considering the NA value a little differently in the calculation, would I be correct in assuming this?
0
 
LVL 32

Accepted Solution

by:
bhess1 earned 200 total points
ID: 18022215
ARRGH!  Can you tell that Monday is not my best day?  Please use the code below instead.

      0.5 *(
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 100 PERCENT * FROM (
                        SELECT TOP 50 PERCENT prac_promptness
                    FROM dbo.Test1
                    WHERE prac_promptness NOT IN ( 'N/A','NA')
                    ORDER BY Convert(integer, prac_promptness)
                    ) Src
                  ORDER BY convert(integer, prac_promptness) DESC
                  ) S1
            ) +
            (SELECT TOP 1 Convert(integer, prac_promptness)
            FROM (
                  SELECT TOP 50 PERCENT prac_promptness
                  FROM dbo.Test1
                  WHERE prac_promptness NOT IN ( 'N/A','NA')
                  ORDER BY Convert(integer, prac_promptness) DESC
                  ) s2
            )
      ) AS Media
0
 

Author Comment

by:friskee
ID: 18023578
Thank you bhess1, I really like your thoroughness in explaining things you've been a tremendous help.
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Suggested Solutions

For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Ever wondered why sometimes your SQL Server is slow or unresponsive with connections spiking up but by the time you go in, all is well? The following article will show you how to install and configure a SQL job that will send you email alerts includ…
Familiarize people with the process of retrieving data from SQL Server using an Access pass-thru query. Microsoft Access is a very powerful client/server development tool. One of the ways that you can retrieve data from a SQL Server is by using a pa…
Using examples as well as descriptions, and references to Books Online, show the documentation available for date manipulation functions and by using a select few of these functions, show how date based data can be manipulated with these functions.

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

24 Experts available now in Live!

Get 1:1 Help Now