Solved

Query to Exclude Outliers from Selected Rows

Posted on 2007-03-21
Last Modified: 2012-05-05
I want to exclude outliers (the highest value and the lowest value) from a query of numerical data.  For example, if the table is like this:

ID   Student   Score
1    Doe       500
2    Doe       5
3    Doe       99
4    Doe       100
5    Doe       95
6    Doe       98
7    Doe       100
8    Smith     89
9    Smith     95
10   Smith     92
11   etc.

it is clear that Doe generally earned about 98%, but once had a strange 500% and a strange 5% (I want the query to exclude or ignore the 500 and the 5).
   1. Considering that other students besides Doe are in the table, what query could select only the rows where the student is Doe, while excluding Doe's outliers (the row with the highest value *and* the row with the lowest value)?
   2. What if I want to exclude only the row with the highest value?
Question by:Randall-B

Accepted Solution

by:
hkamal earned 250 total points
ID: 18765006
Let's call your table "myTable". You could write the query like this:

SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Student =  "Doe"
To exclude outliers for all students, simply remove the last line.
To remove only the top outlier, remove the first "AND" line (the MIN(Score) condition).
Remember that you are only ever excluding the top and bottom scores. If you had a 500% and a 498%, you would still see the 498%. You may choose to alter the logic to eliminate, say, anything over 100%.
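
The fixed-cutoff alternative mentioned above can be sketched as follows. This is only a minimal illustration, not part of the original answer: it assumes an in-memory SQLite database driven from Python so that it runs on its own, a table named `scores` holding the sample rows from the question, and a hypothetical helper `scores_at_or_below`. The SQL itself should carry over to MySQL largely unchanged.

```python
import sqlite3

# Sample rows from the question: (ID, Student, Score).
ROWS = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

def make_db():
    """Build an in-memory table (assumed name: scores) with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)"
    )
    conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)
    return conn

def scores_at_or_below(conn, student, cutoff):
    """Hypothetical helper: one student's scores that do not exceed the cutoff."""
    sql = """
        SELECT Score
        FROM   scores
        WHERE  Student = ?
        AND    Score <= ?
        ORDER  BY Score ASC
    """
    return [row[0] for row in conn.execute(sql, (student, cutoff))]

conn = make_db()
print(scores_at_or_below(conn, "Doe", 100))  # [5, 95, 98, 99, 100, 100]
```

Note that a cutoff of 100 drops the 500 but keeps the 5, so a fixed upper bound only addresses high outliers; a lower bound would have to be chosen separately.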

Hope this helps

Author Comment

by:Randall-B
ID: 18765392
hkamal,
   Thanks.  My table is named "scores".  In PHP, I get the full result set when using a simple query like this:

    $sql = 'SELECT ID, Student, Score FROM scores WHERE Student = "Doe" ORDER BY Score ASC';
   $res = mysql_query($sql);

but I'm not getting anything (no results at all) when using the exclude-outliers query like this:

   $sql = 'SELECT ID, Student, Score FROM scores t1 WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Student = "Doe" ORDER BY Score ASC';
    $res = mysql_query($sql);

What could be wrong?

Author Comment

by:Randall-B
ID: 18765592
Should it be like this?

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Author Comment

by:Randall-B
ID: 18765933
Also, please see my related question at http:/Q_22463806.html , where I'm asking how to exclude the *two* highest outliers (without any arbitrary cutoffs like "100%"). Thanks.

Author Comment

by:Randall-B
ID: 18767078
I'm accepting your answer with the understanding that the query should be:

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Thanks.

Expert Comment

by:hkamal
ID: 18770144
Hi Randall-B,
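I wrongly assumed that ID was the same for all of a student's rows, but I can see that each row has its own ID. Because each subquery then matched only the row itself, every row's score equalled its own MAX and MIN, so nothing was returned. In that case, you can use a modified version of my query that correlates on Student instead, viz: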
SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Student =  "Doe"
This way, all my statements above still apply, and you can use the query for any student simply by changing the last line.
I'll take a look at your other question.
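
The difference between correlating on ID and correlating on Student can be checked with a small sketch. It is an illustration under assumptions rather than the setup used in this thread: it runs the queries against an in-memory SQLite database from Python instead of MySQL from PHP, uses the table name `scores` with the sample rows from the question, and the constant names and the `run` helper are hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, Student, Score).
ROWS = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)"
)
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)

# First attempt: subqueries correlated on ID. Each ID is unique, so every
# row is compared against its own score and is always excluded.
BY_ID = """
    SELECT Score FROM scores t1
    WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID = t1.ID)
    AND    t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID = t1.ID)
    AND    t1.Student = ?
    ORDER  BY t1.Score ASC
"""

# Modified version: subqueries correlated on Student, so MAX and MIN are
# taken over all of that student's rows.
BY_STUDENT = """
    SELECT Score FROM scores t1
    WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = t1.Student)
    AND    t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = t1.Student)
    AND    t1.Student = ?
    ORDER  BY t1.Score ASC
"""

# Variant for excluding only the highest score: the MIN condition is dropped.
TOP_ONLY = """
    SELECT Score FROM scores t1
    WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = t1.Student)
    AND    t1.Student = ?
    ORDER  BY t1.Score ASC
"""

def run(sql, student):
    """Hypothetical helper: run one query for a student and return the scores."""
    return [row[0] for row in conn.execute(sql, (student,))]

print(run(BY_ID, "Doe"))       # []
print(run(BY_STUDENT, "Doe"))  # [95, 98, 99, 100, 100]
print(run(TOP_ONLY, "Doe"))    # [5, 95, 98, 99, 100, 100]
```

Because the filter compares by value, every row tied at the maximum or minimum is excluded. Both of Doe's 100s survive here only because 500 is the maximum.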