Solved

Query to Exclude Outliers from Selected Rows

Posted on 2007-03-21
Last Modified: 2012-05-05
I want to exclude outliers (the highest value and the lowest value) from a query of numerical data.  For example, if the table is like this:

ID   Student   Score
1    Doe       500
2    Doe       5
3    Doe       99
4    Doe       100
5    Doe       95
6    Doe       98
7    Doe       100
8    Smith     89
9    Smith     95
10   Smith     92
11   etc.

it is clear that Doe generally earned about 98%, but once had a strange 500% and a strange 5% (I want the query to exclude or ignore the 500 and the 5).
   1. Considering that other students besides Doe are in the table, what query could select only the rows where the student is Doe, while excluding Doe's outliers (the row with the highest value *and* the row with the lowest value)?
   2. What if I want to exclude only the row with the highest value?
Question by:Randall-B

Accepted Solution

by:
hkamal earned 250 total points
ID: 18765006
Let's call your table "myTable". You could write the query like this:

SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Student =  "Doe"
To exclude outliers for all students, simply remove the last line.
To remove only the top outlier, remove the "AND" line that contains the MIN(Score) subquery.
Remember that you are only ever excluding the top and bottom scores. If you had a 500% and a 498%, you would still see the 498%. You may choose to alter the logic to eliminate, say, anything over 100%.

Hope this helps
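
The fixed-cutoff idea mentioned above ("anything over 100%") could look roughly like the sketch below. It is only an illustration under some assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the table name "scores" comes from the asker's later comments, the ceiling of 100 is an assumed value, and row 11 is a hypothetical extra row added to reproduce the "500% and 498%" case.

```python
import sqlite3

# In-memory stand-in for the "scores" table (SQLite here; the thread uses MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO scores (ID, Student, Score) VALUES (?, ?, ?)",
    [
        (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
        (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
        (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
        (11, "Doe", 498),  # hypothetical extra row for the "500% and 498%" case
    ],
)

# Fixed cutoff: keep only scores at or below an assumed ceiling (100 here).
CUTOFF_SQL = """
SELECT ID, Student, Score
FROM   scores
WHERE  Student = ?
AND    Score <= ?
ORDER  BY Score ASC
"""
kept_scores = [row[2] for row in conn.execute(CUTOFF_SQL, ("Doe", 100))]
print(kept_scores)
```

With this data the cutoff drops both the 500 and the 498, but the 5 is still returned, so a lower bound (for example a second condition on Score) would be needed to remove low outliers as well.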

Author Comment

by:Randall-B
ID: 18765392
hkamal,
   Thanks.  My table is named "scores".  In PHP, I get the full result set when using a simple query like this:

   $sql = 'SELECT ID, Student, Score FROM scores WHERE Student = "Doe" ORDER BY Score ASC';
   $res = mysql_query($sql);

but I'm not getting anything (no results at all) when using the exclude-outliers query like this:

   $sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Student = "Doe" ORDER BY Score ASC';
   $res = mysql_query($sql);

What could be wrong?

Author Comment

by:Randall-B
ID: 18765592
Should it be like this?

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Author Comment

by:Randall-B
ID: 18765933
Also, please see my related question at http:/Q_22463806.html , where I'm asking how to exclude the *two* highest outliers (without any arbitrary cutoffs like "100%"). Thanks.

Author Comment

by:Randall-B
ID: 18767078
I'm accepting your answer with the understanding that the query should be:

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Thanks.
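
For anyone who wants to try the accepted form of the query, here is a minimal sketch. It assumes Python's built-in sqlite3 module in place of PHP and MySQL, passes the student name as a named parameter instead of hard-coding "Doe", and adds a second, illustrative query for part 2 of the question (exclude only the highest value). The variable names are made up for this example.

```python
import sqlite3

# In-memory copy of the sample table from the question (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO scores (ID, Student, Score) VALUES (?, ?, ?)",
    [
        (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
        (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
        (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
    ],
)

# Part 1: drop the student's highest and lowest score.
BOTH_SQL = """
SELECT ID, Student, Score
FROM   scores t1
WHERE  t1.Student = :student
AND    t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = :student)
AND    t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = :student)
ORDER  BY Score ASC
"""

# Part 2: drop only the highest score (the MIN condition is left out).
HIGH_ONLY_SQL = """
SELECT ID, Student, Score
FROM   scores t1
WHERE  t1.Student = :student
AND    t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = :student)
ORDER  BY Score ASC
"""

params = {"student": "Doe"}
both_scores = [row[2] for row in conn.execute(BOTH_SQL, params)]
high_only_scores = [row[2] for row in conn.execute(HIGH_ONLY_SQL, params)]
print(both_scores)
print(high_only_scores)
```

On the sample data the first query leaves Doe's scores 95, 98, 99, 100, 100, and the second one also keeps the 5. Binding the name as a parameter rather than concatenating it into the SQL string is a design choice of this sketch, not something the thread prescribes.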

Expert Comment

by:hkamal
ID: 18770144
Hi Randall-B,
I made the wrong assumption that ID was unique per Student, but I can see it is not. In that case, you can use a modified version of my query:
SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Student =  "Doe"
This way, all my statements above still apply, and you can use the query for any student by simply changing the name in the last line.
I'll take a look at your other question.
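
The difference between correlating the subqueries on ID and on Student can be checked with a small sketch like the one below. It assumes Python's built-in sqlite3 module instead of MySQL and the sample rows from the question. The Student filter is left out of both queries so that every student is covered.

```python
import sqlite3

# In-memory copy of the sample table from the question (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO scores (ID, Student, Score) VALUES (?, ?, ?)",
    [
        (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
        (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
        (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
    ],
)

# Correlated on ID: each ID matches exactly one row in this table, so MAX and
# MIN are both the row's own Score and the != tests can never be true.
BY_ID_SQL = """
SELECT ID, Student, Score
FROM   scores t1
WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID = t1.ID)
AND    t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID = t1.ID)
"""

# Correlated on Student: MAX and MIN are computed over all rows of the same
# student, so each student's extremes are the ones that get filtered out.
BY_STUDENT_SQL = """
SELECT Student, Score
FROM   scores t1
WHERE  t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = t1.Student)
AND    t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = t1.Student)
ORDER  BY Student ASC, Score ASC
"""

by_id = conn.execute(BY_ID_SQL).fetchall()
by_student = conn.execute(BY_STUDENT_SQL).fetchall()
print(by_id)
print(by_student)
```

The ID-correlated version returns no rows at all, which matches the empty result reported earlier in the thread. The Student-correlated version keeps 95, 98, 99, 100, 100 for Doe and only 92 for Smith. One thing to be aware of: comparing with != removes every row that ties with the maximum or minimum, so if Doe's highest score had been 100, both 100 rows would have been dropped.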