Query to Exclude Outliers from Selected Rows

Posted on 2007-03-21
Last Modified: 2012-05-05

I want to exclude outliers (the highest value and the lowest value) from a query of numerical data.  For example, if the table is like this:

ID   Student   Score
1    Doe       500
2    Doe       5
3    Doe       99
4    Doe       100
5    Doe       95
6    Doe       98
7    Doe       100
8    Smith     89
9    Smith     95
10   Smith     92
11   etc.

It is clear that Doe generally earned about 98%, but once had a strange 500% and a strange 5% (I want the query to exclude or ignore the 500 and the 5).
   1. Considering that other students besides Doe are in the table, what query could select only the rows where the student is Doe, while excluding Doe's outliers (the row with the highest value *and* the row with the lowest value)?
   2. What if I want to exclude only the row with the highest value?
Question by: Randall-B

Accepted Solution

by: hkamal (earned 250 total points)
ID: 18765006
Let's call your table "myTable". You could write the query like this:

SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Student =  "Doe"
To exclude outliers for all students, simply remove the last line.
To exclude only the top outlier, remove the MIN line (the first "AND" line).
Remember that you are only ever excluding the top and bottom scores. If you had a 500% and a 498%, you would still see the 498%. You may choose to alter the logic to eliminate, say, anything over 100%.

Hope this helps

Author Comment

by:Randall-B
ID: 18765392
hkamal,
   Thanks.  My table is named "scores".  In PHP, I get the full result set when using a simple query like this:

    $sql = 'SELECT ID, Student, Score FROM scores WHERE Student = "Doe" ORDER BY Score ASC';
    $res = mysql_query($sql);

but I'm not getting anything (no results at all) when using the exclude-outliers query like this:

    $sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Student = "Doe" ORDER BY Score ASC';
    $res = mysql_query($sql);

What could be wrong?
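A minimal way to see why this query returns nothing is to run it against the sample rows from the question. The sketch below is an assumption-laden stand-in: it uses Python's built-in sqlite3 module and an in-memory SQLite table instead of MySQL, purely so it runs without a server, and it assumes the table layout (ID, Student, Score) shown in the question. 'Doe' is written with single quotes because SQLite treats double-quoted text as an identifier first. Because ID is unique per row, each correlated subquery sees only the current row, so its MAX and MIN both equal t1.Score and neither "!=" test can ever be true.

```python
import sqlite3

# Sample rows from the question (the "etc." row is omitted).
ROWS = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

# In-memory SQLite copy of the "scores" table (a stand-in for MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)

# The exclude-outliers query as posted, correlated on ID.
# Each subquery matches only the current row, so MAX(Score) and
# MIN(Score) both equal t1.Score and every row is filtered out.
BY_ID = """
SELECT ID, Student, Score FROM scores t1
WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID = t1.ID)
  AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID = t1.ID)
  AND t1.Student = 'Doe'
ORDER BY Score ASC
"""

print(conn.execute(BY_ID).fetchall())  # prints []
```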

Author Comment

by:Randall-B
ID: 18765592
Should it be like this?

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';
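Under the same assumptions (an in-memory SQLite copy of the sample rows standing in for MySQL, single-quoted 'Doe'), this version, which filters both subqueries on Student = 'Doe', drops exactly the 500 and the 5:

```python
import sqlite3

# Sample rows from the question (the "etc." row is omitted).
ROWS = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)

# Both subqueries now look at all of Doe's rows, so they return
# Doe's overall highest (500) and lowest (5) scores.
DOE_ONLY = """
SELECT ID, Student, Score FROM scores t1
WHERE t1.Student = 'Doe'
  AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = 'Doe')
  AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = 'Doe')
ORDER BY Score ASC
"""

rows = conn.execute(DOE_ONLY).fetchall()
print([score for _, _, score in rows])  # prints [95, 98, 99, 100, 100]
```

The drawback of this form is that the student's name appears three times; correlating the subqueries on t1.Student avoids that.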

Author Comment

by:Randall-B
ID: 18765933
Also, please see my related question at http:/Q_22463806.html, where I'm asking how to exclude the *two* highest outliers (without any arbitrary cutoffs like "100%"). Thanks.

Author Comment

by:Randall-B
ID: 18767078
I'm accepting your answer with the understanding that the query should be:

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Thanks.

Expert Comment

by:hkamal
ID: 18770144
Hi Randall-B,
I made the wrong assumption that ID identified the student, but I can see it identifies each row. In that case, you can use a modified version of my query, viz:
SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Student =  "Doe"
This way, all my statements above still apply, and you can use the query for any student simply by changing the last line.
I'll take a look at your other query
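The same kind of SQLite sketch (again an assumption standing in for MySQL, with the table named "scores" rather than "myTable") can exercise the corrected query. With the last line removed, it trims each student's own highest and lowest scores; dropping the MIN condition answers question 2 by excluding only the top score. One caveat: "!=" removes every row that ties for the maximum or minimum, not just a single row, so if Doe had two 500s, both would be excluded.

```python
import sqlite3

# Sample rows from the question (the "etc." row is omitted).
ROWS = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", ROWS)

# Correlated on Student, with the "t1.Student = ..." line removed:
# each row is compared against its own student's MAX and MIN.
ALL_STUDENTS = """
SELECT Student, Score FROM scores t1
WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = t1.Student)
  AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = t1.Student)
ORDER BY Student, Score
"""

# Question 2: keep the Student filter but drop the MIN condition,
# so only Doe's highest score is excluded.
TOP_ONLY = """
SELECT Score FROM scores t1
WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = t1.Student)
  AND t1.Student = 'Doe'
ORDER BY Score
"""

all_students = conn.execute(ALL_STUDENTS).fetchall()
top_only = [score for (score,) in conn.execute(TOP_ONLY)]
print(all_students)  # [('Doe', 95), ('Doe', 98), ('Doe', 99), ('Doe', 100), ('Doe', 100), ('Smith', 92)]
print(top_only)      # [5, 95, 98, 99, 100, 100]
```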