Solved

Query to Exclude Outliers from Selected Rows

Posted on 2007-03-21
Last Modified: 2012-05-05
I want to exclude outliers (the highest value and the lowest value) from a query of numerical data.  For example, if the table is like this:

ID   Student   Score
1    Doe       500
2    Doe       5
3    Doe       99
4    Doe       100
5    Doe       95
6    Doe       98
7    Doe       100
8    Smith     89
9    Smith     95
10   Smith     92
11   etc.

it is clear that Doe generally earned about 98%, but once had a strange 500% and a strange 5% (I want the query to exclude or ignore the 500 and the 5).
    1. Considering that other students besides Doe are in the table, what query could select only the rows where the student is Doe, while excluding Doe's outliers (the row with the highest value *and* the row with the lowest value)?
    2. What if I want to exclude only the row with the highest value?
Question by:Randall-B
LVL 5

Accepted Solution

by:
hkamal earned 250 total points
ID: 18765006
Let's call your table "myTable". You could write the query like this:

SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.ID=t1.ID)
AND      t1.Student =  "Doe"
To exclude outliers for all students, simply remove the last line.
To remove only the top outlier, remove the line containing the MIN subquery (the first "AND" line).
Remember that you are only ever excluding the top and bottom scores. If you had a 500% and a 498%, you would still see the 498%. You may choose to alter the logic to eliminate, say, anything over 100%.

Hope this helps
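
The caveat about a 500% and a 498% can be checked with a small sketch. This is only an illustration, not part of the original answer: it uses Python's built-in sqlite3 module rather than MySQL, the sample rows are made up, and the subqueries are matched on Student (the form this thread later settles on). The second query shows the fixed-cutoff alternative, assuming scores above 100 are never valid.

```python
import sqlite3

# Hypothetical sample data: Doe has two implausibly high scores (500 and 498).
rows = [
    (1, "Doe", 500), (2, "Doe", 498), (3, "Doe", 5),
    (4, "Doe", 99), (5, "Doe", 95), (6, "Doe", 98),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

# Exclude only the single highest and single lowest value for Doe.
trimmed = [r[0] for r in conn.execute(
    """
    SELECT Score FROM myTable t1
    WHERE t1.Student = 'Doe'
      AND t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.Student = t1.Student)
      AND t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.Student = t1.Student)
    ORDER BY Score
    """
)]

# Alternative: a fixed cutoff (assumed here: nothing over 100 is valid).
capped = [r[0] for r in conn.execute(
    "SELECT Score FROM myTable WHERE Student = 'Doe' AND Score <= 100 ORDER BY Score"
)]

print(trimmed)  # the 498 is still present
print(capped)   # both 500 and 498 are gone, but the 5 stays
```

Trimming the extremes removes the 500 and the 5 but keeps the 498, while the cutoff removes both high values and keeps the 5, so which rule fits depends on what counts as an outlier in your data.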
 

Author Comment

by:Randall-B
ID: 18765392
hkamal,
Thanks.  My table is named "scores".  In PHP, I get the full result set when using a simple query like this:

    $sql = 'SELECT ID, Student, Score FROM scores WHERE Student = "Doe" ORDER BY Score ASC';
    $res = mysql_query($sql);

but I'm not getting anything (no results at all) when using the exclude-outliers query like this:

    $sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.ID=t1.ID) AND t1.Student = "Doe" ORDER BY Score ASC';
    $res = mysql_query($sql);

What could be wrong?
 

Author Comment

by:Randall-B
ID: 18765592
Should it be like this?

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';
 

Author Comment

by:Randall-B
ID: 18765933
Also, please see my related question at http:/Q_22463806.html , where I'm asking how to exclude the *two* highest outliers (without any arbitrary cutoffs like "100%"). Thanks.
 

Author Comment

by:Randall-B
ID: 18767078
I'm accepting your answer with the understanding that the query should be:

$sql = 'SELECT ID, Student, Score FROM scores t1 WHERE t1.Student = "Doe" AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student="Doe") AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student="Doe") ORDER BY Score ASC';

Thanks.
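
For anyone who wants to try the accepted query without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The helper name `scores_without_extremes` and its `drop_lowest` flag are made up for this illustration, and the student name is passed as a bound parameter instead of being written into the SQL string three times. Setting `drop_lowest=False` covers part 2 of the question (exclude only the highest value).

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)", rows)


def scores_without_extremes(conn, student, drop_lowest=True):
    """Hypothetical helper: one student's scores, ascending, without the
    highest value and (optionally) without the lowest value."""
    sql = (
        "SELECT Score FROM scores t1 "
        "WHERE t1.Student = :s "
        "AND t1.Score != (SELECT MAX(Score) FROM scores t2 WHERE t2.Student = :s) "
    )
    if drop_lowest:
        sql += "AND t1.Score != (SELECT MIN(Score) FROM scores t2 WHERE t2.Student = :s) "
    sql += "ORDER BY Score ASC"
    # The named parameter :s is bound once and reused in every subquery.
    return [r[0] for r in conn.execute(sql, {"s": student})]


both = scores_without_extremes(conn, "Doe")
top_only = scores_without_extremes(conn, "Doe", drop_lowest=False)
print(both)
print(top_only)
```

Note that `!=` compares values, not rows: if the highest (or lowest) score occurs more than once for a student, every row with that value is excluded.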
 
LVL 5

Expert Comment

by:hkamal
ID: 18770144
Hi Randall-B,
I made the wrong assumption that ID was unique per Student, but I can see it is not. In which case, you can use a modified version of my query, viz:
SELECT ID, Student, Score
FROM   myTable t1
WHERE  t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.Student=t1.Student)
AND      t1.Student =  "Doe"
This way, all my statements above still apply, and you can still use the query for any student as long as you simply change the last line.
I'll take a look at your other query
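
The difference between matching the subqueries on ID and on Student can be seen in a small sketch. It assumes Python's built-in sqlite3 module instead of MySQL and loads the sample rows from the question; the `TEMPLATE` string is just a convenience for this illustration. The Student filter is left off so that all students are processed at once.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (1, "Doe", 500), (2, "Doe", 5), (3, "Doe", 99), (4, "Doe", 100),
    (5, "Doe", 95), (6, "Doe", 98), (7, "Doe", 100),
    (8, "Smith", 89), (9, "Smith", 95), (10, "Smith", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, Student TEXT, Score INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?)", rows)

# Same query shape; only the column that ties t2 to t1 changes.
TEMPLATE = """
SELECT ID, Student, Score FROM myTable t1
WHERE t1.Score != (SELECT MAX(Score) FROM myTable t2 WHERE t2.{col} = t1.{col})
  AND t1.Score != (SELECT MIN(Score) FROM myTable t2 WHERE t2.{col} = t1.{col})
ORDER BY ID
"""

# Matching on ID: each row is compared only with itself, so its score is
# both the MAX and the MIN of its group and the row is always excluded.
by_id = conn.execute(TEMPLATE.format(col="ID")).fetchall()

# Matching on Student: MAX and MIN are computed per student.
by_student = conn.execute(TEMPLATE.format(col="Student")).fetchall()

print(by_id)
print(by_student)
```

With ID as the matching column the result is empty, which is consistent with the "no results at all" reported earlier in the thread; with Student, each student loses only their own highest and lowest scores.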