Link to home
Start Free TrialLog in
Avatar of Andrew Luedke
Andrew Luedke

asked on

Identify outliers through SQL

Hi- I have a large recordset of about 300,000 records and am looking to capture the outliers of a column within SQL.  

For the sake of simplicity, let's call my table mainTable and the data in question pData.

I understand that I'll need to capture the quartiles, but am not quite sure how to go about doing this.

Thanks in advance.
Avatar of AndyAinscow
AndyAinscow
Flag of Switzerland image

You could display the max and min values.  That will give you the range and then do a select query with >= or <= against the values you want.
SELECT MAX(x) as MaxX, MIN(x) as MinX FROM tbl1
then eg.
SELECT * FROM tbl1 WHERE (x >= (MaxX-5))
and
SELECT * FROM tbl1 WHERE (x <= (MinX+5))
ps.  Exactly what you define as an 'outlier' is something you will have to decide.
Avatar of Andrew Luedke
Andrew Luedke

ASKER

In terms of outliers, I'd like to grab the 1% and 99% percentiles to capture the extreme values.  Does this help to refine the algorithm?
Do you mean the 300 highest and 300 lowest or those higher than (lowest + 0.99*range)
Apologies for the lack of clarity here.  The numbers could vary.

The formula should determine the general distribution.  Meaning that if you have a range of numbers, the formula should determine the thresholds and capture 1% of values below the normal range and the other 1% of values above.  This way, you can dynamically captures the extreme highs and lows of a set.  

For example, let's say we have a 200,000 numbers with the following characteristics:

Min:
-100

Max:
1.09

Avg:
.2

STDev:
.5

How do we go about capturing those outliers within the set?
Do you want everything done in SQL or can you perform some calculations outside of the SQL to determine the limits which you then feed back into an SQL select query?
Given the size of the database, it would be best to try and accomplish everything via SQL.
ASKER CERTIFIED SOLUTION
Avatar of PortletPaul
PortletPaul
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Andrew, you still have the issue or it's already solved?