Asked by HKFuey (United Kingdom)
Removing Outliers in a SQL Server Query

I have a view in which one column averages the last 3 months' sales. Sometimes the average is skewed by large values that I would like to ignore.

For example, if I have sales of 10 for month 1, 10 for month 2 and 100 for month 3, is there any way of ignoring the large number that is skewing the average? If so, does anyone know the syntax?
SOLUTION
Lee (United Kingdom)
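You can use a CASE expression inside the AVG function. Values that fail the test become NULL, and AVG ignores NULLs:
SELECT yourcolumns,
       AVG(CASE WHEN SALES < 100000 THEN SALES ELSE NULL END) AS FilteredAvg
FROM YourSalesTable  -- replace with your table or view name
GROUP BY yourcolumns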
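A self-contained version of the CASE approach, using the numbers from the question. The table variable, its columns (ItemCode, SalesMonth, SalesQty) and the cut-off of 50 are illustrative assumptions; a fixed cut-off only works if one value suits every item.

```sql
-- Hypothetical sample data: one row per item per month (names are illustrative).
DECLARE @Sales TABLE (ItemCode varchar(10), SalesMonth int, SalesQty decimal(10, 2));

INSERT INTO @Sales (ItemCode, SalesMonth, SalesQty)
VALUES ('A', 1, 10), ('A', 2, 10), ('A', 3, 100);

-- Values at or above the assumed cut-off of 50 become NULL inside the CASE,
-- and AVG ignores NULLs, so only the remaining months are averaged.
SELECT ItemCode,
       AVG(SalesQty) AS RawAvg,
       AVG(CASE WHEN SalesQty < 50 THEN SalesQty ELSE NULL END) AS FilteredAvg
FROM @Sales
GROUP BY ItemCode;
```

Here the 100 becomes NULL inside the CASE, so FilteredAvg averages only the two 10s, while RawAvg still includes it.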


ASKER CERTIFIED SOLUTION
Well, again I live and learn. I didn't know SQL had a standard deviation function!

Thanks for the heads up!
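The reply above mentions STDEV. The certified answer itself is not shown, so this is only a sketch of one common way to use it: drop values more than @k sample standard deviations from each item's mean, then average the rest. It assumes one row per item per month; the table, its columns and @k = 1.0 are illustrative. With only three months, a sample z-score can never exceed 2/√3 ≈ 1.15, so a conventional 2- or 3-standard-deviation cut-off would never remove anything, and whenever two months are equal the third sits at about 1.15 regardless of its size. Item B (20, 20, 21) is included to show that a 1-standard-deviation cut-off drops its 21 just as it drops item A's 100.

```sql
-- Hypothetical long-format data: one row per item per month.
DECLARE @Sales TABLE (ItemCode varchar(10), SalesMonth int, SalesQty float);

INSERT INTO @Sales (ItemCode, SalesMonth, SalesQty)
VALUES ('A', 1, 10), ('A', 2, 10), ('A', 3, 100),
       ('B', 1, 20), ('B', 2, 20), ('B', 3, 21);

-- Assumed cut-off, in sample standard deviations from the item's mean.
DECLARE @k float = 1.0;

WITH Stats AS (
    SELECT ItemCode, SalesQty,
           AVG(SalesQty)   OVER (PARTITION BY ItemCode) AS MeanQty,
           STDEV(SalesQty) OVER (PARTITION BY ItemCode) AS SdQty  -- NULL if the item has one value
    FROM @Sales
)
SELECT ItemCode,
       AVG(SalesQty) AS FilteredAvg
FROM Stats
WHERE SdQty IS NULL OR SdQty = 0                  -- nothing to compare against: keep the row
   OR ABS(SalesQty - MeanQty) <= @k * SdQty       -- within @k standard deviations: keep
GROUP BY ItemCode;
```

Both items end up averaging their two remaining months: 10 for A and 20 for B. With longer histories a z-score cut-off becomes more meaningful; with three points it mostly reflects the shape of the data, not the size of the spike.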
HKFuey (ASKER)

Still not sure of the syntax. This:

STDEV((SalesQty1 + SalesQty2 + SalesQty3) / 3)

returns NULL.
Make sure that there are no NULLs in SalesQty1, SalesQty2 or SalesQty3, and then divide by 3.0 rather than 3, so that integer columns are not truncated by integer division:

STDEV((SalesQty1 + SalesQty2 + SalesQty3) / 3.0)
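Why the 3.0 matters, and why the NULLs do: a minimal sketch with made-up values. In SQL Server, int / int is integer division, and any arithmetic with a NULL operand yields NULL. ISNULL(..., 0) removes the NULL but counts a missing month as zero sales, which pulls the average down; whether that is acceptable is an assumption about your data.

```sql
-- Illustrative values only (not from the thread's data).
DECLARE @SalesQty1 int = 10, @SalesQty2 int = 10, @SalesQty3 int = 11;

SELECT
    (@SalesQty1 + @SalesQty2 + @SalesQty3) / 3   AS IntegerAvg,  -- int / int truncates: 31 / 3 = 10
    (@SalesQty1 + @SalesQty2 + @SalesQty3) / 3.0 AS DecimalAvg,  -- 3.0 is numeric, so the fraction is kept
    (@SalesQty1 + NULL + @SalesQty3) / 3.0       AS WithNull,    -- one NULL operand makes the whole sum NULL
    (ISNULL(@SalesQty1, 0) + ISNULL(@SalesQty2, 0) + ISNULL(@SalesQty3, 0)) / 3.0 AS NullAsZero;  -- counts a missing month as 0
```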
If any of the 3 values is NULL, a NULL result is normal. In any case, this should be done set-wise, not per row.
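STDEV, like AVG, aggregates across rows. Applied to (SalesQty1 + SalesQty2 + SalesQty3) / 3 in a query with one row per item, it sees a single value per group, and SQL Server's STDEV returns NULL for fewer than two values, which may explain the NULL above. Working set-wise means turning the three month columns into rows first, for example with CROSS APPLY (VALUES ...). This sketch assumes a view shaped like the asker's: a hypothetical ItemCode key plus SalesQty1 to SalesQty3.

```sql
-- Hypothetical wide data shaped like the asker's view: one row per item,
-- one column per month. ItemCode is an assumed key column.
DECLARE @WideSales TABLE (ItemCode varchar(10), SalesQty1 float, SalesQty2 float, SalesQty3 float);

INSERT INTO @WideSales (ItemCode, SalesQty1, SalesQty2, SalesQty3)
VALUES ('A', 10, 10, 100),
       ('B', 20, NULL, 22);

-- CROSS APPLY (VALUES ...) turns the three month columns into three rows per item,
-- so AVG and STDEV aggregate across months instead of within a single row.
-- Both aggregates skip NULLs, so a missing month is left out rather than nulling the result.
SELECT w.ItemCode,
       AVG(m.Qty)   AS AvgQty,
       STDEV(m.Qty) AS SdQty,
       COUNT(m.Qty) AS MonthsWithData
FROM @WideSales AS w
CROSS APPLY (VALUES (w.SalesQty1), (w.SalesQty2), (w.SalesQty3)) AS m (Qty)
GROUP BY w.ItemCode;
```

Item B's NULL month is skipped by both aggregates, so B is averaged over its two months (21) instead of producing NULL. Once the months are rows, the same CASE or filtering logic used elsewhere in this thread can be applied per item.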

Again: what is the rule or threshold for deciding that one value should be ignored compared to the other two?
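Answering that question needs a business rule. One hypothetical rule, chosen only for illustration: ignore a month whose sales exceed twice the average of the other months for the same item. The long-format @Sales table and @Ratio = 2.0 are assumptions. A ratio like this depends on how large the spike is relative to the other months, so it drops item A's 100 while keeping item B's 21.

```sql
-- Hypothetical long-format data: one row per item per month.
DECLARE @Sales TABLE (ItemCode varchar(10), SalesMonth int, SalesQty float);

INSERT INTO @Sales (ItemCode, SalesMonth, SalesQty)
VALUES ('A', 1, 10), ('A', 2, 10), ('A', 3, 100),
       ('B', 1, 20), ('B', 2, 20), ('B', 3, 21);

-- Assumed business rule: ignore a month above @Ratio times the mean of the item's other months.
DECLARE @Ratio float = 2.0;

WITH Others AS (
    SELECT ItemCode, SalesQty,
           -- Mean of the other months: (item total - this month) / (item month count - 1).
           -- NULLIF avoids dividing by zero when an item has only one month.
           (SUM(SalesQty) OVER (PARTITION BY ItemCode) - SalesQty)
               / NULLIF(COUNT(SalesQty) OVER (PARTITION BY ItemCode) - 1, 0) AS MeanOfOthers
    FROM @Sales
)
SELECT ItemCode,
       AVG(SalesQty) AS FilteredAvg
FROM Others
WHERE MeanOfOthers IS NULL             -- single-month items are kept as-is
   OR SalesQty <= @Ratio * MeanOfOthers
GROUP BY ItemCode;
```

With these numbers, item A averages 10 (its 100 is above 2 × 10), and item B keeps all three months, averaging about 20.33.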
SOLUTION
HKFuey (ASKER)

Thanks for the help.