Solved

Exclude erroneous values using Standard Deviation

Posted on 2006-10-25
Last Modified: 2008-03-04
I have a query running from within Excel that is based on a view in SQL Server. It returns up to 60,000 records (but rarely more than a few hundred) into the Excel spreadsheet, where they are the source data for a chart. The GAM, MAM, and TM columns are values, and each one becomes a separate series in the chart. The problem occurs when an erroneous number ends up in the data. For example, say the normal range of values is 100 to 500, and then, for some unexplained reason, a value of 12,000 is returned. This skews the chart and the analysis.

I have been asked to focus on the central 90% of values and to exclude the outlying 10%.

Question:
======
1. How do I do this? Is it a standard deviation job?
2. What SQL do I need to get the data I want?
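Whether standard deviation is the right tool depends on the data. A single extreme value inflates both the mean and the standard deviation, so in a small result set it can land inside its own cut-off. The minimal Python sketch below uses hypothetical numbers. `statistics.stdev` is the sample standard deviation, the same kind that T-SQL `STDEV` returns.

```python
import statistics

def within_k_sd(values, k=2.0):
    """Keep values no more than k sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) <= k * sd]

sample = [100, 200, 300, 400, 12000]  # hypothetical: four normal values, one bad
print(within_k_sd(sample, k=2))  # mean 2600, sd ~5256: the 12,000 survives
print(within_k_sd(sample, k=1))  # the tighter band finally drops it
```

This masking effect is one reason rank- or percentile-based trimming often suits a "keep the central 90%" requirement better than a fixed number of standard deviations.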

Background INFO:
===========
----SQL Server View (FreightDataView)--------
CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
----------------------------------------------------

----Excel Query------------
select top 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
from FreightDataView
where (blah blah blah...)
----------------------------
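One detail of the view in the question matters for any statistical cut-off. Each row carries its NetRev in exactly one of GAM, MAM, or TM, and the CASE expressions put 0 in the other two columns. An average or percentile taken over a pivoted column therefore mixes in those zeros. The sketch below uses hypothetical rows and helper names to mimic the view's CASE logic in Python. It computes statistics per account type from NetRev instead of from the pivoted columns.

```python
import statistics

# Hypothetical sample rows as (AccountType, NetRev) pairs from FreightData.
ROWS = [("G", 120), ("G", 450), ("MAM", 300), ("SME", 200), ("SME", 12000)]

# Which view column each AccountType's CASE expression fills.
COLUMN_FOR_TYPE = {"G": "GAM", "MAM": "MAM", "SME": "TM"}

def pivot(rows):
    """Mimic the view: NetRev goes into one column, the other two get 0."""
    result = []
    for acct, rev in rows:
        row = {"GAM": 0, "MAM": 0, "TM": 0}
        if acct in COLUMN_FOR_TYPE:  # other types end up as 0 in all three
            row[COLUMN_FOR_TYPE[acct]] = rev
        result.append(row)
    return result

def per_series(rows):
    """Group NetRev by series, without the padding zeros."""
    groups = {}
    for acct, rev in rows:
        if acct in COLUMN_FOR_TYPE:
            groups.setdefault(COLUMN_FOR_TYPE[acct], []).append(rev)
    return groups

gam_column = [r["GAM"] for r in pivot(ROWS)]
print(statistics.mean(gam_column))                # zeros from MAM/SME rows drag this down
print(statistics.mean(per_series(ROWS)["GAM"]))   # mean of the G rows only
```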

Cheers,

LoveToSpod
Question by:LoveToSpod
Expert Comment

by:Patrick Matthews
ID: 17802791
Hi LoveToSpod,
> I have been asked to focus in on 90% of the central values, and the outlying 10%
> of values are to be excluded.

So what does that mean: chop off 5% at each tail of the distribution?

Regards,

Patrick
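Read as "chop off 5% at each tail", the requirement amounts to keeping values between the 5th and 95th percentiles. The minimal Python sketch below uses hypothetical data and relies on `statistics.quantiles`, whose default "exclusive" method interpolates between neighbouring values. Other percentile definitions, including those in SQL or Excel, may place the cut-offs slightly differently.

```python
import statistics

def trim_5pct_tails(values):
    """Keep values between the 5th and 95th percentiles (inclusive)."""
    cuts = statistics.quantiles(values, n=20)  # 19 cut points: 5th, 10th, ..., 95th
    low, high = cuts[0], cuts[-1]
    return [v for v in values if low <= v <= high]

# Hypothetical data: 99 ordinary values and one bad reading.
data = list(range(1, 100)) + [12000]
kept = trim_5pct_tails(data)
print(len(kept), min(kept), max(kept))  # 90 6 95
```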

Accepted Solution

by: regbes (earned 500 total points)
ID: 17802839
Hi LoveToSpod,
Try one of these:

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
and NetRev < 1000   -- fixed upper cap; set it just above the normal range of values

or

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
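WHERE     (NOT (weight = 0))
-- keep rows below the mean plus one sample standard deviation (T-SQL STDEV);
-- use 2 * STDEV(NetRev) or more to trim less aggressively
and NetRev < (select AVG(NetRev) + STDEV(NetRev) from FreightData WHERE (NOT (weight = 0)))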


HTH

R.

Author Comment

by:LoveToSpod
ID: 17803004
Hi matthewspatrick

We could lop off the top and/or bottom 5% of values. Looking at it more closely, though, I would like to add user controls that configure how much is lopped off each side of the data, so the '5%' becomes a variable.

Cheers, LTS
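Making the trim amount a user setting means the lower and upper percentages become independent parameters. The Python sketch below shows the rank-based version of that logic. The function name and sample data are hypothetical. It ranks rows by value, much as `ROW_NUMBER() OVER (ORDER BY NetRev)` would in SQL Server 2005 and later, and drops that many rows from each end. It uses integer percentages so the row counts are computed without floating-point rounding.

```python
def trim_by_rank(values, lower_pct=5, upper_pct=5):
    """Drop the lowest lower_pct% and highest upper_pct% of values by rank,
    returning the survivors in their original order (integer percentages)."""
    if lower_pct < 0 or upper_pct < 0 or lower_pct + upper_pct >= 100:
        raise ValueError("percentages must be >= 0 and sum to less than 100")
    n = len(values)
    drop_low = n * lower_pct // 100    # integer math avoids float rounding
    drop_high = n * upper_pct // 100
    # Rank row positions by value; ties keep their original relative order.
    ranked = sorted(range(n), key=lambda i: values[i])
    keep = set(ranked[drop_low:n - drop_high])
    return [v for i, v in enumerate(values) if i in keep]

revenues = [12000] + list(range(1, 20))  # hypothetical: 20 values, one bad
print(trim_by_rank(revenues, 5, 5))   # drops the single lowest and highest
print(trim_by_rank(revenues, 0, 10))  # keeps the low end, drops the top two
```

Returning survivors in their original order keeps the rows aligned with the rest of each record, which matters when the output feeds a chart.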

Author Comment

by:LoveToSpod
ID: 17980454
Hi

I simply added a manual range into the SQL. This allows the user to eliminate any 'outside' data that skew the analysis unnecessarily.

Thx
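One way to expose a manual range to users is to pass the bounds as query parameters instead of editing the SQL text. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server. The FreightData columns follow the view's source table, but the schema subset, sample rows, and `fetch_in_range` helper are hypothetical.

```python
import sqlite3

def fetch_in_range(conn, low, high):
    """Return rows whose NetRev lies in [low, high]; bounds are bound parameters."""
    sql = ("SELECT AccountType, NetRev FROM FreightData "
           "WHERE Weight <> 0 AND NetRev BETWEEN ? AND ? "
           "ORDER BY NetRev")
    return conn.execute(sql, (low, high)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FreightData (AccountType TEXT, Weight REAL, NetRev REAL)")
conn.executemany("INSERT INTO FreightData VALUES (?, ?, ?)", [
    ("G", 10, 120), ("MAM", 5, 480), ("SME", 8, 12000), ("G", 0, 300)])
print(fetch_in_range(conn, 100, 500))  # the 12,000 row and the zero-weight row are excluded
```

`BETWEEN` is inclusive at both ends, so the user-supplied limits themselves are kept.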