Solved

Exclude erroneous values using Standard Deviation

Posted on 2006-10-25
Last Modified: 2008-03-04
I have a query running from within Excel, based on a view in SQL Server. It returns up to 60,000 records (but rarely more than a few hundred) into the Excel spreadsheet. The data is the source data for a chart. The GAM, MAM, and TM columns are values, and these end up as different series in the chart. The problem occurs when an erroneous number ends up in the data. For example, say the normal range of values is between 100 and 500, and then for some strange reason a value of 12,000 is returned. This skews the chart and the analysis.

I have been asked to focus on the central 90% of values and to exclude the outlying 10%.

Question:
======
1. How do I do this? Is it a standard deviation job?
2. What SQL do I need to get the data I want?

Background INFO:
===========
----SQL Server View (FreightDataView)--------
CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
----------------------------------------------------

----Excel Query------------
select top 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
from FreightDataView
where (blah blah blah...)
----------------------------

Cheers,

LoveToSpod
Question by:LoveToSpod

Expert Comment

by:Patrick Matthews
ID: 17802791
Hi LoveToSpod,
> I have been asked to focus in on 90% of the central values, and the outlying 10%
> of values are to be excluded.

So what does that mean: chop off 5% at each tail of the distribution?

Regards,

Patrick

Accepted Solution

by:
regbes earned 500 total points
ID: 17802839
Hi LoveToSpod,
Try one of these:

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
and NetRev < 1000

or

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))

-- keep only rows whose NetRev is below the mean plus one standard deviation
and NetRev < (select Avg(NetRev) + stdev(NetRev) from FreightData WHERE (NOT (weight = 0)))


HTH

R.
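
On the "is it a standard deviation job" question, a cutoff of this kind can be sketched as a band around the mean. The following is a minimal sketch, not the poster's tested solution: `@K` is a hypothetical tuning parameter, and because a view cannot take parameters this is written as a standalone query against `FreightData`. For roughly normal data, a band of about 1.645 standard deviations either side of the mean covers about 90% of values, but a large outlier inflates both the mean and the standard deviation, so the band will not keep exactly the central 90%.

```sql
-- @K is a hypothetical parameter: how many standard deviations
-- either side of the mean are kept.
DECLARE @K FLOAT;
SET @K = 1.645;

SELECT f.InvoiceYearMonth, f.OrigCountry AS Orig, f.DestCountry AS Dest,
       f.Weight, f.ProdCode, f.AccountType, f.NetRev
FROM FreightData AS f
CROSS JOIN (
    -- one-row summary computed over the same rows the view uses
    SELECT AVG(NetRev) AS MeanRev, STDEV(NetRev) AS SdRev
    FROM FreightData
    WHERE Weight <> 0
) AS s
WHERE f.Weight <> 0
  AND f.NetRev BETWEEN s.MeanRev - @K * s.SdRev
                   AND s.MeanRev + @K * s.SdRev;
```

Raising or lowering `@K` widens or narrows the band; the GAM, MAM, and TM columns could then be derived from `AccountType` and `NetRev` with the same CASE expressions as in the view.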

Author Comment

by:LoveToSpod
ID: 17803004
Hi matthewspatrick

We could lop off the top and/or bottom 5% of values, but looking at it more closely, I would like to add user controls that configure how much is lopped off either side of the data, so the '5%' becomes a variable.

Cheers, LTS
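
A variable trim of this kind could be sketched by ranking the rows and keeping only the middle of the ranking. This is a sketch under stated assumptions rather than anything confirmed in the thread: `@LowerPct` and `@UpperPct` are hypothetical names for the user-controlled percentages, the trim is applied to the GAM series only (rows where GAM is 0 are treated as belonging to another account type and are skipped), and the window functions used require SQL Server 2005 or later, which the thread does not confirm.

```sql
-- Hypothetical tail sizes, in percent; in practice these would be
-- supplied by the user controls.
DECLARE @LowerPct FLOAT, @UpperPct FLOAT;
SET @LowerPct = 5.0;
SET @UpperPct = 5.0;

SELECT Orig, Dest, ProdCode, Weight, GAM, MAM, TM
FROM (
    SELECT Orig, Dest, ProdCode, Weight, GAM, MAM, TM,
           ROW_NUMBER() OVER (ORDER BY GAM) AS RowNum,    -- position when sorted by GAM
           COUNT(*) OVER ()                 AS TotalRows  -- number of rows being ranked
    FROM FreightDataView
    WHERE GAM <> 0   -- assumption: 0 marks rows from other account types
) AS Ranked
WHERE RowNum >  TotalRows * @LowerPct / 100.0             -- drop the lowest @LowerPct percent
  AND RowNum <= TotalRows * (100.0 - @UpperPct) / 100.0;  -- drop the highest @UpperPct percent
```

With 100 qualifying rows and both percentages at 5, this keeps the rows ranked 6 through 95. The same pattern would be repeated for MAM and TM if each series is to be trimmed separately.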

Author Comment

by:LoveToSpod
ID: 17980454
Hi

I simply added a manual range into the SQL. This allows the user to eliminate any 'outside' data that skew the analysis unnecessarily.

Thx
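
The thread does not show the SQL that was finally used. A manual range along these lines might look like the sketch below, where `@MinRev` and `@MaxRev` are hypothetical names, the bounds 100 and 500 are borrowed from the "normal range" example in the question, and any other filters from the original Excel query would be added to the WHERE clause.

```sql
-- Hypothetical user-supplied bounds for acceptable values.
DECLARE @MinRev FLOAT, @MaxRev FLOAT;
SET @MinRev = 100;
SET @MaxRev = 500;

SELECT TOP 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
FROM FreightDataView
-- The view emits 0 in a series column when the row belongs to another
-- account type, so 0 is allowed through and only non-zero values are
-- checked against the range.
WHERE (GAM = 0 OR GAM BETWEEN @MinRev AND @MaxRev)
  AND (MAM = 0 OR MAM BETWEEN @MinRev AND @MaxRev)
  AND (TM  = 0 OR TM  BETWEEN @MinRev AND @MaxRev);
```

Unlike a percentile or standard deviation cutoff, the bounds here do not adapt to the data, so they need revisiting if the normal range of values shifts.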