Solved

Exclude erroneous values using Standard Deviation

Posted on 2006-10-25
Last Modified: 2008-03-04
I have a query running from within Excel based on a view in SQL Server. It returns up to 60000 records (but rarely over a few hundred) into the Excel spreadsheet. The data is source data for a chart. The GAM, MAM, and TM columns are values, and these end up being different series in the chart. The problem occurs when an erroneous number ends up in the data - e.g. let's say the normal range of values is between 100 and 500 - then somewhere due to some strange reason, a value of 12,000 is returned. This skews the chart and analysis.

I have been asked to focus in on 90% of the central values, and the outlying 10% of values are to be excluded.

Question:
======
1. How do I do this? Is it a job for standard deviation?
2. What SQL do I need to get the data I want?

Background INFO:
===========
----SQL Server View (FreightDataView)--------
CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
----------------------------------------------------

----Excel Query------------
select top 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
from FreightDataView
where (blah blah blah...)
----------------------------

Cheers,

LoveToSpod
Question by:LoveToSpod

Expert Comment

by:Patrick Matthews
ID: 17802791
Hi LoveToSpod,
> I have been asked to focus in on 90% of the central values, and the outlying 10%
> of values are to be excluded.

So what does that mean: chop off 5% at each tail of the distribution?

Regards,

Patrick

Accepted Solution

by:
regbes earned 2000 total points
ID: 17802839
Hi LoveToSpod,
Try one of these:

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
and NetRev < 1000

or

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))

-- keep rows below the mean plus one standard deviation (the SQL Server function is STDEV)
and NetRev < (select Avg(NetRev) + stdev(NetRev) from freightdata WHERE (NOT (weight = 0)))


HTH

R.
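
The standard-deviation idea above can also be applied on both sides of the mean. The following is a minimal sketch, not a tested solution: `@k` (the number of standard deviations to keep) is a hypothetical parameter, and the query runs against the base table rather than the view because a view cannot take variables. For roughly normal data, about 90% of values fall within 1.645 standard deviations of the mean, but a single extreme value such as 12,000 inflates the standard deviation itself, so this band does not guarantee that exactly 90% of rows survive.

```sql
-- Sketch only: @k is an assumed, user-supplied width of the band.
DECLARE @k float;
SET @k = 1.645;

SELECT f.InvoiceYearMonth, f.OrigCountry AS Orig, f.DestCountry AS Dest,
       f.Weight, f.ProdCode, f.AccountType, f.NetRev
FROM   FreightData AS f
CROSS JOIN (
           -- one-row summary of the non-zero-weight rows
           SELECT AVG(CAST(NetRev AS float)) AS MeanRev,
                  STDEV(NetRev)              AS SdRev
           FROM   FreightData
           WHERE  Weight <> 0
       ) AS s
WHERE  f.Weight <> 0
  AND  f.NetRev BETWEEN s.MeanRev - @k * s.SdRev
                    AND s.MeanRev + @k * s.SdRev;
```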

Author Comment

by:LoveToSpod
ID: 17803004
Hi matthewspatrick

We could lop off the top and/or bottom 5% of values, but looking at it more closely, I would like to add user controls that configure how much is lopped off either side of the data, so the '5%' becomes a variable.

Cheers, LTS
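
Trimming a configurable percentage from each tail, as described in this comment, could be sketched as follows. This assumes SQL Server 2005 or later (for `ROW_NUMBER()` and `COUNT(*) OVER ()`); `@lowerPct` and `@upperPct` are hypothetical names for the user-controlled percentages, and the ranking is done on `NetRev` across all account types.

```sql
-- Sketch only: the two percentages are assumed user inputs.
DECLARE @lowerPct float, @upperPct float;
SET @lowerPct = 5;   -- percent of rows to drop from the bottom
SET @upperPct = 5;   -- percent of rows to drop from the top

SELECT r.Orig, r.Dest, r.ProdCode, r.Weight, r.AccountType, r.NetRev
FROM (
    SELECT OrigCountry AS Orig, DestCountry AS Dest, ProdCode, Weight,
           AccountType, NetRev,
           ROW_NUMBER() OVER (ORDER BY NetRev) AS rn,     -- rank by value
           COUNT(*)     OVER ()                AS total   -- rows considered
    FROM   FreightData
    WHERE  Weight <> 0
) AS r
WHERE r.rn >  r.total * @lowerPct / 100.0
  AND r.rn <= r.total * (100.0 - @upperPct) / 100.0;
```

With 100 qualifying rows and 5% on each side, this keeps ranks 6 through 95. To trim each chart series separately, one option would be to add `PARTITION BY AccountType` to both `OVER` clauses.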

Author Comment

by:LoveToSpod
ID: 17980454
Hi

I simply added a manual range into the SQL. This allows the user to eliminate any 'outside' data that skew the analysis unnecessarily.

Thx
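
The exact SQL used for the manual range was not posted. One way it might look against the existing view is sketched below; `@minRev` and `@maxRev` are hypothetical names for the user-supplied bounds, and the poster's other filter conditions are omitted. Because each row of FreightDataView carries NetRev in at most one of GAM, MAM, and TM (the other columns are 0), their sum can stand in for the row's value.

```sql
-- Sketch only: bounds are assumed inputs; other WHERE conditions are left out.
DECLARE @minRev float, @maxRev float;
SET @minRev = 100;
SET @maxRev = 500;

SELECT TOP 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
FROM   FreightDataView
WHERE  (GAM + MAM + TM) BETWEEN @minRev AND @maxRev;
```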


Question has a verified solution.
