Exclude erroneous values using Standard Deviation

Posted on 2006-10-25
Last Modified: 2008-03-04
I have a query running from within Excel, based on a view in SQL Server. It returns up to 60,000 records (but rarely more than a few hundred) into the Excel spreadsheet. The data is the source data for a chart. The GAM, MAM, and TM columns are values, and these become different series in the chart. The problem occurs when an erroneous number ends up in the data. For example, say the normal range of values is between 100 and 500; then, for some strange reason, a value of 12,000 is returned. This skews the chart and the analysis.

I have been asked to focus in on 90% of the central values, and the outlying 10% of values are to be excluded.

Question:
======
1. How do I do this? Is it a Standard Deviation job?
2. What SQL do I need to get the data I want?

Background INFO:
===========
----SQL Server View (FreightDataView)--------
CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
----------------------------------------------------

----Excel Query------------
select top 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
from FreightDataView
where (blah blah blah...)
----------------------------

Cheers,

LoveToSpod
Question by:LoveToSpod

Expert Comment

by:Patrick Matthews
ID: 17802791
Hi LoveToSpod,
> I have been asked to focus in on 90% of the central values, and the outlying 10%
> of values are to be excluded.

So what does that mean: chop off 5% at each tail of the distribution?

Regards,

Patrick

Accepted Solution

by:regbes
ID: 17802839
Hi LoveToSpod,
Try one of these:

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))
and NetRev < 1000

or

CREATE VIEW dbo.FreightDataView
AS
SELECT     InvoiceYearMonth, OrigCountry AS Orig, DestCountry AS Dest, Weight, ProdCode, (CASE AccountType WHEN 'G' THEN NetRev ELSE 0 END) AS GAM,
                      (CASE AccountType WHEN 'MAM' THEN NetRev ELSE 0 END) AS MAM, (CASE AccountType WHEN 'SME' THEN NetRev ELSE 0 END) AS TM
FROM         FreightData
WHERE     (NOT (weight = 0))

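-- keep only rows whose NetRev is below the mean plus one standard deviation
-- (the T-SQL function is STDEV, and the threshold must not include the row's own NetRev)
and NetRev < (select Avg(NetRev) + StDev(NetRev) from FreightData where (NOT (weight = 0)))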


HTH

R.
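
On the "is it a Standard Deviation job" question, a two-sided band per account type may be closer to what the chart needs, since GAM, MAM, and TM are plotted as separate series. The query below is only a sketch under assumptions: the multiplier @K is a hypothetical variable, and the value 1.645 covers roughly the central 90% only if the values are approximately normally distributed. Note also that an outlier such as 12,000 inflates both the mean and the standard deviation, so this approach can be less robust than trimming by rank.

```sql
-- Hypothetical multiplier: how many standard deviations to keep either side of the mean.
-- 1.645 corresponds to about 90% of a normal distribution (an assumption about the data).
DECLARE @K float;
SET @K = 1.645;

SELECT  f.OrigCountry AS Orig, f.DestCountry AS Dest, f.ProdCode, f.Weight,
        f.AccountType, f.NetRev
FROM    FreightData AS f
JOIN   (SELECT AccountType,
               AVG(NetRev)   AS MeanRev,   -- mean per account type
               STDEV(NetRev) AS SdRev      -- sample standard deviation per account type
        FROM   FreightData
        WHERE  Weight <> 0
        GROUP BY AccountType) AS s
  ON    s.AccountType = f.AccountType
WHERE   f.Weight <> 0
  AND   f.NetRev BETWEEN s.MeanRev - @K * s.SdRev
                     AND s.MeanRev + @K * s.SdRev;
```

An account type with only one row has a NULL standard deviation, so its row is dropped by the BETWEEN test.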

Author Comment

by:LoveToSpod
ID: 17803004
Hi matthewspatrick

We could lop off the top and/or bottom 5% of values but, looking at it more closely, I would like to add user controls that configure how much is lopped off either side of the data, so the '5%' becomes a variable.

Cheers, LTS
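
The variable tail percentages described above could be sketched by ranking the rows and keeping the middle of each ranking. This is only an illustration, not the solution used in the thread: @LowerPct and @UpperPct are hypothetical variables standing in for the user controls, the trimming is done per AccountType on the assumption that each chart series should be trimmed separately, and ROW_NUMBER, COUNT(*) OVER, and the WITH clause assume SQL Server 2005 or later.

```sql
-- Hypothetical tail sizes; in practice these would be fed from the user's controls.
DECLARE @LowerPct decimal(5,2), @UpperPct decimal(5,2);
SET @LowerPct = 5.0;   -- percent of rows to trim from the low end
SET @UpperPct = 5.0;   -- percent of rows to trim from the high end

WITH Ranked AS (
    SELECT  OrigCountry AS Orig, DestCountry AS Dest, ProdCode, Weight,
            AccountType, NetRev,
            -- position of the row within its account type, smallest NetRev first
            ROW_NUMBER() OVER (PARTITION BY AccountType ORDER BY NetRev) AS rn,
            -- number of rows in that account type
            COUNT(*)     OVER (PARTITION BY AccountType)                 AS cnt
    FROM    FreightData
    WHERE   Weight <> 0
)
SELECT  Orig, Dest, ProdCode, Weight, AccountType, NetRev
FROM    Ranked
WHERE   rn >  cnt * @LowerPct / 100.0             -- skip the lowest @LowerPct percent
  AND   rn <= cnt * (100.0 - @UpperPct) / 100.0;  -- skip the highest @UpperPct percent
```

With 100 rows in an account type and both variables set to 5, this keeps ranks 6 through 95, i.e. the central 90 rows.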

Author Comment

by:LoveToSpod
ID: 17980454
Hi

I simply added a manual range into the SQL. This allows the user to eliminate any 'outside' data that skew the analysis unnecessarily.

Thx
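
The SQL that was actually used is not shown in the thread. As a rough sketch of what a manual range on the Excel query might look like: @MinRev and @MaxRev are hypothetical variables, the money type is an assumption about NetRev, and the bounds 100 and 500 are taken from the "normal range" example in the question. Because each row carries NetRev in at most one of GAM, MAM, and TM (the other columns are 0), their sum is used here as the row's value.

```sql
-- Hypothetical bounds supplied by the user.
DECLARE @MinRev money, @MaxRev money;
SET @MinRev = 100;
SET @MaxRev = 500;

SELECT TOP 60000 Orig, Dest, ProdCode, Weight, GAM, MAM, TM
FROM   FreightDataView
-- at most one of the three columns is non-zero per row, so the sum is that row's value
WHERE  (GAM + MAM + TM) BETWEEN @MinRev AND @MaxRev;
```

Rows whose AccountType is none of 'G', 'MAM', or 'SME' have all three columns at 0, so a positive lower bound also filters them out.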