Solved

Detect outliers in a series of numbers

Posted on 2011-03-04
Last Modified: 2012-05-11
Suppose I have a series of numbers like this

-0.0289436618082665
-0.0322635297824615
0.0473380547993016
-0.0483053616147235
0.0561386651052217
-0.0546202231192121
3912478746624.73
-0.0570958411471398
-0.0406567550991673
-0.0191101260081410
-0.0178598058749180
5912378756654.12
-0.00649518615382946
-0.0569007033673227
0.00634860933789683

So most are within a certain range, say +/- 1.0, but there are a couple of numbers way outside this range.
Is there an algorithm to determine which numbers are these outliers?
I was thinking if I could detect these, then I could calculate the mean of the non-outliers and replace the outliers with that.
Question by:allelopath
6 Comments
 
LVL 18

Accepted Solution

by:deighton
deighton earned 84 total points
ID: 35038303
Yes, you calculate the mean and standard deviation of the numbers.

Then for each number you calculate the number of standard deviations from the mean.

If the number is more than 4 standard deviations from the mean, it can be considered an outlier.

I see your out-lying values are extremely outside the range of the others.
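A minimal sketch of the mean and standard deviation check described above, in Python. The function names and the use of the sample standard deviation (statistics.stdev) are assumptions made for illustration; the 4 sd threshold is the one suggested in this comment. Note that on the 15 values from the question the two huge numbers inflate the standard deviation so much that neither of them reaches 4 sd, so the threshold needs to be chosen with the data in mind.

```python
import statistics

# Values copied from the question.
data = [
    -0.0289436618082665, -0.0322635297824615, 0.0473380547993016,
    -0.0483053616147235, 0.0561386651052217, -0.0546202231192121,
    3912478746624.73, -0.0570958411471398, -0.0406567550991673,
    -0.0191101260081410, -0.0178598058749180, 5912378756654.12,
    -0.00649518615382946, -0.0569007033673227, 0.00634860933789683,
]

def z_scores(values):
    """Return each value's distance from the mean, in standard deviations."""
    mean = statistics.mean(values)
    # Sample sd is an assumption; statistics.pstdev is the other common choice.
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def find_outliers(values, threshold=4.0):
    """Return the values lying more than `threshold` sd from the mean."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Print every value with its score, then the values flagged at 4 sd.
for v, z in zip(data, z_scores(data)):
    print(f"{v:>22.6g}  z = {z:+.2f}")
print("flagged at 4 sd:", find_outliers(data))
```

The sketch assumes the values are not all identical; a standard deviation of zero would cause a division by zero.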
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 167 total points
ID: 35038618
"if the number is more than 4 standard deviations from the mean, it can be considered an outlier."

This threshold for what makes it an outlier is really application dependent. In fact, for the data you posted, one of the obvious outliers is less than 3 standard deviations from the mean. Many applications would throw out the top and bottom 5-10% of the data before doing any calculations. If you do that, then your outliers will be over a billion standard deviations from the mean and would be outliers by almost any standard.
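The trimming idea can be sketched roughly as follows in Python. This is only one way to do it: the helper names, the 10% fraction, the 4 sd threshold, and rounding the trim count up (so that 10% of 15 values drops 2 from each end) are all assumptions chosen for illustration, not something prescribed in this thread.

```python
import math
import statistics

# Values copied from the question.
data = [
    -0.0289436618082665, -0.0322635297824615, 0.0473380547993016,
    -0.0483053616147235, 0.0561386651052217, -0.0546202231192121,
    3912478746624.73, -0.0570958411471398, -0.0406567550991673,
    -0.0191101260081410, -0.0178598058749180, 5912378756654.12,
    -0.00649518615382946, -0.0569007033673227, 0.00634860933789683,
]

def trimmed(values, fraction=0.10):
    """Drop the lowest and highest `fraction` of the values (count rounded up)."""
    k = math.ceil(len(values) * fraction)
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def outliers_vs_trimmed(values, fraction=0.10, threshold=4.0):
    """Score every value against the mean and sd of the trimmed data."""
    core = trimmed(values, fraction)
    mean = statistics.mean(core)
    sd = statistics.stdev(core)
    scored = [(v, (v - mean) / sd) for v in values]
    return [(v, z) for v, z in scored if abs(z) > threshold]

for v, z in outliers_vs_trimmed(data):
    print(f"{v:.6g} is {z:.3g} sd from the trimmed mean")
```

The statistics come from the trimmed values only, but every original value is scored, so points that were trimmed can still be kept if they turn out to be close to the rest.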
 
LVL 27

Assisted Solution

by:aburr
aburr earned 83 total points
ID: 35039179
All this outliers business is very fraught with danger.
Physics is full of stories about people ignoring outliers and missing Nobel prizes.
Nevertheless people find it useful to establish algorithms to spot outliers. There is no standard algorithm to which objections cannot be raised.
Several popular ones have been given above.
Obviously, if you run your data through whatever algorithm you choose enough times, you will end up with one data point. You should not discard any data point without a non-statistical cause. Nevertheless, the problem is often not important enough to spend a lot of time on, so one of the algorithms mentioned above will usually be an improvement in the decision-making process.
 
LVL 32

Assisted Solution

by:phoffric
phoffric earned 166 total points
ID: 35041453
I just wanted to remind you that the mean and standard deviation include the outliers, and depending upon the quantity and magnitude of these outliers, these values could be badly skewed. Depending on your model, the mean and standard deviation may be just what you need.

You may also want to consider determining the median instead. And if you can define from your model what an outlier is, then you might consider a % threshold error (or an absolute threshold error value - depends on your model), so that if the absolute value of the difference between the median and the data point exceeds the threshold, then that point will be considered an outlier.

As others have already alluded to, you need to understand your model in order to define what an outlier is.
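Here is a rough Python sketch of the median plus absolute threshold idea, combined with the replacement step the asker proposed (fill each outlier with the mean of the non-outliers). The function names are made up for illustration, and the threshold of 1.0 is only borrowed from the "+/- 1.0" range mentioned in the question; your model may call for a different value or a percentage instead.

```python
import statistics

# Values copied from the question.
data = [
    -0.0289436618082665, -0.0322635297824615, 0.0473380547993016,
    -0.0483053616147235, 0.0561386651052217, -0.0546202231192121,
    3912478746624.73, -0.0570958411471398, -0.0406567550991673,
    -0.0191101260081410, -0.0178598058749180, 5912378756654.12,
    -0.00649518615382946, -0.0569007033673227, 0.00634860933789683,
]

def split_by_median(values, max_distance):
    """Split values into (inliers, outliers) by distance from the median."""
    center = statistics.median(values)
    inliers = [v for v in values if abs(v - center) <= max_distance]
    outliers = [v for v in values if abs(v - center) > max_distance]
    return inliers, outliers

def replace_outliers(values, max_distance):
    """Replace each outlier with the mean of the non-outliers."""
    center = statistics.median(values)
    inliers, _ = split_by_median(values, max_distance)
    fill = statistics.mean(inliers)
    return [v if abs(v - center) <= max_distance else fill for v in values]

inliers, outliers = split_by_median(data, max_distance=1.0)
cleaned = replace_outliers(data, max_distance=1.0)
print("outliers:", outliers)
print("replacement value:", statistics.mean(inliers))
```

Unlike the mean, the median barely moves when a few extreme values are present, which is why the same threshold works here without any trimming step.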
 
LVL 32

Assisted Solution

by:phoffric
phoffric earned 166 total points
ID: 35041472
Here are some EE discussions on outliers. Again, make sure that the question in the OP fits your model before applying any points made:

     http://rdsrc.us/GyvkW7

     http://rdsrc.us/U4sDl9

     http://rdsrc.us/VrQc4e
     
 
LVL 37

Assisted Solution

by:TommySzalapski
TommySzalapski earned 167 total points
ID: 35044851
Another common thing to do in outlier detection is to consider the mean and standard deviation (sd) of all points but the one in question. This tends to avoid the problem of one massive outlier messing up the statistics.
The best way to do that is to get the mean and sd for the whole set and 'remove' the one in question.
If you have n data points and want to remove x, then you just do newmean = (mean*n - x)/(n - 1), and that gives you the mean without considering x.
For the sd, remember that the mean of the x^2 minus the mean^2 gives the variance, and the sd is its square root, so if you keep track of the mean of the squares, you can do the same thing for the sd.
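A small Python sketch of this leave-one-out calculation, assuming population statistics (mean of the squares minus the square of the mean) and a hypothetical function name. It keeps the running sum and sum of squares, so each point costs only a couple of subtractions. On the values from the question, removing one of the two huge numbers still leaves the other in the statistics, so only the largest one ends up more than 4 sd away.

```python
import math

# Values copied from the question.
data = [
    -0.0289436618082665, -0.0322635297824615, 0.0473380547993016,
    -0.0483053616147235, 0.0561386651052217, -0.0546202231192121,
    3912478746624.73, -0.0570958411471398, -0.0406567550991673,
    -0.0191101260081410, -0.0178598058749180, 5912378756654.12,
    -0.00649518615382946, -0.0569007033673227, 0.00634860933789683,
]

def leave_one_out_z(values):
    """Score each value against the mean and sd of all the other values."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    scores = []
    for x in values:
        mean = (total - x) / (n - 1)            # mean without x
        mean_sq = (total_sq - x * x) / (n - 1)  # mean of squares without x
        # Clamp at zero in case rounding makes the difference slightly negative.
        variance = max(mean_sq - mean * mean, 0.0)
        scores.append((x - mean) / math.sqrt(variance))
    return scores

scores = leave_one_out_z(data)
for v, z in zip(data, scores):
    print(f"{v:>22.6g}  leave-one-out z = {z:+.2f}")
```

The sum-of-squares shortcut can lose precision when the variance is tiny compared with the squared mean, and it divides by zero if all the remaining values are equal, so treat it as a sketch rather than a hardened routine.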