Adam Setzler


Simple function for detecting outliers in an array of numbers?

This is a PHP statistics question, given that PHP doesn't have a good native set of math classes.

Say I have an array of numbers, each between 1 and 100, and the array could contain hundreds of values. How would I best go about filtering out outliers?

Example: 12, 14, 80, 89, 85, 84, 81, 80, 78, 84

How would I eliminate 12 and 14?
hielo

How are you getting the data into the array? The key is to not let them get into the array if they don't meet your criteria.
<?php

$input = array(12, 14, 80, 89, 85, 84, 81, 80, 78, 84);

$output = array_slice($input, 2);      // returns 80, 89, 85, 84, 81, 80, 78, 84

?>

Is this what you want to do?

ASKER

It's a rating system.  Some raters will be tempted to over/under rate excessively, and I would like to prevent those values from being incorporated, while still retaining them.

The function should be able to figure out which values are commonplace, and trash the others.  Does that make sense?
Do you mean you want to find the lowest numbers among the ratings? Normally, on a scale of 1 to 5, if a user rates 1, 2, or 3, we could identify and eliminate those?
Users are rating items from 1 to 100.  These items have a generally accepted value, which is determined by the majority of the ratings, but sometimes users rate items far from the generally accepted value.  I need a way to detect those "outliers".
OK, so it could be any series of numbers in the range 1 to 100.

but sometimes users rate items far from the generally accepted value
>> What is the generally accepted value?

Do you mean something like this?

For example: 34, 39, 45, 56, 67, 89, 90, 93, 99. The ratings may come in like this, correct? Here you would want to take out 34 and 39?
The generally accepted value should be derived from that set of numbers... Maybe something like everything outside a standard dev of the mean?  Not that exactly, unless it's the only way, though... I was just thinking there was an elegant way of doing this.
The generally accepted value should be derived from that set of numbers... Maybe something like everything outside a standard dev of the mean?

>> Maybe you could find the lowest numbers within the range, or define some minimum range of numbers to eliminate.
Has anyone a function to handle this?  I can't find anything on PHP.net or elsewhere.
Unfortunately not. This is how stats get screwed up in the first place: people take data, remove the information they feel is not relevant, and then run their calculations. There isn't an algorithm out there that can predict which values you might deem worthless.

Even if you sorted it, and had something like this

1
2
3
8
8
8
9
10
7
8

In your own mind, how would you determine that 1, 2, and 3 are worthless values? By visually looking at them.

Computers can't do that; they can only do as they are told.
Now let's say you tell it to remove all values that occur fewer than 3 times in the set. In this example it will then leave you with
8

It would have removed 1, 2, 3, 9, 10, and 7, leaving an average of 8
when the actual average was 6.4.
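
To make the arithmetic above concrete, here is a minimal sketch of the "remove everything that occurs fewer than 3 times" rule. The helper names `filterByFrequency` and `average` are made up for illustration (they are not built-in PHP functions), and the code assumes PHP 7 or later.

```php
<?php

// Hypothetical helper: arithmetic mean of an array (0.0 for an empty array).
function average(array $values): float
{
    return count($values) > 0 ? array_sum($values) / count($values) : 0.0;
}

// Hypothetical helper: keep only the values that occur at least $minCount times.
function filterByFrequency(array $values, int $minCount): array
{
    // array_count_values() maps each distinct value to its number of occurrences.
    $counts = array_count_values($values);

    $kept = array_filter($values, function ($v) use ($counts, $minCount) {
        return $counts[$v] >= $minCount;
    });

    // array_filter() preserves the original keys, so reindex from 0.
    return array_values($kept);
}

$ratings  = [1, 2, 3, 8, 8, 8, 9, 10, 7, 8];
$filtered = filterByFrequency($ratings, 3);

printf("before: %.1f, after: %.1f\n", average($ratings), average($filtered));
// prints "before: 6.4, after: 8.0"
```

The rule is easy to code; the hard part is that the threshold (3 here) is an arbitrary choice, and it shifts the average from 6.4 to 8.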

Now look at this example:

1
5
5
5
5
5
6
7
8
9
9
9
9
10
10
10
10
Now here is an example that has an average of 7.24.

If you tell it to remove everything that isn't the most common value, you will be left with
5
5
5
5
5
which then lowers your average to 5.
Last example:

4
4
6
6
7
7
5
1
1
10
This one has an average of 5.1.
If you tell it to remove the values that occur only once, you will be left with
4
4
6
6
7
7
1
1
which then leaves you with an average of 4.5.

Do you see the trend, and how impossible it would be to determine systematically what you want?
What about using something like filtering out everything that's outside one std dev of the mean?  That's what I was thinking about doing, but I thought there might be a more elegant way.

If there isn't, then how would I do that?
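
For what it's worth, the "one standard deviation from the mean" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not necessarily what the asker ended up using: the helper names `stdDev` and `splitOutliers` and the `$k` parameter are hypothetical, the population standard deviation (dividing by n) is an arbitrary choice, and PHP 7 or later is assumed. Outliers are returned separately rather than discarded, since the asker wants to retain them.

```php
<?php

// Hypothetical helper: population standard deviation (divides by n, not n - 1).
function stdDev(array $values): float
{
    $n = count($values);
    if ($n === 0) {
        return 0.0;
    }
    $mean  = array_sum($values) / $n;
    $sumSq = 0.0;
    foreach ($values as $v) {
        $sumSq += ($v - $mean) ** 2;
    }
    return sqrt($sumSq / $n);
}

// Hypothetical helper: split values into those within $k standard deviations
// of the mean ('kept') and those farther away ('outliers').
function splitOutliers(array $values, float $k = 1.0): array
{
    $result = ['kept' => [], 'outliers' => []];
    if (count($values) === 0) {
        return $result;
    }
    $mean = array_sum($values) / count($values);
    $sd   = stdDev($values);
    foreach ($values as $v) {
        $bucket = abs($v - $mean) <= $k * $sd ? 'kept' : 'outliers';
        $result[$bucket][] = $v;
    }
    return $result;
}

$ratings = [12, 14, 80, 89, 85, 84, 81, 80, 78, 84];
$result  = splitOutliers($ratings);

echo 'kept: ' . implode(', ', $result['kept']) . "\n";
// prints "kept: 80, 89, 85, 84, 81, 80, 78, 84"
echo 'outliers: ' . implode(', ', $result['outliers']) . "\n";
// prints "outliers: 12, 14"
```

For the example set the mean is 68.7 and the standard deviation is about 28, so anything outside roughly 40.7 to 96.7 is flagged. Note that the outliers themselves pull the mean and widen the deviation, so the cutoff `$k` is still a judgment call.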
ASKER CERTIFIED SOLUTION
Adam Setzler
This solution is only available to members.
I don't think there is an exact way to find the "StdDev of the Mean" in PHP, unless you have a clear idea of how to convert it into another form that can be done in PHP.
@logudotcom:
What was that? Whatever you were thinking, it seems you typed only every other word (or two)!
The point I was trying to make is that there isn't a simple function for this because it's not a simple operation, and it's totally up to the person who is viewing the data to decide what's relevant and what isn't.

The kind of code you are talking about would take months to design and perfect. It's an extremely complex algorithm, mimicking the thoughts of a single individual in a PHP script.