Simple function for detecting outliers in an array of numbers?

This is a PHP statistics question, given that PHP doesn't have a good native set of math classes.

Say I have a set of numbers from 1 to 100, and the array could contain hundreds of values. How would I best go about filtering out outliers?

Example: 12, 14, 80, 89, 85, 84, 81, 80, 78, 84

How would I eliminate 12 and 14?
Asked by: adamsetzler

hielo commented:
How are you getting the data into the array? The key is to not let them get into the array if they don't meet your criteria.
Loganathan Natarajan (LAMP Developer) commented:
<?php

$input = array(12, 14, 80, 89, 85, 84, 81, 80, 78, 84);

$output = array_slice($input, 2);      // returns 80, 89, 85, 84, 81, 80, 78, 84

?>

Is this what you want to do?
adamsetzler (Author) commented:
It's a rating system.  Some raters will be tempted to over/under rate excessively, and I would like to prevent those values from being incorporated, while still retaining them.

The function should be able to figure out which values are commonplace, and trash the others.  Does that make sense?
Loganathan Natarajan (LAMP Developer) commented:
Do you mean you want to find the lowest numbers among the ratings? For example, on a scale of 1 to 5, if a user rates 1, 2 or 3, we could identify and eliminate those?
adamsetzler (Author) commented:
Users are rating items from 1 to 100.  These items have a generally accepted value, which is determined by the majority of the ratings, but sometimes users rate items far from the generally accepted value.  I need a way to detect those "outliers".
Loganathan Natarajan (LAMP Developer) commented:
OK, so the ratings could be any series of numbers in the range 1 to 100.

but sometimes users rate items far from the generally accepted value
>> What is the generally accepted value?

For example, the ratings might come in as 34, 39, 45, 56, 67, 89, 90, 93, 99. Is that correct? And here you would want to pick out 34 and 39?
adamsetzler (Author) commented:
The generally accepted value should be derived from that set of numbers... Maybe something like everything outside a standard dev of the mean?  Not that exactly, unless it's the only way, though... I was just thinking there was an elegant way of doing this.
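
One commonly used alternative to a mean-based rule is an interquartile range (IQR) filter, which derives the "accepted" band from the middle half of the data. Nobody in this thread proposed it; the sketch below is only an illustration under stated assumptions. The function names `medianOfSorted` and `filterByIqr` are hypothetical, the quartiles are taken as the medians of the lower and upper halves (other quartile definitions exist), and the multiplier 1.5 is a conventional default rather than anything from the question.

```php
<?php
// Hypothetical helper: median of an already sorted, non-empty array.
function medianOfSorted(array $sorted): float
{
    $n = count($sorted);
    $mid = intdiv($n, 2);
    return $n % 2
        ? $sorted[$mid]
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;
}

// Hypothetical helper: keeps values inside [Q1 - k*IQR, Q3 + k*IQR].
function filterByIqr(array $values, float $k = 1.5): array
{
    $n = count($values);
    if ($n < 4) {
        return $values; // too few values to estimate quartiles
    }

    $sorted = $values;
    sort($sorted);

    // Quartiles as medians of the lower and upper halves
    // (the middle element is skipped when $n is odd).
    $half = intdiv($n, 2);
    $q1 = medianOfSorted(array_slice($sorted, 0, $half));
    $q3 = medianOfSorted(array_slice($sorted, $n - $half));

    $low  = $q1 - $k * ($q3 - $q1);
    $high = $q3 + $k * ($q3 - $q1);

    // array_filter() preserves keys, so reindex with array_values().
    return array_values(array_filter(
        $values,
        function ($v) use ($low, $high) {
            return $v >= $low && $v <= $high;
        }
    ));
}

$ratings = [12, 14, 80, 89, 85, 84, 81, 80, 78, 84];
print_r(filterByIqr($ratings)); // drops 12 and 14, keeps the other eight
```

Because quartiles are barely affected by a few extreme ratings, the band does not get dragged toward the outliers the way a mean can. The original array is left untouched, so the rejected ratings can still be retained elsewhere.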
Loganathan Natarajan (LAMP Developer) commented:
The generally accepted value should be derived from that set of numbers... Maybe something like everything outside a standard dev of the mean?

>> Maybe you can find the lowest numbers within the range, or set some minimum value below which numbers are eliminated.
adamsetzler (Author) commented:
Has anyone a function to handle this?  I can't find anything on PHP.net or elsewhere.
nplib commented:
Unfortunately not. This is how stats get screwed up in the first place: people take data, remove the information they feel is not relevant, and then calculate on what is left. There isn't an algorithm out there that can predict which values you might deem worthless.

Even if you sorted it, and had something like this

1
2
3
8
8
8
9
10
7
8

In your own mind, how would you determine that 1, 2 and 3 are worthless values? By visually looking at them.

Computers can't do that; they can only do as they are told.
Now let's say you tell it to remove all values that occur fewer than 3 times in the set. In this example it will then leave you with
8

It would have removed the 1, 2, 3, 9, 10 and 7,
when the actual average was 6.4.

Now look at this example:

1
5
5
5
5
5
6
7
8
9
9
9
9
10
10
10
10
Here is an example that has an average of 7.24.

If you tell it to remove everything that isn't the most frequent value, you will be left with
5
5
5
5
5
which then lowers your average to 5.
Last example:

4
4
6
6
7
7
5
1
1
10
This one has an average of 5.1.
If you tell it to remove the values that occur only once, you will be left with
4
4
6
6
7
7
1
1
which then leaves you with an average of 4.5.

Do you see the trend, and how impossible it would be to determine systematically what you want?
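
The frequency-based rule described in these examples can be reproduced in a few lines, which makes it easy to see how the average shifts. This is only a sketch of that rule as described above; the function names `keepFrequent` and `average` are hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: keeps only values that occur at least $minCount times.
function keepFrequent(array $values, int $minCount): array
{
    // array_count_values() maps each value to its number of occurrences.
    $counts = array_count_values($values);

    // array_filter() preserves keys, so reindex with array_values().
    return array_values(array_filter(
        $values,
        function ($v) use ($counts, $minCount) {
            return $counts[$v] >= $minCount;
        }
    ));
}

// Hypothetical helper: arithmetic mean of a non-empty array.
function average(array $values): float
{
    return array_sum($values) / count($values);
}

// The last example: drop values that occur only once.
$set = [4, 4, 6, 6, 7, 7, 5, 1, 1, 10];
echo average($set), "\n";                  // 5.1
echo average(keepFrequent($set, 2)), "\n"; // 4.5
```

Here the filter throws away the 5 and the 10 but keeps both 1s, so the result depends on how often a value repeats rather than on how far it sits from the rest.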
adamsetzler (Author) commented:
What about using something like filtering out everything that's outside one std dev of the mean?  That's what I was thinking about doing, but I thought there might be a more elegant way.

If there isn't, then how would I do that?
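
A minimal sketch of the filter described here, assuming the population standard deviation (dividing by n) and a cutoff of `$k` standard deviations from the mean. The function name `filterOutliers` and the `$k` parameter are hypothetical choices for illustration; nothing like this ships with PHP's core.

```php
<?php
// Hypothetical helper: keeps values within $k standard deviations of the mean.
function filterOutliers(array $values, float $k = 1.0): array
{
    $n = count($values);
    if ($n < 2) {
        return $values; // too few values to measure spread
    }

    $mean = array_sum($values) / $n;

    // Population variance: the mean of the squared deviations.
    $sumSquares = 0.0;
    foreach ($values as $v) {
        $sumSquares += ($v - $mean) ** 2;
    }
    $stdDev = sqrt($sumSquares / $n);

    // array_filter() preserves keys, so reindex with array_values().
    return array_values(array_filter(
        $values,
        function ($v) use ($mean, $stdDev, $k) {
            return abs($v - $mean) <= $k * $stdDev;
        }
    ));
}

$ratings = [12, 14, 80, 89, 85, 84, 81, 80, 78, 84];
print_r(filterOutliers($ratings)); // drops 12 and 14, keeps the other eight
```

For the example set the mean is 68.7 and the standard deviation is roughly 28, so anything outside about 40.7 to 96.7 is rejected. The input array is not modified, so the rejected ratings are still available for storage. A cutoff of one standard deviation is strict and will also discard legitimate ratings in more spread-out data; passing a larger `$k` such as 2 loosens it.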
adamsetzler (Author) commented:
I'll leave this question open for a bit longer, but if I don't see anything that's suitable, I'll just go with a StdDev of the Mean outlier filter. :/
Loganathan Natarajan (LAMP Developer) commented:
i don't think so to find out "StdDev of the Mean" the exact thing in php,... unless you've clear idea converting this in another form... to do it in php
hielo commented:
@logudotcom:
What was that? Whatever you were thinking, it seems you typed every other word (or two) !
nplib commented:
The point I was trying to make is that there isn't a simple function to do such a thing because it's not a simple task: it's totally up to the person who is viewing the data to decide what's relevant and what isn't.

The kind of code you are talking about would take months to design and perfect. It's an extremely complex algorithm, mimicking the thoughts of a single individual in a PHP script.