Hi

Why is it necessary for there to be no outliers in the data for parametric tests? I know the tests assume there are no outliers in the data, so that in itself is reason to make sure there aren't any.

Why do the tests assume there aren't any outliers? Is it because outliers affect the value of the mean and parametric tests rely on the mean?

Also why do parametric tests require normally distributed data and homogeneity of variance?

many thanks

"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used. Outliers violate this requirement.

See

http://en.wikipedia.org/wi

Most statistical analysis depends on the data having all errors normally distributed and independent.

What to do with outliers? The only correct way is to collect enough data so that outliers do not influence the result. (I know, this is always difficult and sometimes impossible.)

If you eliminate outliers from your data set, you can be in deep trouble. Outliers are difficult to define. If you give me a data set and allow me to define what an outlier is, I can give you any result you want.

Point 3 in the last link above is particularly troublesome. Repeated outlier removal can cause you to end up with only one data point in which case you can show that your result is guaranteed to be 100% right.

Even in a hard science like physics it is easy to show that the elimination of outliers leads to missed discoveries.

So much for soapbox. For answers see next post.

A normal distribution does not contain outliers. A small sample from a normal distribution might contain points which are called outliers by some common definition.
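To put a rough number on that, here is a minimal sketch. It assumes independent draws from a normal population and takes "outlier" to mean a point more than k population standard deviations from the mean. A real outlier rule would use the sample mean and SD, so treat the result as an approximation. The function name is just for illustration.

```python
import math

def prob_at_least_one_beyond(n: int, k: float) -> float:
    """Chance that a sample of n independent normal values contains
    at least one value more than k standard deviations from the mean.
    Assumes the true (population) mean and SD define the cutoff."""
    # Two-sided tail probability P(|Z| > k) for a standard normal Z.
    p_single = math.erfc(k / math.sqrt(2))
    # Complement of "none of the n values falls in the tails".
    return 1 - (1 - p_single) ** n

for n in (10, 20, 50):
    print(f"n = {n}: P(at least one point beyond 2 SD) = "
          f"{prob_at_least_one_beyond(n, 2):.2f}")
```

Even at n = 20 the chance of seeing at least one point beyond 2 SD is better than even, so such a point on its own says little about whether the population is normal.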

WaterStreet says it reasonably well:

"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used.

The theory on which parametric tests rest requires that your sample be taken from a population which is normal. Outlier theory assumes that you can identify outliers and that when you remove them you will have a sample which you can say comes from a population with a normal distribution. (A big assumption, often ignored.)

"...a normal distribution has most of the values clustered around the mean but it does have some extreme values, in other words it does have some outliers."

Okay, it can have some outliers without being an anomaly, but within limits, as explained below.

"In the case of normally distributed data, roughly 1 in 22 observations will differ by twice the standard deviation or more from the mean, and 1 in 370 will deviate by three times the standard deviation; ... In a sample of 1000 observations, the presence of up to five observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected...and not indicative of an anomaly. If the sample size is only 100, however, just three such outliers are already reason for concern, being more than 11 times the expected number." http://en.wikipedia.org/wi
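If you want to check the figures in that quote yourself, here is a short sketch using only the Python standard library. It assumes an exactly normal distribution with known mean and SD, and the function name is my own.

```python
import math

def two_sided_tail(k: float) -> float:
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

# How often a value lands beyond 2 or 3 standard deviations.
for k in (2, 3):
    p = two_sided_tail(k)
    print(f"beyond {k} SD: p = {p:.5f}, about 1 in {1 / p:.0f}")

# Expected number of values beyond 3 SD in samples of 1000 and 100.
p3 = two_sided_tail(3)
for n in (1000, 100):
    print(f"n = {n}: expected count beyond 3 SD = {n * p3:.2f}")

# Three such values in a sample of 100, relative to the expected count.
ratio = 3 / (100 * p3)
print(f"3 observed vs expected in n = 100: {ratio:.1f} times")
```

The tail probabilities come out at about 1 in 22 and 1 in 370, and three points beyond 3 SD in a sample of 100 is a little over 11 times the expected count, which matches the quote.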

Hope this helps