# Assumptions for parametric test data

Hi
Why is it necessary for there to be no outliers in the data for parametric tests? I know the tests assume there are no outliers in the data, so that in itself is reason to make sure there aren't any.
Why do the tests assume there aren't any outliers? Is it because outliers affect the value of the mean and parametric tests rely on the mean?

Also why do parametric tests require normally distributed data and homogeneity of variance?

many thanks
CERTIFIED EXPERT

Commented:

"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used. Outliers violate this requirement.

See
http://en.wikipedia.org/wiki/Parametric_statistics
http://www.creative-wisdom.com/teaching/WBI/parametric_test.shtml
http://www.psychwiki.com/wiki/Dealing_with_Outliers

Commented:
You have said outliers violate the requirement of a normal distribution. Does that mean that a normal distribution does not contain outliers?
CERTIFIED EXPERT

Commented:
Outliers are a touchy subject.
Most statistical analysis depends on the data having all errors normally distributed and independent.
What to do with outliers? The only correct way is to collect enough data so that outliers do not influence the result. (I know, this is always difficult and sometimes impossible.)
If you eliminate outliers from your data set, you can be in deep trouble. Outliers are difficult to define. If you give me a data set and allow me to define what an outlier is, I can give you any result you want.
Point 3 in the last link above is particularly troublesome. Repeated outlier removal can cause you to end up with only one data point, in which case you can show that your result is guaranteed to be 100% right.
Even in a hard science like physics it is easy to show that the elimination of outliers leads to missed discoveries.
So much for soapbox. For answers see next post.
CERTIFIED EXPERT

Commented:
"Does that mean that a normal distribution does not contain outliers?"
A normal distribution does not contain outliers. A small sample from a normal distribution might contain points which are called outliers by some common definition.
WaterStreet says it reasonably well:
"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used.
The theory on which parametric tests rest requires that your sample be taken from a population which is normal. Outlier theory assumes that you can identify outliers and that, when you remove them, you will have a sample which you can say comes from a population with a normal distribution. (A big assumption, often ignored.)
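
The point that a small or even a large sample from a normal population can contain values that a common rule labels as outliers can be checked by simulation. The sketch below is illustrative only: it assumes Tukey's 1.5 × IQR fences as the "common definition" (the thread does not name one), and the function name `tukey_outliers` and the seed are hypothetical choices.

```python
import random
import statistics


def tukey_outliers(values):
    """Return the values lying outside Tukey's 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # sample quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(10_000)]  # draws from N(0, 1)

flagged = tukey_outliers(sample)
print(f"{len(flagged)} of {len(sample)} normal draws were flagged as outliers")
```

For a normal population roughly 0.7% of values fall outside these fences, so some flagged points are expected even when nothing is wrong with the data.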

Commented:
I don't think I am wording my questions very well. I understand that parametric tests assume the data are normally distributed and contain no outliers. My question is why the tests make these assumptions. I think I understand why the tests assume the data are normally distributed (because that allows you to make all sorts of other assumptions), but I don't understand why there can't be any outliers in the data. Perhaps my understanding of outliers is wrong. I thought outliers were values more than 3 standard deviations from the mean. This is probably what is confusing me: a normal distribution has most of the values clustered around the mean but it does have some extreme values, in other words it does have some outliers. So surely your data can be normally distributed and contain outliers?
CERTIFIED EXPERT
Commented:

"...a normal distribution has most of the values clustered around the mean but it does have some extreme values, in other words it does have some outliers."

Okay, it can have some outliers without being an anomaly, but within limits, as explained below.

"In the case of normally distributed data, roughly 1 in 22 observations will differ by twice the standard deviation or more from the mean, and 1 in 370 will deviate by three times the standard deviation; ... In a sample of 1000 observations, the presence of up to five observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected...and not indicative of an anomaly. If the sample size is only 100, however, just three such outliers are already reason for concern, being more than 11 times the expected number."  http://en.wikipedia.org/wiki/Outlier

Hope this helps
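
The figures in the Wikipedia quotation above can be reproduced from the standard normal tail probability, P(|Z| ≥ k) = erfc(k / √2). This is a minimal sketch using only Python's standard library; the function names `tail_probability` and `expected_count` are hypothetical and introduced here for illustration.

```python
import math


def tail_probability(k):
    """P(|Z| >= k) for a standard normal Z, i.e. beyond k standard deviations."""
    return math.erfc(k / math.sqrt(2))


def expected_count(n, k):
    """Expected number of observations beyond k standard deviations in n draws."""
    return n * tail_probability(k)


for k in (2, 3):
    p = tail_probability(k)
    print(f"beyond {k} SD: p = {p:.5f}, about 1 in {1 / p:.0f}")

for n in (1000, 100):
    print(f"n = {n}: expected count beyond 3 SD = {expected_count(n, 3):.2f}")
```

With n = 100 the expected count beyond three standard deviations is about 0.27, so three such points is roughly 11 times the expectation, in line with the quoted figure.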

Commented:
thanks, that last answer cleared it up for me