• Status: Solved

assumptions for parametric test data

Hi
Why is it necessary for there to be no outliers in the data for parametric tests? I know the tests assume there are no outliers in the data, so that in itself is reason to make sure there aren't any.
Why do the tests assume there aren't any outliers? Is it because outliers affect the value of the mean and parametric tests rely on the mean?

Also why do parametric tests require normally distributed data and homogeneity of variance?

many thanks
andieje
1 Solution
 
WaterStreet Commented:



"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used. Outliers violate this requirement.
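
To make the effect concrete, here is a minimal Python sketch. The measurements are invented for illustration and do not come from any real data set. It compares the mean, median, and sample standard deviation of a small sample with and without one extreme value. Tests such as the t-test are built from the mean and standard deviation, so anything that distorts those two also distorts the test.

```python
import statistics


def summarize(data):
    """Return the mean, median, and sample standard deviation of data."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
    }


# Hypothetical measurements, invented for this example.
clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
with_outlier = clean + [15.0]  # the same data plus one extreme value

before = summarize(clean)
after = summarize(with_outlier)

for name in ("mean", "median", "stdev"):
    print(f"{name:>6}: {before[name]:.3f} -> {after[name]:.3f}")
```

In this example the median stays where it was, while the mean shifts upward and the standard deviation grows several times over.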


See
http://en.wikipedia.org/wiki/Parametric_statistics
http://www.creative-wisdom.com/teaching/WBI/parametric_test.shtml
http://www.psychwiki.com/wiki/Dealing_with_Outliers


 
andieje (Author) Commented:
You have said outliers violate the requirement of a normal distribution. Does that mean that a normal distribution does not contain outliers?
 
aburr Commented:
Outliers are a touchy subject.
Most statistical analysis depends on the data having all errors normally distributed and independent.
What should you do with outliers? The only correct way is to collect enough data so that outliers do not influence the result. (I know, this is always difficult and sometimes impossible.)
If you eliminate outliers from your data set, you can be in deep trouble. Outliers are difficult to define. If you give me a data set and allow me to define what an outlier is, I can give you any result you want.
Point 3 in the last link above is particularly troublesome. Repeated outlier removal can cause you to end up with only one data point in which case you can show that your result is guaranteed to be 100% right.
Even in a hard science like physics it is easy to show that the elimination of outliers leads to missed discoveries.
So much for soapbox. For answers see next post.
 
aburr Commented:
Does that mean that a normal distribution does not contain outliers?
A normal distribution does not contain outliers. A small sample from a normal distribution might contain points which are called outliers by some common definition.
WaterStreet says it reasonably well:
"Why is it necessary for there to be no outliers in the data for parametric tests."

Because the definition of a parametric test (as opposed to a non-parametric one) assumes the sample represents a normal population distribution, so that simpler statistical methods can be used.
The theory on which parametric tests rest requires that your sample be taken from a population which is normal. Outlier theory assumes that you can identify outliers and that when you remove them you will have a sample which you can say comes from a population with a normal distribution. (A big assumption, often ignored.)
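
The point about small samples from a normal distribution can be checked with a rough simulation. This is only a sketch under stated assumptions: "outlier" here means a point more than 3 sample standard deviations from the sample mean (one common rule of thumb, not the only definition), the draws are independent standard normal values, and the function names, trial count, and seed are arbitrary choices made for this example.

```python
import math
import random


def has_flagged_point(sample, k=3.0):
    """True if any point lies more than k sample SDs from the sample mean."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return any(abs(x - mean) > k * sd for x in sample)


def fraction_flagged(n, trials=2000, k=3.0, seed=0):
    """Fraction of simulated normal samples of size n with a flagged point."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if has_flagged_point(sample, k):
            hits += 1
    return hits / trials


for n in (20, 50, 200):
    print(n, fraction_flagged(n))
```

The exact fractions depend on the seed and the number of trials, but the flagged fraction should rise with sample size. Every one of these samples is genuinely normal, yet a noticeable share of them contain a point that the 3-standard-deviation rule would call an outlier.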
 
andieje (Author) Commented:
I don't think I am wording my questions very well. I understand that the assumptions of parametric tests are that the data are normally distributed and that the data do not contain outliers. My question is why the tests make these assumptions. I think I understand why the tests assume the data are normally distributed (because that allows you to make all sorts of other assumptions), but I don't understand why there can't be any outliers in the data. Perhaps my understanding of outliers is wrong. I thought outliers were values more than 3 standard deviations from the mean. This is probably what is confusing me: a normal distribution has most of its values clustered around the mean, but it does have some extreme values; in other words, it does have some outliers. So surely your data can be normally distributed and contain outliers?
 
WaterStreet Commented:

"...a normal distribution has most of the values clustered around the mean but it does have some extreme values, in other words it does have some outliers."

Okay, it can have some outliers without being an anomaly, but within limits, as explained below.

"In the case of normally distributed data, roughly 1 in 22 observations will differ by twice the standard deviation or more from the mean, and 1 in 370 will deviate by three times the standard deviation; ... In a sample of 1000 observations, the presence of up to five observations deviating from the mean by more than three times the standard deviation is within the range of what can be expected...and not indicative of an anomaly. If the sample size is only 100, however, just three such outliers are already reason for concern, being more than 11 times the expected number."  http://en.wikipedia.org/wiki/Outlier
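
The figures in that quote follow from the normal tail probability, and a short sketch can reproduce them. It assumes an exactly normal population with known mean and standard deviation, and the function names are invented for this example.

```python
import math


def two_sided_tail(k):
    """P(|Z| >= k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2.0))


def expected_count(n, k):
    """Expected number of points at least k SDs from the mean in n draws."""
    return n * two_sided_tail(k)


for k in (2, 3):
    p = two_sided_tail(k)
    print(f"{k} SD: p = {p:.5f}, about 1 in {1 / p:.0f}")

for n in (1000, 100):
    print(f"n = {n}: expect about {expected_count(n, 3):.2f} points beyond 3 SD")

print(f"3 observed vs expected at n = 100: {3 / expected_count(100, 3):.1f}x")
```

This gives roughly 1 in 22 and 1 in 370, about 2.7 expected points beyond 3 standard deviations in a sample of 1000, and about 0.27 in a sample of 100, so three such points in 100 observations is a little over 11 times the expected number.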

Hope this helps
 
andieje (Author) Commented:
thanks, that last answer cleared it up for me