Solved

Biased/unbiased Standard Deviation

Posted on 2002-06-10
Last Modified: 2013-11-13
In the standard deviation formula we sometimes divide by N and sometimes by N-1,
where N = number of data points.

Somewhere I read that 'N' or 'N-1' does not make a difference for large datasets,
but when we calculate std. dev. for fewer than 20 data points, dividing by 'N' gives
a biased estimate and 'N-1' gives an unbiased estimate.

Can someone explain with an example how subtracting 1 helps?
Question by:prashant_n_mhatre
5 Comments
 
LVL 1

Expert Comment

by:acerola
ID: 7080054
Standard deviation is the square root of the mean of the squared deviations:

average = A

sample = x

deviation = x-A

square of deviation = (x-A)^2

mean of the square of the deviation = Sum((x-A)^2) / N

N = number of samples.

standard deviation = Sqrt(Sum((x-A)^2) / N)
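
A minimal Python sketch of the formula above, with the divisor made adjustable so the N and N-1 versions can be compared side by side. The function name `std_dev`, the `ddof` parameter, and the data values are made up for illustration; they are not from the thread.

```python
import math

def std_dev(samples, ddof=0):
    """Standard deviation with divisor N - ddof.

    ddof=0 divides by N, ddof=1 divides by N-1.
    (Hypothetical helper, named here for illustration only.)
    """
    n = len(samples)
    mean = sum(samples) / n                       # A in the notation above
    sum_sq = sum((x - mean) ** 2 for x in samples)  # Sum((x-A)^2)
    return math.sqrt(sum_sq / (n - ddof))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up values, mean is 5
sd_n = std_dev(data, ddof=0)      # divide by N   -> 2.0
sd_n1 = std_dev(data, ddof=1)     # divide by N-1 -> about 2.138
print(sd_n, sd_n1)
```

The N-1 result is always the larger of the two, since the same sum of squares is divided by a smaller number. Python's standard library offers the same pair as `statistics.pstdev` (divide by N) and `statistics.stdev` (divide by N-1).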

That's all I know. And this is what I found on a web page:

"You use the N-1 if the estimate is unbiased". And the definition of bias is:

"A statistic is biased if, in the long run, it consistently over or underestimates the parameter it is estimating. More technically it is biased if its expected value is not equal to the parameter. A stop watch that is a little bit fast gives biased estimates of elapsed time. Bias in this sense is different from the notion of a biased sample. A statistic is positively biased if it tends to overestimate the parameter; a statistic is negatively biased if it tends to underestimate the parameter. An unbiased statistic is not necessarily an accurate statistic. If a statistic is sometimes much too high and sometimes much too low, it can still be unbiased. It would be very imprecise, however. A slightly biased statistic that systematically results in very small overestimates of a parameter could be quite efficient."
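
The "in the long run" part of that definition can be checked by simulation. The sketch below assumes samples of 5 values drawn from a normal population with true variance 1; the sample size, trial count, seed, and function names are all arbitrary choices for illustration. It averages the variance estimate over many samples, once dividing by N and once by N-1.

```python
import random

def variance(samples, ddof):
    """Variance estimate with divisor N - ddof (hypothetical helper)."""
    n = len(samples)
    mean = sum(samples) / n
    return sum((x - mean) ** 2 for x in samples) / (n - ddof)

def average_variance_estimate(n, trials, ddof, seed=0):
    """Average the variance estimate over many samples of size n.

    Samples come from a normal population with mean 0 and true variance 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += variance(sample, ddof)
    return total / trials

mean_biased = average_variance_estimate(n=5, trials=20000, ddof=0)
mean_unbiased = average_variance_estimate(n=5, trials=20000, ddof=1)
print("divide by N:  ", mean_biased)    # settles near (N-1)/N = 0.8
print("divide by N-1:", mean_unbiased)  # settles near the true value 1.0
```

Dividing by N consistently comes out low, which is exactly a negatively biased statistic in the sense quoted above. Note that the N-1 correction makes the variance estimate unbiased; its square root, the standard deviation, still slightly underestimates the population value on average, though by much less.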

Hope it helps. I didn't get it very well...
 
LVL 1

Expert Comment

by:gd2000
ID: 7089880
Okay - too long since I've done this stuff - but I can tell you for definite that you can derive the formula for standard deviation from a method called Maximum Likelihood Estimation (MLE). This is essentially a (quite complex) method which will give you an estimator for a statistic of your data. Because it is complex, it can be difficult to solve for some statistics, but (relatively) easy for the mean and variance. As part of the derivation it can be found that while dividing by N gives an unbiased estimator when you have the whole population, it gives a biased estimator when you only have a sample. Dividing by N - 1 solves the problem for a sample. If you really want, I can try to dig out some links for MLE, but quite honestly the logic ain't easy! Essentially in the calculation of an MLE there is also a bias element. You can trade off bias for accuracy (if memory serves).

I'm sorry the explanation isn't a simple one - but it's the best I can do without trying to relearn my college notes on the topic (and that's not worth 1000 points!!!).
 
LVL 4

Author Comment

by:prashant_n_mhatre
ID: 7091078
It is still not fully clear to me... let us keep this question open for a few days!
 
LVL 1

Accepted Solution

by:
gd2000 earned 75 total points
ID: 7092624
Try the following links. Probably unlikely to explain things in clear terms (mainly because I'm not sure a wholly accurate explanation can be put in lay terms), but at least it will give you something to chew on (if you want to delve in further!).

http://www.asp.ucar.edu/colloquium/1992/notes/part1/node21.html (actually explains the reasons)

MLE for the mean: http://www.esg.montana.edu/eguchi/Biol504Fall2000/MaxLikelihoodsummary.pdf

Introduction to MLE: http://socserv.socsci.mcmaster.ca/jfox/Courses/soc740/MLE.pdf

The final link says that an MLE is “asymptotically unbiased – but may be biased in finite samples”. This is essentially the reason: you need to multiply by n / (n - 1) to make the variance estimator unbiased for samples, as you can see from the bias term.
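
For reference, the n / (n - 1) factor can be worked out in a few lines. This is a sketch of the standard argument, not taken from the linked notes; it assumes independent observations x_1, ..., x_n with population mean \mu and variance \sigma^2 (symbols introduced here), and uses the fact that the sample mean \bar{x} has variance \sigma^2 / n.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n - 1)\sigma^2 \\
E\left[\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \frac{n - 1}{n}\,\sigma^2
  \qquad \text{(bias } = -\sigma^2 / n\text{)} \\
E\left[\frac{1}{n - 1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= \sigma^2
\end{align*}
```

The bias term -\sigma^2 / n shrinks to zero as n grows, which is the "asymptotically unbiased" property, and multiplying the divide-by-n estimator by n / (n - 1) removes it for any finite n.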
 
LVL 4

Author Comment

by:prashant_n_mhatre
ID: 7124388
Thank you all...

The reason is explained very well in the book "Statistics In Plain English". Yep, it has to do with 'sample' and 'population'. Generally the standard deviation calculated from a sample, using the sample's own mean, comes out lower than that of the population. To compensate for that, we divide by N-1 instead of N. Dividing by 1000 or 999 doesn't make much difference, but dividing by 10 or 9 does...
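
The size of that difference is easy to quantify: for the same data, the N-1 standard deviation is larger than the N standard deviation by a factor of sqrt(N / (N-1)). A small sketch (the function name `sd_ratio` is made up for illustration):

```python
import math

def sd_ratio(n):
    """Ratio of the N-1 standard deviation to the N standard deviation."""
    return math.sqrt(n / (n - 1))

for n in (10, 20, 1000):
    percent = (sd_ratio(n) - 1) * 100
    print(f"N = {n:4d}: N-1 version is {percent:.2f}% larger")
```

For N = 10 the two versions differ by a bit over 5%, while for N = 1000 the gap is about 0.05%, which is why the choice only matters for small datasets.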

I had a look at Maximum Likelihood... I did study it long ago. The links are very useful.