Hi richdiesal,
Thanks for your reply. I'm trying to validate the output of the NLP processing. I need to know how many reports I should pick and manually review to check whether the output of the NLP tool is correct. I don't want to go through all of them, but the number has to be large enough that I can say the NLP works correctly.
Maybe my question was too long. I was trying to explain the problem.

by: richdiesal, posted on 2009-02-11 at 17:28:24, ID: 23618447
There are a lot of questions here...
You seem to be talking about power analysis, but power analysis is unnecessary if you already have data. If you don't already have statistical significance, you need more people. If you do, you don't. If you want to know the specific number of additional cases you should collect to reach a specific power level (for example, an 80% chance to detect your effect), then that is a more appropriate question. Is this what you want?
Confidence level is one minus the probability of a Type I error that you are willing to accept. So, if you are willing to accept a 5% chance of concluding that there is a difference between groups when there really is none, you would use 95% confidence. In medicine, 99% or 99.9% are more typical. Social sciences usually use 95%.
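To make the link between confidence level and Type I error rate concrete, here is a minimal Python sketch. It assumes a two-sided test based on the normal distribution, which may not match your actual test, and the function name is just something I made up for illustration.

```python
from statistics import NormalDist


def two_sided_critical_z(confidence: float) -> float:
    """Return the two-sided normal critical value for a confidence level.

    Assumes a z-based (normal) test; a t-based test would give
    slightly larger values for small samples.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    alpha = 1.0 - confidence  # accepted Type I error rate
    # Split alpha across both tails, then invert the standard normal CDF.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    for level in (0.95, 0.99, 0.999):
        print(f"{level:.1%} confidence -> z = {two_sided_critical_z(level):.3f}")
```

The higher the confidence level you ask for, the larger the critical value, and so the wider the resulting interval for the same data.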
Confidence intervals are computed differently depending on the test, but generally are of the form:
computed statistic +/- (critical value * standard error)
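As a rough illustration of that general form, here is a Python sketch for the case of a proportion, such as the share of manually reviewed reports that the NLP tool got right. It assumes the simple normal-approximation (Wald) interval, where the multiplier on the standard error is the critical z value. The counts in the example (180 correct out of 200 reviewed) are made up, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a proportion.

    This is only one of several possible intervals; it behaves poorly
    when n is small or the proportion is very close to 0 or 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                      # computed statistic
    std_err = sqrt(p_hat * (1.0 - p_hat) / n)  # standard error of p_hat
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)  # critical value
    margin = z * std_err
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


if __name__ == "__main__":
    # Made-up example: 180 of 200 reviewed reports were judged correct.
    low, high = proportion_ci(180, 200)
    print(f"95% CI for accuracy: {low:.3f} to {high:.3f}")
```

Reviewing more reports shrinks the standard error, so the interval narrows as the sample grows.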
Population size refers to the count of whatever our base level of comparison is, which is usually a theoretical value. If you can collect all data from the entire population, there is no need for inferential statistics (i.e. statistical significance testing).