Monte Carlo simulation of ANOVA

Dear experts,

I am writing a computer program (C++) to run Monte Carlo simulations of the analysis of variance (ANOVA). Let us assume I have an experiment with one factor at 3 levels. I want to simulate the case in which the ANOVA "gets significant" although the effect does not exist in the population.

Thus, I set all 3 mean values and standard deviations to be equal. Then I generate 1000 data sets containing random numbers taken from a population with the above mean and standard deviation.

Now I compute an ANOVA for each data set and count the cases in which the ANOVA "reaches significance" by chance, that is, p <= 0.05 (with alpha = 0.05).
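
A minimal sketch of the procedure described above, not the original program (which is not shown). It assumes normally distributed data, equal group sizes, and exactly 3 groups, so the numerator has 2 degrees of freedom and the upper-tail F probability has the closed form (1 + 2F/df2)^(-df2/2). The function names, the seed, and the population values used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// One-way ANOVA F statistic for k groups of equal size n.
// groups[g][i] is observation i of group g.
double oneWayF(const std::vector<std::vector<double>>& groups) {
    const std::size_t k = groups.size();
    const std::size_t n = groups[0].size();
    assert(k >= 2 && n >= 2);

    std::vector<double> means(k, 0.0);
    double grand = 0.0;
    for (std::size_t g = 0; g < k; ++g) {
        assert(groups[g].size() == n);
        for (double x : groups[g]) means[g] += x;
        grand += means[g];                       // still a sum at this point
        means[g] /= static_cast<double>(n);
    }
    grand /= static_cast<double>(k * n);

    double ssBetween = 0.0, ssWithin = 0.0;
    for (std::size_t g = 0; g < k; ++g) {
        const double d = means[g] - grand;
        ssBetween += static_cast<double>(n) * d * d;
        for (double x : groups[g]) ssWithin += (x - means[g]) * (x - means[g]);
    }
    const double dfBetween = static_cast<double>(k - 1);
    const double dfWithin = static_cast<double>(k * (n - 1));
    return (ssBetween / dfBetween) / (ssWithin / dfWithin);
}

// Upper-tail p-value of F when the numerator df is 2 (three groups only).
double pValueDf1Two(double f, double dfWithin) {
    return std::pow(1.0 + 2.0 * f / dfWithin, -dfWithin / 2.0);
}

// Fraction of null data sets (all three groups drawn from the same
// normal population) whose ANOVA p-value is at or below alpha.
double simulateTypeIRate(std::size_t nPerGroup, std::size_t reps,
                         double mu, double sigma, double alpha, unsigned seed) {
    const std::size_t k = 3;
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(mu, sigma);
    std::vector<std::vector<double>> groups(k, std::vector<double>(nPerGroup));

    std::size_t rejections = 0;
    for (std::size_t r = 0; r < reps; ++r) {
        for (auto& grp : groups)
            for (double& x : grp) x = dist(rng);
        const double f = oneWayF(groups);
        const double p = pValueDf1Two(f, static_cast<double>(k * (nPerGroup - 1)));
        if (p <= alpha) ++rejections;
    }
    return static_cast<double>(rejections) / static_cast<double>(reps);
}
```

Calling, for example, `simulateTypeIRate(30, 20000, 100.0, 15.0, 0.05, 12345u)` should return a value close to 0.05 regardless of the group size, because under the null hypothesis the p-value is uniformly distributed. For a design with more than 3 levels the closed-form p-value no longer applies and an F distribution routine from a statistics library would be needed.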

I presumed that 5% of the data sets should reach significance by chance. However, I found that this is not the case. Especially with N > 20, far fewer data sets reached significance (often not one in 10,000).

Did I make an error in reasoning? I think alpha is defined as the probability that a statistical test reaches significance although there is no effect in the population, that is, all values are taken from the same population.

Any help is very welcome.

Sincerely,

Albert
Albert-Georg asked:

TommySzalapski commented:
If you do a test and see a p-value of .05 that means that there is a 5% probability your data is just random and there is no correlation anywhere.

It does not mean that any random data will have a 5% chance of looking significant.
TommySzalapski commented:
Run the ANOVA on one of your samples with N > 20, notice the p-value should be well below .05.
The higher N gets, the harder it is for random noise to look significant so the p-value of the bigger samples will almost always be lower.
Albert-Georg (author) commented:
Dear Tommy,

thank you for your answer.

"If you do a test and see a p-value of .05 that means that there is a 5% probability your data is just random and there is no correlation anywhere."

What you describe is Bayesian statistics (p(H0|E)).

In "normal" (frequentist) statistics the opposite holds: a p-value of .05 means that, given that your data are random, the probability of obtaining the observed effect or a larger one is 5% (p(E|H0)).

This is what confuses me ...

Sincerely,

Albert
TommySzalapski commented:
Right, but the p-value is based on the data. If N is greater, the p-value will be less.

"given your data are random" does not mean given any random data. It means given the dataset you provided. You won't have the same p-value for all your data sets. Each data set will produce its own p-value.

The value that you set for alpha is just the cutoff point to use on the p-values. So if you set alpha=.05, anything with a p-value at or below that is treated as significant.
Albert-Georg (author) commented:
I agree, not any random data. The prerequisite is that the random data are taken from a population in which the mean values do not differ and whose variance (standard deviation) equals that of the given data set.

If these prerequisites are met, then 5% of the random data sets should be significant.
TommySzalapski commented:
"There's a 5% chance that this result was obtained by random chance" is not quite the same as "There's a 5% chance that random data will obtain this result"

What result are you testing for? What is getting significant?

Check the p-value for some of your larger data sets. I'm guessing most will be much lower than 5%.
Albert-Georg (author) commented:
No, this is the Bayes problem again.
"There's a 5% chance that this result was obtained by random chance"
is Bayesian statistics.

"There's a 5% chance that random data will obtain this result"
is "normal" statistics.

I am testing the random data sets (with no mean difference and identical variances between groups) for significance.
Albert-Georg (author) commented:
Dear experts,

I have found the reason for the problem. There was an error in my code when standardizing the random matrices to the correct means and standard deviations.

Now, with the error corrected, the simulated alpha error probability is 0.05, as it is expected to be.
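
The original code is not shown, so the exact bug is unknown. One plausible failure mode, offered purely as an assumption, is rescaling each simulated group so that its sample mean and standard deviation match the targets exactly. That removes all variation between the group means, the F statistic collapses to practically zero, and no data set can ever reach significance. The sketch below illustrates this; `forceMoments`, `simulateF`, and the population values 100 and 15 are hypothetical names and numbers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Rescale a sample so that its *sample* mean and SD equal the targets exactly.
void forceMoments(std::vector<double>& x, double targetMean, double targetSd) {
    const std::size_t n = x.size();
    assert(n >= 2);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);
    double ss = 0.0;
    for (double v : x) ss += (v - mean) * (v - mean);
    const double sd = std::sqrt(ss / static_cast<double>(n - 1));
    for (double& v : x) v = targetMean + targetSd * (v - mean) / sd;
}

// One-way ANOVA F statistic for k groups of equal size n.
double oneWayF(const std::vector<std::vector<double>>& groups) {
    const std::size_t k = groups.size();
    const std::size_t n = groups[0].size();
    assert(k >= 2 && n >= 2);
    std::vector<double> means(k, 0.0);
    double grand = 0.0;
    for (std::size_t g = 0; g < k; ++g) {
        for (double x : groups[g]) means[g] += x;
        grand += means[g];                       // still a sum at this point
        means[g] /= static_cast<double>(n);
    }
    grand /= static_cast<double>(k * n);
    double ssBetween = 0.0, ssWithin = 0.0;
    for (std::size_t g = 0; g < k; ++g) {
        const double d = means[g] - grand;
        ssBetween += static_cast<double>(n) * d * d;
        for (double x : groups[g]) ssWithin += (x - means[g]) * (x - means[g]);
    }
    return (ssBetween / static_cast<double>(k - 1)) /
           (ssWithin / static_cast<double>(k * (n - 1)));
}

// F statistic of one simulated null data set with 3 groups.
// If standardizePerGroup is true, every group is forced to the target moments.
double simulateF(std::size_t nPerGroup, bool standardizePerGroup, unsigned seed) {
    const double mu = 100.0, sigma = 15.0;       // arbitrary population values
    std::mt19937 rng(seed);
    std::normal_distribution<double> dist(mu, sigma);
    std::vector<std::vector<double>> groups(3, std::vector<double>(nPerGroup));
    for (auto& grp : groups) {
        for (double& x : grp) x = dist(rng);
        if (standardizePerGroup) forceMoments(grp, mu, sigma);
    }
    return oneWayF(groups);
}
```

With per-group standardization the group means differ only by floating-point rounding, so `simulateF(30, true, seed)` is effectively 0 for every seed, while `simulateF(30, false, seed)` follows the F distribution and exceeds the critical value in about 5% of data sets. Leaving the random draws unscaled keeps the sampling variability that the test is meant to measure.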

Thank you and best regards,

Albert

Albert-Georg (author) commented:
The problem is completely solved now.