
Dear experts,

I am writing a computer program (C++) to do Monte Carlo simulations of the analysis of variance (ANOVA). Let us assume I have an experiment with one factor at 3 levels. I want to simulate the case in which the ANOVA comes out significant although the effect does not exist in the population.

Thus, I set all 3 means and standard deviations to be equal. Then I generate 1000 data sets containing random numbers drawn from a population with the above mean and standard deviation.

Now I compute an ANOVA on each data set and count the cases in which the ANOVA reaches significance by chance, that is, p <= alpha = 0.05.

I presumed that 5% of the data sets should reach significance by chance. However, I found out that this is not the case. Especially with N > 20 there were far fewer data sets that reached significance (often not one in 10,000).

Did I make an error in reasoning? I think alpha is defined as the probability that a statistical test reaches significance although there is no effect in the population, that is, when all values are drawn from the same population.

Any help is very welcome.

Sincerely,

Albert

Experts Exchange Solution brought to you by

The higher N gets, the harder it is for random noise to look significant, so the p-values of your bigger samples will almost always be higher.

thank you for your answer.

If you do a test and see a p-value of .05 that means that there is a 5% probability your data is just random and there is no correlation anywhere.

What you describe is Bayesian statistics (p(H0|E)).

In "normal" (frequentist) statistics the contrary is the case: if you see a p-value of .05, this means there is a 5% chance that random data would produce a result at least this extreme.

This is what confuses me ...

Sincerely,

Albert

"given your data are random" does not mean given any random data. It means given the dataset you provided. You won't have the same p-value for all your data sets. Each data set will produce its own p-value.

The value that you set for alpha is just the cutoff point applied to the p-values. So if you set alpha = .05, anything with a p-value below that is treated as significant.

If these prerequisites are met, then 5% of the random data sets should be significant.

What result are you testing for? What is getting significant?

Check the p-values for some of your larger data sets. I'm guessing most will be much higher than .05.

"There's a 5% chance that this result was obtained by random chance" is Bayesian statistics.

"There's a 5% chance that random data will obtain this result" is "normal" statistics.

I am testing the random data sets (with no mean difference and identical variances between groups) for significance.

I have found the reason for the problem. There was an error in my code that standardizes the random matrices to the correct means and standard deviations.

Now, with the error corrected, the simulated alpha error probability is 0.05, as expected.

Thank you and best regards,

Albert

It does not mean that any random data will have a 5% chance of looking significant.