
How do I determine the probability that one running mean is higher/lower than another running mean?

Let's say I have two bags, each with an infinite number of marbles.

In the "red" bag, exactly 2/3 are red, the rest are blue.
In the "blue" bag, exactly 1/3 are red, the rest are blue.

Let's say I draw 1 marble at a time from each bag and keep a running average of the fraction of red marbles drawn from each.

Suppose after the first draw, I have:

Red bag: 0% red, 1 draw (got a blue)
Blue bag: 100% red, 1 draw (got a red)

After 1 marble, blue (100%) > red (0%), which is obviously wrong since we know the real distribution. However, if we didn't know the actual distribution, we couldn't say for sure we were wrong --- the probability that blue > red is not 0%.
What I am trying to do is to figure out when I can stop drawing marbles. That is, I want to keep drawing marbles until I am 99.9% certain that mean_red > mean_blue. (Or, if I get really unlucky, 99.9% certain that mean_red < mean_blue!)
Thanks.

Enabbar Ocap

From the red bag you should expect two out of three draws to be red. One third of the bag is blue, but the bag contains an infinite number of marbles. One third of infinity is still infinite so it is possible that you will never draw a red marble from the red bag.
To reach a given level of confidence, such as 99.9%, I think you have to specify a finite population.

d-glitch

Define a test as taking one ball from the Red bag and one ball from the Blue bag.

There are four possible results from the test.
Start by assuming we know the distributions in each bag.
P( R, B) = 4/9 ==> Correct
P( R, R) = 2/9 ==> No Information
P( B, B) = 2/9 ==> No Information
P( B, R) = 1/9 ==> Incorrect
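
As a quick check of the four probabilities, here is a minimal sketch. It assumes each pair is written (marble from the Red bag, marble from the Blue bag), so the "incorrect" outcome is a blue from the Red bag together with a red from the Blue bag. The function name outcome_probabilities is made up for illustration.

```python
from fractions import Fraction

def outcome_probabilities(p_first, p_second):
    """Probabilities of the four (first bag, second bag) colour pairs for one test."""
    q_first, q_second = 1 - p_first, 1 - p_second
    return {
        ("R", "B"): p_first * q_second,  # correct: only the first bag shows red
        ("R", "R"): p_first * p_second,  # no information (tie)
        ("B", "B"): q_first * q_second,  # no information (tie)
        ("B", "R"): q_first * p_second,  # incorrect: only the second bag shows red
    }

# First bag is the Red bag (2/3 red), second is the Blue bag (1/3 red).
probs = outcome_probabilities(Fraction(2, 3), Fraction(1, 3))
for pair, p in probs.items():
    print(pair, p)
```

Exact fractions make it easy to confirm that the four cases sum to 1.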

Note that the difference in P(R) for these two bags is dP = 1/3.

If all you need to know is which bag has more Red balls, then I think you would have the correct answer after N = (1/dP)^2 = 9 tests.
========================================

Look at another case, where the first bag has P(R) = 0.9 and the second bag has P(R) = 0.8.
There are four possible results and probabilities are:
P( R, B) = 0.18 ==> Correct
P( R, R) = 0.72 ==> No Information
P( B, B) = 0.02 ==> No Information
P( B, R) = 0.08 ==> Incorrect

I think 100 trials would give you the correct answer with significant confidence.

If you don't know the difference in P(R) for the two bags, you can't say how many tests you have to run to determine which bag has the larger value.

But you can say something like:
If I run N tests, I will be able to detect a difference in P(R) of 1/sqrt(N) with TBD confidence level.

This problem would be an excellent candidate for Monte Carlo methods.
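
A rough sketch of what such a Monte Carlo run could look like, assuming the true probabilities are known (2/3 and 1/3 by default). This is not the spreadsheet itself; the function name, the trial count, and the fixed seed are all just choices made for illustration. A tie counts as "not correct" here.

```python
import random

def estimate_correct_rate(n_draws, p_red=2/3, p_blue=1/3, trials=10_000, seed=0):
    """Estimate how often the Red bag shows strictly more reds than the Blue bag
    after n_draws marbles have been taken from each."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    wins = 0
    for _ in range(trials):
        reds_from_red = sum(rng.random() < p_red for _ in range(n_draws))
        reds_from_blue = sum(rng.random() < p_blue for _ in range(n_draws))
        if reds_from_red > reds_from_blue:
            wins += 1
    return wins / trials

for n in (1, 5, 10, 25):
    print(n, estimate_correct_rate(n))
```

With more trials the estimates settle down, at the cost of run time.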


d-glitch

Is this question resolved?

You cannot tell in advance how many draws it will take to reach 99.9% confidence that one bag has more or fewer red marbles than the other.

But if you pay attention while you are drawing, you will decide when you have enough information.

I have rearranged my Excel sheet to calculate the confidence level after 1, 2, 3, ... 50 draws.

With your original probabilities of 2/3 and 1/3, you need 9 or 10 draws to determine which bag has more red balls with 90% confidence.

You would need 23 to 25 draws to determine the answer with 99% confidence.
But in the 1% of cases where you don't have the correct answer after 25 draws, you don't necessarily have the wrong answer.

Find a column headed by 0.990 and look down at the elements. Many of the 0's (which indicate a failure to get the answer right) will be the result of ties (6.06 or 3.03).
Monte-Carlo-for-ExEx-V2.xlsx
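
For anyone without the spreadsheet, the same quantities can be computed exactly from the binomial distribution instead of by simulation. This is only a sketch under the assumption that the true probabilities are known; the names binom_pmf and win_tie_lose are made up here. It splits the result into win, tie, and lose, which makes the share of ties visible.

```python
from math import comb

def binom_pmf(n, p):
    """Return [P(X = k) for k in 0..n] with X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def win_tie_lose(n, p_red=2/3, p_blue=1/3):
    """Exact P(red count >, ==, < blue count) after n draws from each bag."""
    red, blue = binom_pmf(n, p_red), binom_pmf(n, p_blue)
    # Win: the Red bag produced i reds and the Blue bag produced fewer than i.
    win = sum(red[i] * blue[j] for i in range(n + 1) for j in range(i))
    tie = sum(red[i] * blue[i] for i in range(n + 1))
    return win, tie, 1 - win - tie

for n in (9, 10, 23, 25):
    win, tie, lose = win_tie_lose(n)
    print(f"n={n:2d}  win={win:.4f}  tie={tie:.4f}  lose={lose:.4f}")
```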

cwm9

ASKER

I found someone with a background in statistics to answer the question for me.

The correct solution is to use the 'Two-Sample t-test for Equal Means'. (In my actual use case, I really want the Unknown Variances version, 'Welch’s t-test'.)

See http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm
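
For reference, a minimal sketch of the Welch statistic using only the Python standard library. The sample data below is invented (1 = red, 0 = blue), and welch_t is a name made up for this post. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs the Student t distribution from a statistics package, which is not done here. Treating 0/1 draws as t-test data is itself an approximation.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Each sample needs at least two values and must not be constant,
    otherwise the variances are undefined or zero.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df

# Hypothetical draws, not real data: 1 = red marble, 0 = blue marble.
red_bag = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
blue_bag = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
t_stat, dof = welch_t(red_bag, blue_bag)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

Note that the statistic is undefined after a single draw per bag, so this cannot be applied to the very first comparison in the question.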

cwm9

ASKER

I've requested that this question be closed as follows:

Accepted answer: 0 points for cwm9's comment #a40950972

for the following reason:

Found an IRL expert to answer the question. Posted her answer here.

d-glitch

The t-test can be used for testing simple hypotheses on existing data.

But your original question concerned how much data you had to collect for a particular, very high confidence level:
I want to keep drawing marbles until I am 99.9% certain that mean_red > mean_blue.

I don't see how the t-test answers that question.

The Monte Carlo technique I described gives a specific answer in the case where you think you know the probabilities for each bag:
9 or 10 draws for 90% confidence
23 to 25 draws for 99% confidence