Solved

Posted on 2011-10-06

Dear experts,

first of all I should point out, that this is not homework. I rephrased a problem I have at work in classical analogies of probability theory.

Suppose you have an urn with balls of different colors. There are n colors and k balls of each color. You draw m (m << n) balls. Lets say with replacement, but it is not really important.

Question: What is the probability that the sample contains balls of exactly c different colors (not matter which ones).

Given:

n = number of colors

k = number of balls for each color

m = number of balls drawn

c = number of different colors in the sample

Asked for:

Probability of c given n,k and m = P(c | n,k,m)

I am really looking for a closed form,like a formula/method which takes n,k,m and c.

I am not interested in specific examples like

http://answers.yahoo.com/question/index?qid=20100422230505AALgPFn

first of all I should point out, that this is not homework. I rephrased a problem I have at work in classical analogies of probability theory.

Suppose you have an urn with balls of different colors. There are n colors and k balls of each color. You draw m (m << n) balls. Lets say with replacement, but it is not really important.

Question: What is the probability that the sample contains balls of exactly c different colors (not matter which ones).

Given:

n = number of colors

k = number of balls for each color

m = number of balls drawn

c = number of different colors in the sample

Asked for:

Probability of c given n,k and m = P(c | n,k,m)

I am really looking for a closed form,like a formula/method which takes n,k,m and c.

I am not interested in specific examples like

http://answers.yahoo.com/q

13 Comments

So you have a probability of 1/d that any selected ball is of a certain color.

I'm going to post some initial observations and then work on it some more later.

If m = c = d (you want 1 of each color), then

p = 1 * (d-1)/d * (d-2)/d * ... * 2/d * 1/d= d!/d^d

If m = c < d (you want 1 per color, but not all colors will be there)

p = 1*(d-1)/d * (d-m+1)/d = d!/((m+1)! * d^m)

So you also look at this as a state diagram (the number by the arrow is the probability of moving to that state from the current one).

with any generic portion in this form (where x is the number of different colors so far)

And you want the probability of m transitions landing you on the C state.

1=>1=>1=>1=>1=>2=>3=>4

1=>1=>1=>1=>2=>2=>3=>4

1=>2=>2=>2=>2=>3=>4=>4

These are basically integer partitions (the number of ways you can write an positive integer n as the sum of k other positive integers). The number of this partitions can be calculated with the partition function.

http://en.wikipedia.org/wi

At this point I have to admit that I already have a solution, which goes in this direction but requires the explicit enumeration of the partitions. It works, however it is insanely complicated. I didn't want to bring it up before, in order not to steer the discussion to much in that path.

1=>1=>1=>1=>1=>2=>3=>4

= (1/n)^5 * (n-1)/n *(n-2)/n * (n-3)/n

1=>1=>1=>1=>2=>2=>3=>4

= (1/n)^4 * (n-1)/n * (2/n)^1 * (n-2)/n * (n-3)/n

1=>2=>2=>2=>2=>3=>4=>4

= (1/n)^1 * (n-1)/n * (2/n)^3 * (n-2)/n * (n-3)/n * (4/n)^1

Same path length , different results.

For example, after you grab one ball, you must be in state 1. After you grab the second ball there is a 1/n chance you are still in state 1 and a (n-1)/n chance you are now in state 2.

On the third grab, there is a (1/n)*(1/n) chance that you are still in state 1 and a (n-1)/n*(2/n)+(1/n)*(n-1)/

Here's the Excel sheet that will do all the calculations. m, n, and c should be set and it will give you P. You can see all the calculation steps. If you want m and c to be able to go over 25, you just need to extend the formulas.

colors.xls

P(c,m|n) = (n-c+1)/n * P(c-1,m-1|n) + c/n * P(c,m-1|n)

P(1,1|n) = 1

for c <= m <= n

So recursion is key :D. I have attached a simplified version of the table. Yours is very nice and sophisticated, but in my version it easier to track the formula ;). You just enter n at the top, and you can lookup the value for m and c at the respective position.

Thanks again!

simpletablecolors.xlsx

Join the community of 500,000 technology professionals and ask your questions.

Connect with top rated Experts

**15** Experts available now in Live!