How many different colors in sample

Dear experts,

first of all I should point out, that this is not homework. I rephrased a problem I have at work in classical analogies of probability theory.

Suppose you have an urn with balls of different colors. There are n colors and k balls of each color. You draw m (m << n) balls. Lets say with replacement, but it is not really important.

Question: What is the probability that the sample contains balls of exactly c different colors (not matter which ones).

Given:
n = number of colors
k = number of balls for each color
m = number of balls drawn
c = number of different colors in the sample

Asked for:
Probability of c given n,k and m = P(c | n,k,m)

I am really looking for a closed form,like a formula/method which takes n,k,m and c.

I am not interested in specific examples like

http://answers.yahoo.com/question/index?qid=20100422230505AALgPFn
LVL 2
derduffAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Dave BaldwinFixer of ProblemsCommented:
If there was a "random" distribution, you might be able to get a formula.  But with a real urn full of balls, you can't reach past the balls on top so it is not random but is dependent on the order that the urn is filled.
0
derduffAuthor Commented:
The urn is merely a theoretical construct. The probability to be drawn is the same for each ball in the urn: 1/(n*k)
0
TommySzalapskiCommented:
If it is with replacement, or n is so much greater than m that it approximates replacement, then n and k don't matter. Only the number of different colors (n/k) which I shall refer to as d.

So you have a probability of 1/d that any selected ball is of a certain color.

I'm going to post some initial observations and then work on it some more later.

If m = c = d (you want 1 of each color), then
p = 1 * (d-1)/d * (d-2)/d * ... * 2/d * 1/d= d!/d^d

If m = c < d (you want 1 per color, but not all colors will be there)
p = 1*(d-1)/d * (d-m+1)/d = d!/((m+1)! * d^m)
0
Cloud Class® Course: Ruby Fundamentals

This course will introduce you to Ruby, as well as teach you about classes, methods, variables, data structures, loops, enumerable methods, and finishing touches.

TommySzalapskiCommented:
Oops. For m=c<d it's really d!/( (d-m)! * d^m ) and if you set d=m, you get the same equation as the one for m=c=d
0
derduffAuthor Commented:
Hi Tommy, thank you for working on this.  Just a note: n already denominates the number of colors. The total number of ball is n*k.


0
TommySzalapskiCommented:
Oops. Sorry. I'm so used to n meaning the total number of balls that I read it that way. Hah!

So you also look at this as a state diagram (the number by the arrow is the probability of moving to that state from the current one).
 states
with any generic portion in this form  (where x is the number of different colors so far)
 genericAnd you want the probability of m transitions landing you on the C state.
0
derduffAuthor Commented:
Hmm I see how you think. I see one problem though. There is a large number of ways to achieve m transitions. For c=4 m=8 you could have

1=>1=>1=>1=>1=>2=>3=>4

1=>1=>1=>1=>2=>2=>3=>4

1=>2=>2=>2=>2=>3=>4=>4

These are basically integer partitions (the number of ways you can write an positive integer n as the sum of k other positive integers). The number of this partitions can be calculated with the partition function.

http://en.wikipedia.org/wiki/Partition_(number_theory)#Partition_function

At this point I have to admit that I already have a solution, which goes in this direction but requires the explicit enumeration of the partitions. It works, however it is insanely complicated. I didn't want to bring it up before, in order not to steer the discussion to much in that path.
0
TommySzalapskiCommented:
Yes, but the beauty of the state diagram is that you don't need to worry about what order the states happen in. You just multiply out the probabilities for each step and after m matrix multiplications you get the probability of being in each state. To get it into a closed form solution would take some clever algebra (which I will have time for tomorrow I hope). If you actually have numbers for m, c, and n, then it's just plug and chug into MATLAB or an Excel sheet.
0
derduffAuthor Commented:
No, the order is important, because the probabilities change, depending how long one remains in one state.


1=>1=>1=>1=>1=>2=>3=>4

= (1/n)^5 * (n-1)/n *(n-2)/n * (n-3)/n

 1=>1=>1=>1=>2=>2=>3=>4

= (1/n)^4 * (n-1)/n * (2/n)^1 * (n-2)/n * (n-3)/n

1=>2=>2=>2=>2=>3=>4=>4

= (1/n)^1 * (n-1)/n * (2/n)^3 *  (n-2)/n * (n-3)/n * (4/n)^1

Same path length , different results.
0
TommySzalapskiCommented:
No the order is not important. That's why I like doing complex probabilities this way. The probability vector has all the possible paths built into it.

For example, after you grab one ball, you must be in state 1. After you grab the second ball there is a 1/n chance you are still in state 1 and a (n-1)/n chance you are now in state 2.

On the third grab, there is a (1/n)*(1/n) chance that you are still in state 1 and a (n-1)/n*(2/n)+(1/n)*(n-1)/n chance that you are in state 2 and a (n-1)/n*(n-2)/n chance that you are in state 3.

Here's the Excel sheet that will do all the calculations. m, n, and c should be set and it will give you P. You can see all the calculation steps. If you want m and c to be able to go over 25, you just need to extend the formulas.
colors.xls
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
derduffAuthor Commented:
You are right, as far I see it! I didn't look at that way. Please let me check it more thoroughly today, before I finally accept it. But it is looking very good! Thanks!
0
derduffAuthor Commented:
I accept the solution and I thank you for removing this giant knot in my brain.

P(c,m|n) =  (n-c+1)/n * P(c-1,m-1|n) + c/n *  P(c,m-1|n)

P(1,1|n) = 1

for c <= m <= n

So recursion is key :D. I have attached a simplified version of the table. Yours is very nice and sophisticated, but in my version it easier to track the formula ;). You just enter n at the top, and you can lookup the value for m and c at the respective position.

Thanks again!

simpletablecolors.xlsx
0
derduffAuthor Commented:
Thank you!!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Math / Science

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.