[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 357
  • Last Modified:

How many different colors in sample

Dear experts,

first of all I should point out, that this is not homework. I rephrased a problem I have at work in classical analogies of probability theory.

Suppose you have an urn with balls of different colors. There are n colors and k balls of each color. You draw m (m << n) balls. Lets say with replacement, but it is not really important.

Question: What is the probability that the sample contains balls of exactly c different colors (not matter which ones).

Given:
n = number of colors
k = number of balls for each color
m = number of balls drawn
c = number of different colors in the sample

Asked for:
Probability of c given n,k and m = P(c | n,k,m)

I am really looking for a closed form,like a formula/method which takes n,k,m and c.

I am not interested in specific examples like

http://answers.yahoo.com/question/index?qid=20100422230505AALgPFn
0
derduff
Asked:
derduff
  • 7
  • 5
1 Solution
 
Dave BaldwinFixer of ProblemsCommented:
If there was a "random" distribution, you might be able to get a formula.  But with a real urn full of balls, you can't reach past the balls on top so it is not random but is dependent on the order that the urn is filled.
0
 
derduffAuthor Commented:
The urn is merely a theoretical construct. The probability to be drawn is the same for each ball in the urn: 1/(n*k)
0
 
TommySzalapskiCommented:
If it is with replacement, or n is so much greater than m that it approximates replacement, then n and k don't matter. Only the number of different colors (n/k) which I shall refer to as d.

So you have a probability of 1/d that any selected ball is of a certain color.

I'm going to post some initial observations and then work on it some more later.

If m = c = d (you want 1 of each color), then
p = 1 * (d-1)/d * (d-2)/d * ... * 2/d * 1/d= d!/d^d

If m = c < d (you want 1 per color, but not all colors will be there)
p = 1*(d-1)/d * (d-m+1)/d = d!/((m+1)! * d^m)
0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
TommySzalapskiCommented:
Oops. For m=c<d it's really d!/( (d-m)! * d^m ) and if you set d=m, you get the same equation as the one for m=c=d
0
 
derduffAuthor Commented:
Hi Tommy, thank you for working on this.  Just a note: n already denominates the number of colors. The total number of ball is n*k.


0
 
TommySzalapskiCommented:
Oops. Sorry. I'm so used to n meaning the total number of balls that I read it that way. Hah!

So you also look at this as a state diagram (the number by the arrow is the probability of moving to that state from the current one).
 states
with any generic portion in this form  (where x is the number of different colors so far)
 genericAnd you want the probability of m transitions landing you on the C state.
0
 
derduffAuthor Commented:
Hmm I see how you think. I see one problem though. There is a large number of ways to achieve m transitions. For c=4 m=8 you could have

1=>1=>1=>1=>1=>2=>3=>4

1=>1=>1=>1=>2=>2=>3=>4

1=>2=>2=>2=>2=>3=>4=>4

These are basically integer partitions (the number of ways you can write an positive integer n as the sum of k other positive integers). The number of this partitions can be calculated with the partition function.

http://en.wikipedia.org/wiki/Partition_(number_theory)#Partition_function

At this point I have to admit that I already have a solution, which goes in this direction but requires the explicit enumeration of the partitions. It works, however it is insanely complicated. I didn't want to bring it up before, in order not to steer the discussion to much in that path.
0
 
TommySzalapskiCommented:
Yes, but the beauty of the state diagram is that you don't need to worry about what order the states happen in. You just multiply out the probabilities for each step and after m matrix multiplications you get the probability of being in each state. To get it into a closed form solution would take some clever algebra (which I will have time for tomorrow I hope). If you actually have numbers for m, c, and n, then it's just plug and chug into MATLAB or an Excel sheet.
0
 
derduffAuthor Commented:
No, the order is important, because the probabilities change, depending how long one remains in one state.


1=>1=>1=>1=>1=>2=>3=>4

= (1/n)^5 * (n-1)/n *(n-2)/n * (n-3)/n

 1=>1=>1=>1=>2=>2=>3=>4

= (1/n)^4 * (n-1)/n * (2/n)^1 * (n-2)/n * (n-3)/n

1=>2=>2=>2=>2=>3=>4=>4

= (1/n)^1 * (n-1)/n * (2/n)^3 *  (n-2)/n * (n-3)/n * (4/n)^1

Same path length , different results.
0
 
TommySzalapskiCommented:
No the order is not important. That's why I like doing complex probabilities this way. The probability vector has all the possible paths built into it.

For example, after you grab one ball, you must be in state 1. After you grab the second ball there is a 1/n chance you are still in state 1 and a (n-1)/n chance you are now in state 2.

On the third grab, there is a (1/n)*(1/n) chance that you are still in state 1 and a (n-1)/n*(2/n)+(1/n)*(n-1)/n chance that you are in state 2 and a (n-1)/n*(n-2)/n chance that you are in state 3.

Here's the Excel sheet that will do all the calculations. m, n, and c should be set and it will give you P. You can see all the calculation steps. If you want m and c to be able to go over 25, you just need to extend the formulas.
colors.xls
0
 
derduffAuthor Commented:
You are right, as far I see it! I didn't look at that way. Please let me check it more thoroughly today, before I finally accept it. But it is looking very good! Thanks!
0
 
derduffAuthor Commented:
I accept the solution and I thank you for removing this giant knot in my brain.

P(c,m|n) =  (n-c+1)/n * P(c-1,m-1|n) + c/n *  P(c,m-1|n)

P(1,1|n) = 1

for c <= m <= n

So recursion is key :D. I have attached a simplified version of the table. Yours is very nice and sophisticated, but in my version it easier to track the formula ;). You just enter n at the top, and you can lookup the value for m and c at the respective position.

Thanks again!

simpletablecolors.xlsx
0
 
derduffAuthor Commented:
Thank you!!
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

  • 7
  • 5
Tackle projects and never again get stuck behind a technical roadblock.
Join Now