Odds of Duplication

I have 40 participants competing in a contest. In that contest, we have a fixed number of Tables, and each Contestant's entry could land on any of those tables. For purposes of this discussion, let's say we have a total of 7 tables.

The maximum number of entries allowed on a table is 6. The minimum is 3. Entries are allocated as evenly as possible across those tables, although some will always have more than others in this example. For example:

Table 1 - 6 entries
Table 2 - 6 entries
Table 3 - 6 entries
Table 4 - 6 entries
Table 5 - 6 entries
Table 6 - 5 entries
Table 7 - 5 entries

For this discussion, the number of tables will never increase or decrease, and the number of entries will never increase or decrease. The "entry counts" on each table are irrelevant - for example, it does not matter if Table 1 has 5 entries, or if Table 2 has 5 entries, so long as the total number of entries for all tables is exactly 40.

Is there an equation I can use to determine the odds of exact duplication of entries landing on each table? For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1, what are the odds of that recurring?

Also, is there an equation that would determine the odds of exact duplication for all tables and entries?
LVL 86
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

d-glitchCommented:
If you scramble the people and then assign the first six to Table 1, and the next six to Table 2, ....

The number of possible arrangements is   40!/[(6!^5)*(5!^2)]  =  2.928 x 10^29
d-glitchCommented:
This assumes that the tables and the people are unique.  

So that for four people and two tables:  
   [A and B at Table 1, C and D at Table 2]  is not the same as  [C and D at Table 1, A and B at Table 2]

If this is not what you want, you could divide by an additional factor of 7! (the number of tables), but since the tables are not perfectly symmetric, this would be an approximation.
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
If it's not asking too much, could explain the equation to me. I understand the ! factorial stuff (i.e. 40x39x38xetc), and (i believe) the ^ exponentiation (a little, anyway), but I'm not sure about the rest.
Become a Certified Penetration Testing Engineer

This CPTE Certified Penetration Testing Engineer course covers everything you need to know about becoming a Certified Penetration Testing Engineer. Career Path: Professional roles include Ethical Hackers, Security Consultants, System Administrators, and Chief Security Officers.

d-glitchCommented:
There are 40! ways to line up 40 people.  40 choices for the first spot, 39 choices for the second spot, etc.

But that is not the answer, because there are many possible duplicates in that number.

Just look at the first six people in line after the scramble  [ A B C D E F ].  These are the people who are going to wind up at Table 1.  There are 6! ways to arrange these people without changing the makeup of the table.  So you have to reduce the 40! number by this factor.  And so on for the other tables.
ozoCommented:
What do you consider to be "exact duplication of entries landing on each table"?
"For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1"
Does that mean that "Teams 1, 2, 3, 4, 5 and 6 all land on Table 1" is an example of "exact duplication of entries landing on each table"?
Are there other examples of "exact duplication of entries landing on each table"?
sarabandeCommented:
P(k, n) = n! / (k!*(n-k)!

that is the number of different possibilities for the second table to get k entries out of n entries.

there are m = 7 tables, so you have c = 6 chances for a duplicate in tables 2 to 7.  but since the total of all entries is 40 and maximum count is 6 there are maximum 2 tables which have 5 entries. so c actually is 4. (it could be 5 if the tables were not evenly filled).

if all assumptions made are true we have for given k = 6 and (1,2,3,4,5,6) in table 1

P(m, k, n) = ((n! / (k!*(n-k)!)) / c) = ((n! / (6!*(n-6)!)) / 4)

where only n is unknown (as far as i have understand from op).

Sara
aburrCommented:
All the equations given are fine, but do they apply to your problem?
I think the question could be answered more precisely if it were clarified a bit.
I am puzzled by the introduction of the word "team" at the end of the questions
What is a team? Does a team enter more than one item? How many teams are participating?
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
What do you consider to be "exact duplication of entries landing on each table"?
A "exact duplication" would be where the same 6 teams landed on the same table in any order - so I may have misstated this at the outset, and for that I apologize.

My "exact duplicate" would probably be better described as a "table duplicate". The goal is to determine the odds of having the same 6 entries land on the same table, regardless of the order of the entries. For example, these would be considered "table duplicates":

Table 1
----------
Team 1
Team 2
Team 3
Team 4
Team 5
Team 6

And a subsequent placement like this:

Table 1
-----------
Team 2
Team 1
Team 3
Team 4
Team 5
Team 6

Even though the order of Team 1 and Team 2 are switched, the end result is they "land" on the same table, and for our consideration, this would "duplicate" the same entries on the table (i.e. a "table duplicate").

I am puzzled by the introduction of the word "team" at the end of the questions
What is a team? Does a team enter more than one item? How many teams are participating?
My apologies. A "team" is the same as a "contestant", and the same as a "participant", and a team/contestant/participant will provide only one entry for the context of this question (so an "entry" is really the same thing as well). I should have used a consistent term throughout.

For some real-world context, this is in regard to a cooking contest, where 40 teams/contestants/participants will submit an entry for judging. The "pool" of teams/contestants/participants is static, and our goal is to determine the odds of the same 6 entries landing on the same table.

Note the "pool" may grow or decrease, depending on several factors, and the number of items presented for judging by each team could change - hence the desire to understand better how the equation would work.
d-glitchCommented:
Lets number the tables (1-7) and letter the contestants (A-Z the a-n).
The first time you get  [ A B C D E F ] at table 1.
======================================================
The second time you get [A B C D E G ] at table 1.  Almost the same people at the table.  Is this okay?
or
The second time you get [ A B C D E F ] at table 2.  The same people at a different table.  Is this okay?
======================================================
The number I gave earlier  [2.928 x 10^29] is certainly much too high.

The number of ways to populate one of the 5-contestant tables with 40 entrants is
      40!/(35!*5!)  =  658,008     This is probably the limiting factor.

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
The second time you get [A B C D E G ] at table 1.  Almost the same people at the table.  Is this okay?
Yes - as long as the same group of people are not the same on ANY table
The second time you get [ A B C D E F ] at table 2.  The same people at a different table.  Is this okay?
No - see above.
ozoCommented:
There are 40 choose 6 = 40!/(6!(40-6)!) ways to assign participants to table 1.
d-glitchCommented:
There are  40!/(35!*5!)  =  658,008  ways to assign 5 contestants to a table, and there are two 5-person tables.  
     658,008/2  =  329,004  possible arrangements

There are  40!/(34!*6!)  =  3,838,380 ways to assign 6 contestants to a table, and there are five 6-person tables.
     3,838,380/5  =  767,676  possible arrangements

The analysis above suggests that 329,004 as an upper limit, but there are almost certainly additional constraints.  This case is simple enough that it would be possible to do a complete computer analysis.  
=================================================

But let's look at a simpler case:  ten contestants [ A B C D E F G H I J ]  divided  3-3-4

There are   10!/(3!*7!)  = 120  ways to assign 3 contestants to a table and two 3-person tables.
     120/2  =  60  possible arrangements.

There are   10!/(4!*6!)  = 210  ways to assign 4 contestants to a table and one 4-person tables.
                     210  possible arrangements.

One of the 60 possible arrangements for the 3-person tables would be:  [ A B C ]    [ D E F ]    [ G H I J ]

Another would be:   [ A B D ]    [ C E F ]    [ G H I J ]   <== But this duplicates the 4-person table.

Of the 60 possible arrangements for the 3-person tables:  6!/(3!*3!)/2  =  10 of them use the first six contestants.   So there may be only six arrangements that satisfy all the constraints.  This case certainly simple enough to solve by hand.

There was an earlier problem like this on ExEx (something about scheduling golf matches) that was even more constrained.  I will try to find it.
d-glitchCommented:
This is the earlier question I just mentioned.
     http://www.experts-exchange.com/Other/Math_Science/Q_24609481.html

But I also found another question:
     http://www.experts-exchange.com/Other/Puzzles_Riddles/Q_22137055.html
 with some a very relevant post from ozo
     http://mathworld.wolfram.com/SocialGolferProblem.html

From the Wolfram site:
     Event organizers for bowling, golf, bridge, or tennis frequently tackle problems of this sort,
     unaware of the problem complexity. In general, it is an unsolved problem.
d-glitchCommented:
If you have a deck of 40 cards (one per contestant) you can simply shuffle and deal them out to the seven tables.  
This will use up   5 of the 3,838,380  6-person arrangements
               and        2 of the   658,008   5-person arrangements.

If you do this 1000 times and keep track of the results, you are likely to have 6 collisions.  This is from a Birthday Problem type analysis.
d-glitchCommented:
Back to the original question?

Is there an equation I can use to determine the odds of exact duplication of entries landing on each table? For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1, what are the odds of that recurring?

How many times are you going to run this contest?  If it is more than twice, do you care about duplicating tables for the entire history or just for the most recent event?

For the more interesting entire-history case, the probability of a random assignment not duplicating one or more tables from N prior contests (where N is small) is

     P = (1 - (5N/3,838,380)^5) * (1 - (2N/658,008)^2)

For the most-recent-event case, the answer is the same with N=1.

As N becomes larger (10K perhaps?), the equation may break down if there are hidden constraints, if there are less than 329,004  unique arrangements.  Running the Monte Carlo analysis from my last post, and comparing the results to the unconstrained calculations outlined in the attached Excel file would answer the question.
ExEx-40-People-7-Tables.xlsx
d-glitchCommented:
I have written and run a Monte Carlo simulation of the Table Assignment problem.
It can run 100M deals in 2 hours to find 317,000 out of 329,008 possible solutions.  
The intermediate results matched the spreadsheet predictions to approx 1%.
The program (45 lines of Python) is attached.

My answer from the last post is very close but not exact:
          P = (1 - (5N/3,838,380)^5) * (1 - (2N/658,008)^2)

How much precision do you need?
Table_Assign.txt
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
How many times are you going to run this contest?  If it is more than twice, do you care about duplicating tables for the entire history or just for the most recent event?
The contest would be run once, but the allocation of entries to tables will be run multiple times in that event. We don't care if a table is duplicated in a different contest - we only care about that specific contest.
d-glitchCommented:
I ran the Monte Carlo a few more times:
It took 5 hours to find 320,000 out of 329,008 possible solutions.
It tracks the predictions of the Excel file to better than 1% until >250K solutions.

I modified the program to look at the number of deals before the first duplication.  
In several runs of more than 1M deals:
The first duplication occurs on average after approx 350 deals.
The max and min values were less than 10 and greater than 1000.
The worst case minimum was 3 deals.

You should be able to randomize your tables in later rounds with the Python program (which checks for duplication), or by shuffling and dealing entry cards.

If you shuffle the cards, the odds of a duplication are very small ( less than 50 in 1M for the fifth round), but you would still have to check each round manually.
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
The "cards" (or entries) would be shuffled before each allocation, so it would seem that our odds would be somewhere around 50 in 1 million for duplication. That's certainly good enough for what we're after.

Thanks to everyone who helped me to better understand how these sorts of things work, and how to arrive at a calculation like this. I have a much better understanding that I did when I started this!!
Scott McDaniel (Microsoft Access MVP - EE MVE )Infotrakker SoftwareAuthor Commented:
Excellent comments. Always impresses me how far the Experts here go to help me understand what's going on!
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Algorithms

From novice to tech pro — start learning today.