Link to home
Start Free TrialLog in
Avatar of Scott McDaniel (EE MVE )
Scott McDaniel (EE MVE )Flag for United States of America

asked on

Odds of Duplication

I have 40 participants competing in a contest. In that contest, we have a fixed number of Tables, and each Contestant's entry could land on any of those tables. For purposes of this discussion, let's say we have a total of 7 tables.

The maximum number of entries allowed on a table is 6. The minimum is 3. Entries are allocated as evenly as possible across those tables, although some will always have more than others in this example. For example:

Table 1 - 6 entries
Table 2 - 6 entries
Table 3 - 6 entries
Table 4 - 6 entries
Table 5 - 6 entries
Table 6 - 5 entries
Table 7 - 5 entries

For this discussion, the number of tables will never increase or decrease, and the number of entries will never increase or decrease. The "entry counts" on each table are irrelevant - for example, it does not matter if Table 1 has 5 entries, or if Table 2 has 5 entries, so long as the total number of entries for all tables is exactly 40.

Is there an equation I can use to determine the odds of exact duplication of entries landing on each table? For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1, what are the odds of that recurring?

Also, is there an equation that would determine the odds of exact duplication for all tables and entries?
Avatar of d-glitch
d-glitch
Flag of United States of America image

If you scramble the people and then assign the first six to Table 1, and the next six to Table 2, ....

The number of possible arrangements is   40!/[(6!^5)*(5!^2)]  =  2.928 x 10^29
This assumes that the tables and the people are unique.  

So that for four people and two tables:  
   [A and B at Table 1, C and D at Table 2]  is not the same as  [C and D at Table 1, A and B at Table 2]

If this is not what you want, you could divide by an additional factor of 7! (the number of tables), but since the tables are not perfectly symmetric, this would be an approximation.
Avatar of Scott McDaniel (EE MVE )

ASKER

If it's not asking too much, could explain the equation to me. I understand the ! factorial stuff (i.e. 40x39x38xetc), and (i believe) the ^ exponentiation (a little, anyway), but I'm not sure about the rest.
There are 40! ways to line up 40 people.  40 choices for the first spot, 39 choices for the second spot, etc.

But that is not the answer, because there are many possible duplicates in that number.

Just look at the first six people in line after the scramble  [ A B C D E F ].  These are the people who are going to wind up at Table 1.  There are 6! ways to arrange these people without changing the makeup of the table.  So you have to reduce the 40! number by this factor.  And so on for the other tables.
What do you consider to be "exact duplication of entries landing on each table"?
"For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1"
Does that mean that "Teams 1, 2, 3, 4, 5 and 6 all land on Table 1" is an example of "exact duplication of entries landing on each table"?
Are there other examples of "exact duplication of entries landing on each table"?
SOLUTION
Avatar of sarabande
sarabande
Flag of Luxembourg image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
All the equations given are fine, but do they apply to your problem?
I think the question could be answered more precisely if it were clarified a bit.
I am puzzled by the introduction of the word "team" at the end of the questions
What is a team? Does a team enter more than one item? How many teams are participating?
What do you consider to be "exact duplication of entries landing on each table"?
A "exact duplication" would be where the same 6 teams landed on the same table in any order - so I may have misstated this at the outset, and for that I apologize.

My "exact duplicate" would probably be better described as a "table duplicate". The goal is to determine the odds of having the same 6 entries land on the same table, regardless of the order of the entries. For example, these would be considered "table duplicates":

Table 1
----------
Team 1
Team 2
Team 3
Team 4
Team 5
Team 6

And a subsequent placement like this:

Table 1
-----------
Team 2
Team 1
Team 3
Team 4
Team 5
Team 6

Even though the order of Team 1 and Team 2 are switched, the end result is they "land" on the same table, and for our consideration, this would "duplicate" the same entries on the table (i.e. a "table duplicate").

I am puzzled by the introduction of the word "team" at the end of the questions
What is a team? Does a team enter more than one item? How many teams are participating?
My apologies. A "team" is the same as a "contestant", and the same as a "participant", and a team/contestant/participant will provide only one entry for the context of this question (so an "entry" is really the same thing as well). I should have used a consistent term throughout.

For some real-world context, this is in regard to a cooking contest, where 40 teams/contestants/participants will submit an entry for judging. The "pool" of teams/contestants/participants is static, and our goal is to determine the odds of the same 6 entries landing on the same table.

Note the "pool" may grow or decrease, depending on several factors, and the number of items presented for judging by each team could change - hence the desire to understand better how the equation would work.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
The second time you get [A B C D E G ] at table 1.  Almost the same people at the table.  Is this okay?
Yes - as long as the same group of people are not the same on ANY table
The second time you get [ A B C D E F ] at table 2.  The same people at a different table.  Is this okay?
No - see above.
There are 40 choose 6 = 40!/(6!(40-6)!) ways to assign participants to table 1.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
If you have a deck of 40 cards (one per contestant) you can simply shuffle and deal them out to the seven tables.  
This will use up   5 of the 3,838,380  6-person arrangements
               and        2 of the   658,008   5-person arrangements.

If you do this 1000 times and keep track of the results, you are likely to have 6 collisions.  This is from a Birthday Problem type analysis.
Back to the original question?

Is there an equation I can use to determine the odds of exact duplication of entries landing on each table? For example if Teams 1, 2, 3, 4, 5 and 6 all land on Table 1, what are the odds of that recurring?

How many times are you going to run this contest?  If it is more than twice, do you care about duplicating tables for the entire history or just for the most recent event?

For the more interesting entire-history case, the probability of a random assignment not duplicating one or more tables from N prior contests (where N is small) is

     P = (1 - (5N/3,838,380)^5) * (1 - (2N/658,008)^2)

For the most-recent-event case, the answer is the same with N=1.

As N becomes larger (10K perhaps?), the equation may break down if there are hidden constraints, if there are less than 329,004  unique arrangements.  Running the Monte Carlo analysis from my last post, and comparing the results to the unconstrained calculations outlined in the attached Excel file would answer the question.
ExEx-40-People-7-Tables.xlsx
I have written and run a Monte Carlo simulation of the Table Assignment problem.
It can run 100M deals in 2 hours to find 317,000 out of 329,008 possible solutions.  
The intermediate results matched the spreadsheet predictions to approx 1%.
The program (45 lines of Python) is attached.

My answer from the last post is very close but not exact:
          P = (1 - (5N/3,838,380)^5) * (1 - (2N/658,008)^2)

How much precision do you need?
Table_Assign.txt
How many times are you going to run this contest?  If it is more than twice, do you care about duplicating tables for the entire history or just for the most recent event?
The contest would be run once, but the allocation of entries to tables will be run multiple times in that event. We don't care if a table is duplicated in a different contest - we only care about that specific contest.
I ran the Monte Carlo a few more times:
It took 5 hours to find 320,000 out of 329,008 possible solutions.
It tracks the predictions of the Excel file to better than 1% until >250K solutions.

I modified the program to look at the number of deals before the first duplication.  
In several runs of more than 1M deals:
The first duplication occurs on average after approx 350 deals.
The max and min values were less than 10 and greater than 1000.
The worst case minimum was 3 deals.

You should be able to randomize your tables in later rounds with the Python program (which checks for duplication), or by shuffling and dealing entry cards.

If you shuffle the cards, the odds of a duplication are very small ( less than 50 in 1M for the fifth round), but you would still have to check each round manually.
The "cards" (or entries) would be shuffled before each allocation, so it would seem that our odds would be somewhere around 50 in 1 million for duplication. That's certainly good enough for what we're after.

Thanks to everyone who helped me to better understand how these sorts of things work, and how to arrive at a calculation like this. I have a much better understanding that I did when I started this!!
Excellent comments. Always impresses me how far the Experts here go to help me understand what's going on!