This "Probability" problem has stumped me for several weeks:

If I randomly select a 20 character password from a set of 256 characters composed of:

4 sets of lower case letters (104),

3 sets of Upper case letters (78),

7 of each number in 0..5 (42) and

8 of each number in 6..9 (32)

total (256)

What are the chances a second random 20 character selection will match the first?

Notes:

P(M,N) means the probability of M items taken N at a time, order counts.

C(M,N) means the # of combinations of M items taken N at a time (order is n/a)

Many thanks . . .

- Ed

If I randomly select a 20 character password from a set of 256 characters composed of:

4 sets of lower case letters (104),

3 sets of Upper case letters (78),

7 of each number in 0..5 (42) and

8 of each number in 6..9 (32)

total (256)

What are the chances a second random 20 character selection will match the first?

Notes:

P(M,N) means the probability of M items taken N at a time, order counts.

C(M,N) means the # of combinations of M items taken N at a time (order is n/a)

Many thanks . . .

- Ed

My first estimate is odds of 1 in

```
256!
--------------------------------------------
(4!)^26 * (3!)^26 * (7!)^6 * (8!)^4 * (236!)
```

i.e. 1 in 1.461501637330902918203684

or approximately 1 in 1,461,501,637,330,900,000,

4 sets of lower case letters 26^4,

3 sets of Upper case letters 26^3,

7 of each number in 0..5 6^7

8 of each number in 6..9 4^8

So multiply all those together to get the total number of possible sets of characters.

Then you can interleave them many ways.

Total possible ordering of the characters is 20! but you can divide that by 4!3!7!8!.

So 20! *26^4*26^7*6^7*4^8

--------------------------

4!*3!*7!*8!

Note that you could simplify the two 26^ terms to 26^11, but it's easier to see what went where above.

256!/236! is the number of passwords for 256 unique characters.

Next you have to calculate the probable number of repeated characters and the probable number of repetitions for each.

Commuting time has arrived. Hope to do more before ozo makes me irrelevant.

Probability-for-ExEx.PNG

4 sets of lower case letters (104),

3 sets of Upper case letters (78),

7 of each number in 0..5 (42)

8 of each number in 6..9 (32)

So your character set to choose from is aaaaAAAbbbbBBB etc.

So there are C(256,20) different collections of characters you could use but some are repeats since 'g' is the same as 'g' regardless of order. So d-glitch's first suggestion with a 20! term added to the denominator is the total possible sets of characters you could use.

Making order matter is tricky though for the 20 since you don't know how many repeats there are.

This one is on us!

(Get your first solution completely free - no credit card required)

UNLOCK SOLUTION
It can be solved like that, but it would take a long time to do on paper.

I don't know if there is a closed form solution.

I don't like the 61^15 factor very much either.

I have since realized one way to approach the problem is with Monte Carlo methods.

Generate random strings using the Case 3 (unlimited repetition) alphabet and keep track of the stats on the violations.

This would work with your original (or any arbitrary) alphabet restrictions.

Python can run 1 or 10 or 100 million cases pretty quickly.

This will give a lot better precision that my 0.1% estimate of invalid strings.

This continues to be an interesting problem.

I will keep you posted.

I have revised this post. My first attempt was with 19-character strings instead of 20. The differences are small but noticeable. Embarrassing.

Looking at the last line in the screen shot:

31383 is the number of strings with not even one pair

743661 have one or more pairs but nothing better

208137 have one or more sets of three but nothing better

15665 have one or more sets of four but nothing better [Some of these will be invalid.]

and so on . . . .

==========================

Back to your original question:

If you use a 62 symbol alphabet with unlimited repetition, there are

(62)^20 = 7.04 x 10^35 possible strings.

If you allow only 3 copies of UC letters, and 4 copies of LC letters, the number of invalid strings in 1M random samples will be:

```
(78/256)*15665 = 4773 <= Sets of 4
((78+104)/256)*(832 + 20 + 1) = (182/256)*853 = 606 <= Sets of 5, 6, and 7
-----------------------
TOTAL = 5379
99.462% of the strings are valid.
```

The correction due to the limits on the number of copies of the integers is less than 1 ppm.

I will post my Python code if any one is interested.

And I did insert one non-random string with 20 of the same character to test the distribution calculations.

The RandString program is as vanilla as you can get, and should run fine on any installation.

The current Python random number generator is based on the Mersenne Twister.

I don't even know what that means, but I believe it is world-class and state-of-the-art.

I have cleaned up the program, taken out the tedious poker hand analysis, and added some comments.

It runs 1M strings in 138 seconds.

RandStrings.py

I still trust my Python program code, but my last manual calculations were off.

I should have been considering fractions of the 62 char alphabet, not the 256.

```
(26/62)*15665 = 6569 <= Sets of 4
((78+104)/256)*(832 + 20 + 1) = (52/62)*853 = 715 <= Sets of 5, 6, and 7
----------------------
TOTAL = 7284
99.271% of the strings are valid.
```

That is recalculating from the old data. I also changed the program to validate strings directly. The results agree to 4 decimal places.RandString-Validation.py

Python lists start at 0, but I wanted the index to refer to set size.

I wanted index values from 1 to 20, but I had to include the zero as well.

pattern[ 0] = 0 means nothing

pattern[ 1] = 14 means 14 unmatched characters or singletons

pattern[ 2] = 1 means 1 pair

pattern[ 3] = 0 means no sets of 3

pattern[ 4] = 1 means 1 set of 4 ==> 14*1 + 1*2 + 0*3 + 1*4 = 20 So the rest of the terms are all 0's

dist[ 0] = 0 means nothing

dist[ 1] = 31683 means that many strings have 20 singletons, not even 1 pair.

dist[ 2] = 743661 means that many strings have one or more pairs, but no sets of 3 or greater.

dist[20] = 1 means that one string had a set of 20, but I forced that as a test. It is noted in the code.

If you add up ( 31683 + 743661 + 208137 + . . . you will get the number of strings.

I actually ran 100M strings, and got exactly one set of 8 (none larger), and no occurrences of zero singletons.

If you add up the distribution numbers

Values [ 47, 13, 13, 31, 5, 26, . . . ] are the twenty random numbers in the order they were generated.

Sorted [ 5, 6, 1, 13, 13, 13, 13, . . . ] are the same numbers sorted. You can check. You have to sort to do the analysis.

Pattern [ 0, 14, 1, 0, 1, 0, 0, . . . ] describes the sorted list:

14 singletons one pair ( 31, 31) no sets of 3 one set of 4 ( 13, 13, 13, 13)

1000247 is the number of strings generated and analyzed so far.

992999 is the number that are valid.

99.2753 is the percentage of strings that are valid.

I wanted the program print out intermediate results every 100K strings, but it set it up so that it would print out the next bad string after 100K, 200K . . . In this case, it overran the 1M mark by 247.

We are using random numbers to simulate a complex but well defined problem.

We are using an infinite depth alphabet to model a finite one.

The result: that 99.275% of the strings in a 1M random sample could come from the limited alphabet is persuasive.

I believe it is the correct answer to four or five significant figures.

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy

Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif

Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster

CTP, Sr Infrastructure Consultant

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Connect with Certified Experts to gain insight and support on specific technology challenges including:

- Troubleshooting
- Research
- Professional Opinions

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.