Link to home
Start Free TrialLog in
Avatar of Ed Covney
Ed CovneyFlag for United States of America

asked on

How can I test for randomness? (or order?)

If I randomize the numbers 1..256, is there a test to test the random generator?  
Ideally, the test would return a real number between 0.0 to 1.0 where one extreme
would indicate a totally ordered set (i.e. Array[j] = j in 1..256) and the other
extreme would indicate "No order found".

I hope to avoid academic theories, just want some direction on how to develop a
"practical" function to test the array.

TIA - Ed.

Bonus: Why do I get an error if I use "i" in the above array declaration?
User generated image
Avatar of d-glitch
d-glitch
Flag of United States of America image

If you select some text, and then hit the italic I on the Post a Comment menu bar, it will italicize that text when it is posted.     The [ i ]   and [ /i ]   (without the extra spaces) are in-line formatting commands.
There is no easy way to tell if single shuffle operation is the product of a truly fair and unbiased random process.

There are well understood algorithms for shuffling:  
     http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

That article also discusses some of the potential sources of bias in shuffling algorithms, particularly issues with the pseudo random number generator.

It is probably easier to certify the shuffling algorithm (and RNG), than to analyze the output results.

Does this question have actual cryptographic applications and consequences?
>>  There is no easy way to tell if single shuffle operation is the product of a truly fair and unbiased random process.

There is no hard way either.  Any and every possible output will show up eventually.
Avatar of Ed Covney

ASKER

Hello again d-glitch:

>> The [ i ]   and [ /i ]   (without the extra spaces) are in-line formatting commands.  
Thanks. Why couldn't EE could have noted that in the error?

>> There is no easy way to tell if single shuffle operation is the product of a truly fair and unbiased random process.
Hmm. The 'no..' scares me, 'not easy..' is acceptable tho. Philosophically (not mathematically): If the Array[17] is 136, does that alone suggest that Array[18] is per-determinable? I think not, therefore there must be a "test".

>> It is probably easier to certify the shuffling algorithm (and RNG), than to analyze the output results.
Unfortunately I'm not using a RNG.

>> Does this question have actual cryptographic applications and consequences?
Sort of.
I'm "directing" the shuffle. The array has a fixed internal order, not necessarily 1,2,..256, and for example I could hard-wire a one time excel shuffle as the starting point.  The "directed" output shuffle uses three inputs:
1) A large salt (an old Rivest inspired formulation), up to 173 ASCII characters.
2) A long-trigger, the user (me) supplies text, up to 100 characters.  (favorite poem, quote, or sequence of digits of Pi??)
3) A short or "use-trigger".

Now before you start laughing, I will tell you that it "appears" to work and looks random. The program supplies the salt, my "long trigger" happily resides encoded in the registry. Then on each use, I supply the short "Use" trigger. The program then concatenates (salt + long-trigger + use-trigger) and performs a series (hundreds actually) of MD5 hashes which in turn creates a pseudo shuffle.  

The use-trigger, once inputted, generates a 96-character password. On my short form, I can elect to use only the first 24 characters (why use less?), or any length 1 to 96.  If I choose 24 (which I do) then the first 24 of 96 is copied to the clipboard. If for example, if I use "amazon" a strong password is generated. Note that any subsequent time I type "amazon" the exact same password is generated. Just one small typo however, like "Amazon" or "amazn" produces a totally foreign PW set.

Created in this manner, I hope to convince at least myself that the password is as strong as or stronger than the MD5 hash that will represent it in Amazon's User DB. BTW, you should already be fairly familiar with possible PW composition distribution.

Although otherwise claimed here, I currently only have a test long form - nothing yet set in concrete and I store the long-trigger, not yet an encrypted long-trigger. I do have most procedures & functions complete just a matter of tying it all together and trusting what I have. There are also a few things I've not mentioned but need addressing, but those will happen at another time and place.

Thanks for not laughing too hard.

- Ed
Although MD5 has been broken, it is still a sophisticated mathematical function.  You might want to read this article to see what it takes in computing resources to crack an MD5 hash:  http://en.wikipedia.org/wiki/MD5  I think it very unlikely that you are making something stronger than MD5.
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ozo -
>> But it may be simpler if you could do statistical tests on the raw numbers generated
That's exactly what I'm hoping for and if NIST has it, I'll be forever in your debt.

Dave B. -
MD5 is collide-able, not broken. To my knowledge, a "blind" collision have never occurred. And a nicely salted MD5 hash, has never been collided either. So for my purposes, MD5 and salted MD5 work great.

Hackers don't actually steal passwords, they steal hashes (probably MD5) of passwords (at least we hope). If they sort the hashes, the records reveal that there are many duplicates which implies users are using the same password over many accounts or that many married couples exists who have the same names and were married in the same year; like "Dave&Mary96".

Hackers can't reverse hashes but they have programs that create & hash "passwords", then compare the hash to their DBs. I remember one hacker claimed he could hash and test 10,000,000 passwords per minute and that one year (its a yearly contest) he was able to link over 50% of his 'stolen' hashes to plain text passwords. That means as a community of users, we use very weak passwords. It is much easier to guess: "Dave&Mary96" than to reverse "93AA58D0D2D2E4A2E91245DC596F6998" (its MD5 hash).

What I meant by "..as strong as or stronger than the MD5 hash that will represent it" was to assemble a password that would essentially be impossible to guess in less than 2^128 tries. That part is easy - a slam-dunk.  My amazon PW (96 chars):
ZeHI61Oql3j3BBW7gi03h5rEOZdzzAFPy9kwJ80QanPGpsmNekuKp8t2Dk4LAfGgep2hmc8Yk1yF41Is2qGh62YcoBmRitsY
is way over kill and a hundred orders of magnitude more difficult to "guess".
But amazon doesn't store my PW, it stores its hash: "678479D42A3A62A9769CCB5EB5896C71" and if by accident hackers found a string that produced the same hash (a "blind collision"), well that string would be a good substitute for my password. So that's what I'm aiming for: Create a password that's easier to blindly collide than find.
Does that make sense?
NIST tests are good measures of pseudo randomness.
they are not good measures of difficulty of reversing a hash.

Whether something like ZeHI61Oql3j3BBW7gi03h5rEOZdzzAFPy9kwJ80QanPGpsmNekuKp8t2Dk4LAfGgep2hmc8Yk1yF41Is2qGh62YcoBmRitsY can be easily guessed may depend what method was used to generate that sequence, and whether there were any inputs to that method that are easy to guess.

But more bits in the PW than in the stored hash adds no further difficulty to the ease of finding a blind collision.
ozo -

>> NIST tests . . . they are not good measures of difficulty of reversing a hash.

I don't need/want to measure how easy or difficult is is to reverse a hash, that's well known (1 in 2^128). And if NIST tests are good measures of "pseudo randomness", then they must be good measures of randomness no matter how the data is generated.

>> But more bits in the PW than in the stored hash adds no further difficulty to the ease of finding a blind collision.

Yes - didn't I say that?
Many hashes which are not cryptographically strong are easier to reverse than 1 in 2^128.
The NIST tests proved to be very useful and my hash derived shuffles test on par with Excel's (2010) Rand(). That's all I need to see. Thanks ozo.