While training and mentoring a new user with a Microsoft Access application I'd written for the client, a quite unexpected condition occurred in my application. This would normally be the kind of condition that prompts me to find an error I'd introduced and not trapped. However, this time the error elicited exclamations of wonder and puzzlement. Surely this could not happen... but I was a witness and couldn't deny my own eyes.
Accepting the possibility that it could happen and creating a do-over (with different key) patch was my highest priority. That patch allowed the application to continue with that day's production run. Disaster had been averted and I left the client's office shaking my head in disbelief. Time to put on my Computer Scientist cap. This is the story of my research and a warning or alert to all VB developers who use the PRNG features of the VB language.
PRNG is not a new Visual Basic statement, feature, referenced class, nor a new kind of potato chip. It is an acronym for Pseudo-Random Number Generator. I'll get to the details about PRNGs and VB PRNG features in a moment. Firstly, I should explain why a Microsoft Access business application needs to use PRNG features.
Several years ago, we added a feature to this executive survey application that allows the survey taker to answer the survey questions online through their Internet browser. Originally, the surveys were all paper answers that were typed into the database application in typical data-entry fashion. After some setup data entry and Excel data import, the application operator causes survey questions to be sent to a (LAMP) server and an email sent to the survey taker that includes a URL with a key to their survey. After the survey-answering period has expired, we download all completed and partially completed surveys to the Microsoft Access database for extensive profile reporting.
Application Figure 1 (flow)
a. Add row to web survey table
b. Use newly inserted row's primary key value to seed VB PRNG
c. Iterate Rnd() to pick out 30 characters from string
d. Update newly inserted row with random string
e. Secure copy MySQL table data to LAMP server
f. Invoke data import PHP script
Application Source Figure 1 (new record)
This is the area where the application failed due to VB's PRNG weakness. The failing statement was the second .Update method when I assigned an rs!Email_ID value. The Email_ID column has a unique index. Am I ever glad I forced uniqueness on that column, since problem analysis is more difficult on the MySQL database.
Application Source Figure 2 (generate random string)
Using the parameter (=rowID) as the seed for a new PRNG sequence, build a string of random characters.
Public Function EmailID_For_SID(parmS_ID) As String Dim strTemp As String Dim intLoop As Integer Dim intCharPosn As Integer Dim strCharBase As String strCharBase = "01234ABCDEFGIJKLMNOPQRSTUVWXYZ" & _ "abcdefghijklmnopqrstuvwxyz56789" strTemp = String(30, "A") Rnd -1 'resets the random seed Randomize (parmS_ID) 'initialize the seed For intLoop = 1 To Len(strTemp) Mid$(strTemp, intLoop, 1) = _ Mid$(strCharBase, Rnd() * Len(strCharBase) + 1, 1) Next EmailID_For_SID = strTempEnd Function
For security reasons, we didn't want to identify the surveys with a long integer value, since a mistyped number would result in the ability to see someone else's survey answers. This is an especially sensitive issue since these executives were often rating their co-workers and senior executives. Since this data has real value to the clients and to the industry, allowing competitors to see or change this data would be very bad for my client. We wrote the PHP application to show an application based on a thirty (30) length string identifier comprised of upper and lower case letters and digit characters.
I seeded the VB PRNG with the (autoincrement) key value of the web-survey table row I'd just inserted and iterated 30 times, picking characters out of an alphabet string. To add to the seed variability, I changed the table default from 'sequential' to 'random' for the autoincrement field. This increased the range of primary key values from 1 -> 2.14 billion to -2.14 billion -> 2.14 billion. When two different rows' autoincrement keys resulted in identical 30 character 'random' strings, the application threw up its hands as I coded it. Identical random strings weren't supposed to happen... or so I thought. What I discovered would make the Lost In Space robot wave his arms and yell "Danger Will Robinson! Collision imminent!"
PRNGs - An Introduction
To fully understand and develop a non-patched work-around for this problem, I needed to better understand PRNGs and how the VB PRNG features work. Although my patch would reduce the chance that a collision would prevent a survey from being generated, I didn't want to be called from a frantic user needing an immediate fix because the patch had failed. I learned three important things in my research about PRNGs:
The key word in PRNG is pseudo. There aren't truly random sequences generated by software. They may appear random, but they aren't. With computers, "close" is as good as you can get. Usually best we can do is approximate randomness in the computer world. You might also see quasi-random used synonymously with pseudo-random. Examples of a truly random data source would be a Geiger counter measuring radioactive decay or an atmospheric condition sensor.
Some PRNGs are better than others are and there are actually ways to measure their (pseudo) randomness.
The VB PRNG behavior isn't very good.
Where Are PRNGs Used?
The three places you will likely see PRNGs used are Security, Statistics, and Games. Although PRNG applications are not central to this article, it may help you determine likely places to look in your VB code for PRNGs.
PRNG Security Uses
One of the most secure methods of encrypting data is the one-time-pad. If the sender and receiver both use the same random number sequence (same size as the message) to encode their message, then anyone intercepting the message has no clue on how to decrypt the message unless they've also learned of the random number sequence. The problem with this method is the need to use a different random number sequence for each message. PRNGs are also used to generate random data when wiping a hard drive of sensitive data. These two examples only begin to enumerate the uses of PRNGs with security. This is a big list.
PRNG Statistic Uses
The most frequent PRNG use in statistics is with Monte Carlo simulations. However, there are other related scientific fields of study, such as genetic algorithms where PRNGs have been used to select breeding pairs and methods. In this case, I'm using Statistic in an overly broad sense.
PRNG Games Uses
Some of the earliest applications we write as beginner programmers are games. PRNGs are used to select tic-tac-toe squares, shuffle cards, and roll dice. Casinos rely on the PRNGs in their electronic gaming equipment. If you don't think PRNG behavior makes a difference, consider the producers of the TV game show, Press Your Luck.
PRNG Resources For Your Own Study
Knuth, D. E. "The Art of Computer Programming, Vol. 2 Seminumerical Algorithms", Addison-Wesley, second edition, 1981
There are two PRNG-related functions/statements in the VB language, Rnd() and Randomize.
The Rnd() Function
The Rnd() function gives sequences of single precision floating point numbers, where 0 <= Rnd() < 1. Rnd() can be use as a function or can appear on a line all by itself, in the form of a VB statement, without any compilation problems. Since Rnd() is a function, the compiler will discard the returned value if it is used as a statement. For the purposes of consistency, Rnd() will be the notation used in this article. You might see the alternate form, Rnd, used in production code and sample/example code.
The Randomize Statement
The Randomize statement initializes the Rnd() sequence with a new seed. The default seed value is the current Timer() function value (see below). When using the Randomize statement to reseed the sequence, you need to invoke Rnd -1 prior to the Randomize statement. Although I've used "Rnd -1" in my examples and documentation of seed resetting, the -1 can be any negative value.
The Timer() Function
The Timer() function returns values between 0.00 and 86400.00. These are the seconds of the day, starting at midnight with a resolution of 1/100th of a second. Doing the math, there are 8.64 million unique numeric Timer() values.
What Microsoft Tells You
We are at the mercy of Microsoft when it comes to documentation. I think they should be thanked for the overall quality of their help files and product documentation. So this is what Microsoft had to say about the Randomize seed and the use of VB's PRNG features.
VB Classic Randomize Help
From the online help (both VB help file and msdn.microsoft.com), the Randomize statement seed value can be "Any Numeric Expression". That should allow you to supply a HUGE range of seed values.
Numeric variables (byte, integer, long, single, double, currency, decimal)
Date/time literals and variables - which are stored internally as doubles
Formulas that result in a numeric value
Functions that return a numeric value
VB.NET PRNG Help (Rnd and Randomize)
There is new verbiage accompanying VB.NET documentation associated with the Rnd() function and Randomize statement.
Security Note: Because the Random statement and the Rnd function start with a seed value and generate numbers that fall within a finite range, the results may be predictable by someone who knows the algorithm used to generate them. Consequently, the Random statement and the Rnd function should not be used to generate random numbers for use in cryptography.
Although I didn't consider what I was doing cryptography, I was obscuring the value of the survey key by expressing the row ID as a long string of 'random' characters.
This is an under-the-covers explanation of how the VB PRNG actually works. The VB PRNG is a linear congruential generator, meaning that successive values are mathematically derived from the prior value in a formula of the form:
Rk+1 = (Rk* L + M) Mod N
The Rnd() uses the following formula:
x1 = (x0 * a + c) MOD (2^24)
where a = 1140671485 and c = 12820163
The default value for x0 is 327680 (&h50000). This is why you get repeatable sequences when you do not use the Randomize statement prior to the first Rnd() function invocation.
Please notice the 2^24 division. This results in a 'period' for the PRNG, where a sequence of values from the Rnd() function will repeat after 16.77M invocations. In talks with my local crypto expert, we both agreed that this is a serious flaw in the PRNG. Since I was only using the first 30 items of each sequence, I wasn't too concerned about this.
To get the 0 <= Rnd() < 1 values, the x1 value is divided by 2^24 before it is returned to your Visual Basic program code. It is then up to the VB programmer to transform the returned values into a scale and range that is appropriate for the application.
After considerable research, I discovered that there is a weakness in the randomize statement that is much more severe than that in the Rnd() function. Here's what happens inside the Randomize statement:
1. The supplied seed value is cast into a Single datatype. This is understandable, since the Rnd() function returns a Single datatype.
Note: the Timer() value returns a Single datatype which does not require any datatype conversion.
2. The Single (datatype) seed is 'moshed' into 2-byte seed (TwoByteSeed) value with the following XOR operations:
This is a summary of the weaknesses and problems with the VB PRNG.
The Rnd() sequence cycles every 16.77 million items. If you need to create very long random number sequences, like those used in disk-wiping utilities, this period is much too short. If you use enough write operations, you might overcome this deficiency. For some cryptography operations, this short period is a 'stopper'.
The Rnd() sequence can be cracked fairly easily, since it implements a simple linear congruent formula.
The same starting value is used if no Randomize statement precedes the first Rnd() invocation.
The seed is not reset if Rnd -1 (or any negative number) does not precede the Randomize statement. In other words, the stated purpose and function of the Randomize statement doesn't always happen.
Since the Timer() values repeat every day, the Randomize statement is susceptible to the reuse of seed values.
However, the most severe weakness in the VB PRNG is that there are only about 64K possible (post-mosh) unique seeds! This means that there are only 64K unique random sequence starting points!!
Using the Timer() value, the default seed for the Randomize statement, doesn't help as I've illustrated in the following section. Using a (guaranteed) unique seed value doesn't help as I experience in my application.
To help you estimate your chance of collision, take the total number of supplied seed values and divide by 64K. This would be an approximation of the average collisions. A 'collision' is the occurrence of a repeated sequence.
Timer Seed Collision Stats
These figures are the result of generating the actual seed values based on possible Timer() function values and counting identical values produced by the Randomize statement.
24 hour day collisions
* Total = 8574465
* Max = 134
* Min = 128
* Avg. = 130.83595
* Total seeds supplied = 8640001
8AM to 5PM collisions
* Total = 3174465
* Max = 50
* Min = 47
* Avg. = 48.438492
* Total seeds supplied = 3240001
What Can/Should We Do?
Duplicate VB PRNG sequences WILL happen in our applications. Does anyone reading this article doubt it?
So what should you do and what can you do to protect yourself?
Look at your VB/VB.NET code that uses the Rnd() function.
Evaluate consequences of collisions and duplicate sequences. Maybe duplicate sequences don't matter to your application or a particular section of your application.
If adding random sequences to a table, add a unique index on the column and add code to handle the duplicate key condition that will be raised.
If there is a potential problem, you might try some minor code tweaking to reduce the likelihood of a collision. This should be a bang-for-the-buck decision. If the collision probability is low or the risk of consequences is slight, then there might not be any reason to change any of the code. However, you might want to change your code if the cost of repair (company or product reputation, SLA penalties, priority programmer fees, etc.) is high. Most of the minor actions I will detail in a future article continue to use Rnd() and Randomize.
If this article and your own code analysis has identified ticking time bomb in your VB application, then a major code tweaking will be required. A major tweak involves replacing the VB PRNG language features with some other PRNG. I will detail those in a future article.
For those developers who have migrated their VB classic applications to the .NET framework, you face the same weaknesses in the Microsoft.VisualBasic namespace, since Rnd() and Randomize behave identically to their Win32 counterparts. Fortunately, you also have some additional major tweaking options. Microsoft recommends using one of these .NET Namespaces.
Note: There are differences between the (pseudo-random) sequences generated by these two .NET namespaces.
What's Ahead? (Next Article)
In a future article, I will present details of minor tweaking, such as
Breaking up your unique seed into separate byte values and generate multiple sequences derived from these byte-valued seeds.
Changing how you use the Rnd() sequence
Using multiple Rnd() sequences
I will also cover details about replacing Rnd() and Randomize (major tweaks), such as
Generating and using a GUID
Use a different source of pseudo-random numbers. There are several decent libraries on the web, third party routines, and you can implement some of Knuth's algorithms yourself.
The correct assignment to have all 62 alphanumeric characters should be:
Open in new window