This has almost certainly been raised before by others, but there seems to be something uncomfortable about using .size() (of a List or Set, for example) as the bound parameter, as in Random.nextInt(l.size()).

This approach binds the search space for a random int to the length of the array/list/set, which is not the same volume of search space as for a set such as, for example: 4, 36, 90, 203, 480017. Here, nextInt(size()) means there is an equal chance of any one of the five index positions being returned, whereas the true search space ought to be 480017, i.e. nextInt on the *largest* value in the list. IOW, the two approaches are conceptually/philosophically different: size() by itself never needs a re-iteration to obtain the int, whereas nextInt on the largest value in a non-contiguous set, or any set whose values don't match its indexing, will require (many) more calls and therefore be nominally more random than nextInt(size()).

There are plenty of guru sites where .size() is used, of course: i.a. here or here.

Perhaps in encryption terms, the potentially far smaller search space of nextInt(size()) would have an impact on a cracking algorithm, and hence on cracking time?

Any thoughts ?



so the value will be between 0 (inclusive) and n (exclusive)

The question is - would it be more random than selecting a random index into the list?

I think the answer is no. We can see that by working back from the probabilities. If we generate random values until we find one that is in the list, the probability of hitting any given value is still the same - 1/n if there are n items. This is the same probability as we get by selecting a random index.

The only way that would not be true is if nextInt(10), say, is less random than nextInt(100000).

If you or I wrote the Random class, there's a decent chance nextInt(10) would be less random, because we might do

```
int value = random.nextInt();
int result = value % n;
```

to generate [0, n), but this method is actually incorrect: the modulus of a negative value is negative, and the modulo also skews the distribution when n doesn't evenly divide the range of int.

But the guys in Java land know this, so their random function is fine, and both methods give the same degree of randomness. Just choosing a random index is much faster.
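As a sketch of the pitfall described above (illustrative code, not the actual JDK implementation): the naive modulo version is biased and mishandles negative values, while rejection sampling fixes both.

```java
import java.util.Random;

public class BoundedRandom {

    // Naive version: besides modulo bias, Math.abs(Integer.MIN_VALUE)
    // is still negative, so this can even return a negative result.
    static int naiveBounded(Random r, int n) {
        return Math.abs(r.nextInt()) % n;
    }

    // Rejection sampling: draw a non-negative 31-bit value and retry
    // when it falls in the small tail that would skew the distribution.
    static int fairBounded(Random r, int n) {
        int bits, val;
        do {
            bits = r.nextInt() >>> 1;           // non-negative draw
            val = bits % n;
        } while (bits - val + (n - 1) < 0);     // overflow marks the biased tail
        return val;
    }

    public static void main(String[] args) {
        System.out.println(Math.abs(Integer.MIN_VALUE)); // -2147483648: abs alone can't help
        Random r = new Random(42);
        for (int i = 0; i < 100_000; i++) {
            int v = fairBounded(r, 10);
            if (v < 0 || v >= 10) throw new AssertionError("out of range: " + v);
        }
        System.out.println("100000 draws all in [0, 10)");
    }
}
```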

Doug

"the probability of hitting any given value is still the same - 1/n if there are n items." Certainly, if you scan a range, each random value has a very low probability of hitting an item in the list.

But not if the search is done on a range something like : 1, 2, 4001, 88937, 4000023, 4000027

I meant the probability of the algorithm returning the value from the list using either algorithm:

```
a) int index = random.nextInt(list.size()) ; // Small range used here
   return list.get(index) ;

b) while (true) {
       int possibleValue = random.nextInt(min, max) ; // Wide range used here to sample from
       if (list.contains(possibleValue))
           return possibleValue ;
   }
```

With either of these, the probability distribution of the returned values is the same.

Doug

Should have been:

```
int possibleValue = random.nextInt(max-min+1)+min ;
```

And yes when the list is:

1, 2, 4001, 88937, 4000023, 4000027

then max will be 4000027

so as you say the probability of random.nextInt() returning a value in the list is 6/4000027 (i.e. very very low). But we're not just making one call to random.nextInt() in the second method.

The distribution of the values returned from the second method is what matters here: the value it returns will still be evenly distributed within the list of values in the original list:

1, 2, 4001, 88937, 4000023, 4000027

just as will be the values returned by the other method where we pick a random index, not a random value.

So the two methods both produce a random value from the list with the same probability distribution.

And given that the first is much much faster, it's the one to prefer (the one where we randomly pick an index).
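For what it's worth, the two approaches can be run side by side. This is a small self-contained sketch using java.util.Random; the list, min and max come from the example above:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class PickCompare {
    static final List<Integer> VALUES = Arrays.asList(1, 2, 4001, 88937, 4000023, 4000027);

    // Method (a): pick a random index - exactly one nextInt call.
    static int pickByIndex(Random r) {
        return VALUES.get(r.nextInt(VALUES.size()));
    }

    // Method (b): sample the whole [min, max] range until a list member is hit.
    static int pickByValue(Random r) {
        int min = 1, max = 4000027;
        while (true) {
            int candidate = r.nextInt(max - min + 1) + min;
            if (VALUES.contains(candidate)) return candidate;
        }
    }

    public static void main(String[] args) {
        Random r = new Random(7);
        int[] counts = new int[VALUES.size()];
        for (int i = 0; i < 60_000; i++) counts[VALUES.indexOf(pickByIndex(r))]++;
        System.out.println(Arrays.toString(counts)); // each count roughly 10000: uniform

        // Same distribution, but each call needs ~max/6 nextInt calls on average.
        System.out.println(pickByValue(r));
    }
}
```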

"because if they were equal probabilistically, then they'd take the same amount of time"

Now I'm the one not really following. Why would the same probability distribution imply they should take the same time to compute?

FYI in general "SecureRandom", which is a "better" random (it is seeded from real entropy rather than being purely pseudorandom), can take an arbitrary amount of time to respond while it waits for sufficient "entropy". Fun stuff. But anyway, it's good at producing actual random values on a computer, at the cost of unpredictable response times, unless you buy an "entropy source" for your machine... which means something like a radioactive isotope which decays somewhere and gets measured.

all you need to do is select a random number from 1 to (number of elements of array) and then use the returned number to index into your array, i.e. array[random(1,array.length)]

So the longer time taken by algo (b) is symptomatic of more *entropy*, isn't it? :)

* At least, a factor of entropy relative to the two algos.

It could be, but I don't think it is :)

Running a pseudorandom generator many times in a row (which is what we're doing in (b)) doesn't increase the randomness of the final value. If you knew the first value and how the generator works, you'd also know the 1000th value - it's determined exactly by the first value. Hence it's not "more random" just because we ran it over and over.

(Side note - this is actually how most Two Factor authentication algorithms work: you just need to know the initial seed and how to generate the sequence to produce the latest value, so all you're really proving is that you had the seed. And although the values in the sequence look random to an observer, say somebody looking over my shoulder, it's critical that they're not actually random, or there would be no way for the server to verify them.)
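The determinism point is easy to verify: two java.util.Random instances built from the same seed produce identical sequences, however many draws you make.

```java
import java.util.Random;

public class SeedDemo {
    public static void main(String[] args) {
        Random a = new Random(12345L);
        Random b = new Random(12345L);

        // The 1000th value from 'a' is fully determined by the seed,
        // no matter how many draws precede it - 'b' tracks it exactly.
        for (int i = 0; i < 1000; i++) {
            if (a.nextInt(100) != b.nextInt(100))
                throw new AssertionError("sequences diverged at draw " + i);
        }
        System.out.println("1000 draws identical: same seed, same sequence");
    }
}
```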

Also, we can easily make a slower variant with no change in randomness:

```
a) int index = random.nextInt(list.size()) ; // Small range used here
   return list.get(index) ;

b) int index = random.nextInt(list.size()) ; // Same small range...
   Thread.sleep(2000) ;                      // ...just artificially slower
   return list.get(index) ;
```

So it's dangerous to assume that taking longer or doing more operations somehow increases randomness.

That's why if you need "truly random" you need some sort of external source of that randomness - something which can't be inferred from the internal machine state (since CPUs should always be deterministic). You may have seen some security programs asking you to move your mouse around for a bit before they generate a key. They're using those mouse moves as a weak source of "true randomness".

As I say, there are better sources of "true randomness", but that generally means buying an extra PCI card with a radioactive element inside it and plugging it in. The sort of thing you'd find in the back of a modern slot machine.
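When predictability matters, java.security.SecureRandom is the standard JDK option mentioned above; it subclasses java.util.Random, so it's a drop-in replacement, and it is typically seeded from the operating system's entropy pool rather than the clock:

```java
import java.security.SecureRandom;
import java.util.Random;

public class SecureDemo {
    public static void main(String[] args) {
        // SecureRandom extends Random, so existing code that takes a
        // Random can be handed a cryptographically stronger generator.
        Random secure = new SecureRandom();
        int index = secure.nextInt(6); // same bounded contract as Random.nextInt(n)
        System.out.println(index >= 0 && index < 6); // true
    }
}
```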

The random implementation in java is not really random.

public int nextInt(int n) - Java 7 Docs

Returns a pseudorandom, uniformly distributed int value between 0 (inclusive) and the specified value (exclusive), drawn from this random number generator's sequence. The general contract of nextInt is that one int value in the specified range is pseudorandomly generated and returned. All n possible int values are produced with (approximately) equal probability. The method nextInt(int n) is implemented by class Random.

I would not use it for cryptography; you'd be better off using one of the publicly available random number generators on the internet: https://en.wikipedia.org/wiki/List_of_random_number_generators#Random_number_servers

What's important here is that if the array is not contiguous, there's no guarantee that any given generated value is actually contained in the array. This means some number of probes will fail, making the time taken to find any or all values longer.

Totally agree with this.

So the approach using the length of the array is nowhere near as random as the other approach, and this is the issue.

This is the part we don't really agree on :) But I don't think there's much more we can say about this. I think we're just disagreeing (sort of) over what the definition of "random" is - which is actually a complicated question.

For program testing we need the same set of random numbers so we can repeat the tests and get the same results. In production we don't want the repeatability so we have to seed the random number generator with some form of entropy.

David - the approach we use for this is that we defined RandomGenerator as an interface with two implementations:

PredictableRandomGenerator

SecureRandomGenerator (a wrapper around Java's SecureRandom class)

Then the tests use PredictableRandomGenerator.

Doug

P.S. This is also why I assumed there was a method random.nextInt(min,max) earlier in this thread because that's in our interface but doesn't exist in either underlying Java class - we just added it since it's very handy :)
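A minimal sketch of the kind of interface described above - the names and method shapes here are guesses based on the description, not an actual library API (recent JDKs do ship an unrelated java.util.random.RandomGenerator):

```java
import java.security.SecureRandom;

// Hypothetical interface: one contract, a predictable (seeded)
// implementation for tests and a secure one for production.
interface RandomGenerator {
    int nextInt(int bound);

    // The handy convenience method mentioned above: inclusive [min, max].
    default int nextInt(int min, int max) {
        return nextInt(max - min + 1) + min;
    }
}

class PredictableRandomGenerator implements RandomGenerator {
    private final java.util.Random delegate;
    PredictableRandomGenerator(long seed) { delegate = new java.util.Random(seed); }
    public int nextInt(int bound) { return delegate.nextInt(bound); }
}

class SecureRandomGenerator implements RandomGenerator {
    private final SecureRandom delegate = new SecureRandom();
    public int nextInt(int bound) { return delegate.nextInt(bound); }
}

public class GeneratorDemo {
    public static void main(String[] args) {
        RandomGenerator testGen = new PredictableRandomGenerator(99L);
        int v = testGen.nextInt(10, 20);
        System.out.println(v >= 10 && v <= 20); // true: value in [10, 20]
    }
}
```

Swapping the production implementation for the predictable one in tests then makes random-driven code fully repeatable.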
