Learn the techniques to avoid forgery and phishing attacks and the types of attacks an application or network may face.

Assume you have access to a function that, in constant time, generates a real number uniformly at random between 0 and 1. Suppose you are given a discrete probability distribution, that is: A set of n real numbers p1,p2,...,pn such that sigma from i=1 to n of p(i) = 1.

Give an algorithm that generates a random integer x between 1 and n satisfying the condition that Pr(x = i) = p(i), and runs in time O(n + log n) (the log n is left in there as a hint). So, for example, if n = 3 with p1 = 0.3, p2 = 0.5, p3 = 0.2 then it should generate 1 with probability 0.3, 2 with probability 0.5, and 3 with probability 0.2.

Give an algorithm that generates a random integer x between 1 and n satisfying the condition that Pr(x = i) = p(i), and runs in time O(n + log n) (the log n is left in there as a hint). So, for example, if n = 3 with p1 = 0.3, p2 = 0.5, p3 = 0.2 then it should generate 1 with probability 0.3, 2 with probability 0.5, and 3 with probability 0.2.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get this solution by purchasing an Individual license!
Start your 7-day free trial.

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

The 'probability distribution' part isn't as confusing as it may sound. The p(1) = 0.3, p(2) = 0.5, p(3) = 0.2 thing is the entire probabilty distribution (or an example of one at least).

That's my hint.

The method that appears obvious and naive consists of a single linear scan ... ??

Better than naive methods are only needed if one wants to generate many numbers according to the given distribution

That makes sense because if you want to generate k random numbers, then the time complexity is O(n + klogn) instead of O(kn) which is obviosly much better. So that may be what was intended.

```
ok my solution involves some C
1 2 3 ...N possible outcomes
P1 P2 P3 PN discrete probability
let’s consider
R() //random generator 0-1
R1=R() //returns 0 - 1 equiprobable
R2=R()+R()<<1 //returns 0 - 3 equiprobable
R3=R()+R()<<1+R()<<2 //returns 0 - 7 equiprobable
Rn=R()+R()<<1+R()<<2+...+R()<<n //returns 0 - 2^n -1 equiprobable
int RN(void) //this function returns a number form 1 to N equiprobable
{
unsigned int i;
do
{
i=R()+R()<<1+R()<<2+...+R()<<n //we chose n verifying that N <= 2^n -1
}while (i< 1 || i >N)
return i;
}
char Prob[N]{ 3,5,6,0,..,5}; //every number represents “proportionally” P1,P2…PN that
//mean that the sum of them does not have to equal 1…
char Trigger[N]{0,0,0,0..0}; //auxiliary array
//this function returns a number form 1 to N weighted with the discrete prob distribution
int WRN(void) {
int i;
While(1)
{
i=RN();
if (Prob[i-1]>0 ) Trigger[i-1]++; //increment the i appearance counter
if (Prob[i-1]>0 && Trigger[i-1]==Prob[i-1])
{
Trigger[i]=0; //reset trigger
return i; //time tu return a value
}
}
```

It could have some errors, I hope you understand the idea...

The solution we are discussing is something like

Build an array [0, .11111, .33333, .66666]

Get R()

If R() is between 0 and .11111 then it's p1

If R() is between .11111 and .33333 then it's p2

If R() is between .33333 and .66666 then it's p3

If R() > .66666 then it's p4

This could all be done in C with a couple for loops, but no one really asked for a code solution.

My mistake then, sorry I thought it has been asked a code algorithm ..

About inefficiencies well your example talks about a discreet distribution with a 0.0001 resolution…

About your example with p1=.11111, p2 = .22222. p3 = .33333, and p4 = 33334 My algorithm expressly deals with proportional integer values of P; it speeds things up…

About not working for every discreet distribution; I don’t quite get why not?

About your example I think your R() is not what we are looking for… we need a number pump from 1 to n where Pj=P[j]

(int)(R()*n) + 1

R() gives a random number between 0 <R() < 1. R()*n gives a random real number between 0 < R()*n < n. Casting it to an int (truncation) gives a

Actually, now that I reread your method I think I see what you are trying to do. For discrete distributions you've traded time for storage. The storage complexity is proportional to the precision. So a very precise distribution would have a very high memory cost, but the generation of the random numbers is constant.

Tell me if this is what you tried to do. For the original p(1), p(2), p(3) = .3, .5, .2 you build an array

{1,1,1,2,2,2,2,2,3,3} then you generate r = random from 0 to 9 and the random number is array[r]. This is actually a cool approach for any distribution with low precision. What you would really be concerned with would be to express each probability as a fraction and find the least common multiple of all the denominators. Then the complexity of creation is O(n + LCM + generatingLCM) But the complexity of generating a random number is O(1). So for many applications, it's much better than my O(n) for creation and O(lg(n)) for generation.

Of course, the original question asked for only getting one random number so mine is O(n + lg(n)) and yours is O(n + LCM + generatingLCM + 1). But who only gets one number from a distribution?

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial[7,5,8] that are the smallest integer numbers that are proportional to (1 – component of original distribution)

I2)f p(1), p(2), p(3) = .2, .4, . 4

then the best array is [8,6,6] reduced to [4,3,3]

3)if p(1), p(2), p(3) = .333333, .333333, .333333

then the best array is [66666, 66666, 66666] reduced to [1,1,1]

4)if p(1), p(2), p(3) = .9999, .00005, . 00005

then the best array is [.0001, 0.99995, 0.99995] reduced to [1,99995,99995]

these array says that

1)

I need to count 7 randomly generated 1s to pump a weighted 1

I need to count 5 randomly generated 2s to pump a weighted 2

I need to count 5 randomly generated 3s to pump a weighted 3

2)

I need to count 4 randomly generated 1s to pump a weighted 1

I need to count 3 randomly generated 2s to pump a weighted 2

I need to count 3 randomly generated 3s to pump a weighted 3

3)

I need to count 1 randomly generated 1s to pump a weighted 1

I need to count 1 randomly generated 2s to pump a weighted 2

I need to count 1 randomly generated 3s to pump a weighted 3

4)

I need to count 1 randomly generated 1s to pump a weighted 1

I need to count 99995 randomly generated 2s to pump a weighted 2

I need to count 99995 randomly generated 3s to pump a weighted 3

the size of the array is not a problem

Yours now doesn't make sense. How did .2 .4 .4 get to be 4 3 3 (shouldn't it be 2 1 1?)

On 4) how come the smallest number isn't the largest this time? Anyway your time complexity for generation (if it works, which I am not seeing) is then O(n*m) where m is the max proportion thing. Mine is O(lg(n)) or O(1) both of which are much better. Do you see how they work?

Pn->1-Pn-> the smallest proportional iteger serie...

Before you say that does not make sense, why don't you try to understand what I did? (if you want ;-) )

on 4 the smalles number P(1) is very high thats why the counter is the minimum 1, the counter for the very infrecuent 2 & 3 is very high... what is what you do not understand???

if you do not see how it works please say nothing at all until you can say why it fails or why it works...It's amazing you are giving equations for code that you do not understand...

It is not O(n*m): see the case 4, the pump produce a valid weighted random number on average almost every 3 loop of the WRN function and that it is not O(n*m) as you said

bottom line, take the time to understand what I did and we can keep analyzing the thing later ;-)

big O notation is for worst case. I can understand "I need to count 99995 randomly generated 2s to pump a weighted 2".Which means in the worst case you need 99995 random numbers at least. The other approaches all use 1 random number. If you want to discuss this further, perhaps we should open a new thread.

I see how you got the proportions now, but it's still not clear why you are using 1-p in each case instead of p. In the .2 .4 .4 one, the 2 should appear twice as often as the 1 (as should the 3) so why use 4 3 3 and not 2 1 1?

The algorithm worst scenario is how many iterations maximum takes to get "ANY" number out of the pump... in this case that is not 99995 but 3 (average)instead. I hope you can see it now.

I use 1-P cause the counter aproach requieres it, can't you see that the most frequent number is the one that has the higest prob but that requieres the smallest counter trigger?

If I use P the pump would be holding longer the most frequent number when what I want is just the oposite. that's why I use the complemnet of P

What I wrote is effectively the same as what TommySzalapski posted, but generalized to work in cases in which the values are not sorted in order, such as the one in the example. Just posting it in case someone else comes along looking for this same solution.

```
1) Generate a random number R and build an array a
such that a[0] = 0 and a[i+1] = (sum from n=0 -> i) of P(j).
2) Find a[i] such that a[i-1]<R<a[i].
3) Return P(i-1).
```

Programming Theory

From novice to tech pro — start learning today.

Experts Exchange Solution brought to you by

Enjoy your complimentary solution view.

Get this solution by purchasing an Individual license!
Start your 7-day free trial.