[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
Solved

Posted on 2007-07-21
Medium Priority
221 Views
OK Looks like it does what I want to..

1. Can I improve my random number calculation- I got it stanford.edu- Pfaff
I thought I saw a shortcut somewhere..

2. Also do I really need the the used array to check for cards allready there...

Here is the relevant code...

srand(static_cast<unsigned>(time(0)));

string deck[N];
bool used[N];

for (int i = 0; i < N; i++)
used[i] = false;
for (int i = 0; i < N; i++)
{
int r = i + rand() / (RAND_MAX / (N - i) + 1);
if (used[r])
cout << "Card taken " << endl;
else
{
string t = deck[r];
deck[r] = deck[i];
deck[i] = t;
}
}
Thanks again
0
Question by:CPlusJavaCSharp
• 10
• 7
• 6
• +3

LVL 43

Expert Comment

ID: 19541845
Where You set the card is used? I can't see it.
> 2. Also do I really need the the used array to check for cards allready there...
No. And Yes in fact. The point is it does not have to be separate array. You can yse something like
struct Card {
Card(string const &n, bool const u = false) : name(n), used(u) {}
string name;
bool used;
}

the have just one table
Card deck[N];

> I thought I saw a shortcut somewhere..
And what it is?
0

LVL 3

Author Comment

ID: 19543695
This
int r = i + rand() / (RAND_MAX / (N - i) + 1);
Versus

int r = i + rand() % (N - i);
0

LVL 1

Expert Comment

ID: 19545156
for (int i = 0; i < N; i++) // loop 1
used[i] = false;
for (int i = 0; i < N; i++) // loop 2
{
int r = i + rand() / (RAND_MAX / (N - i) + 1);
if (used[r])  // used check
cout << "Card taken " << endl;
else
{
string t = deck[r];
deck[r] = deck[i];
deck[i] = t;
}
}

This is a bit off-topic from what you just said...

at loop 1, you just set all the used[] to false.
in loop 2, there is NO where that set used[] to true.
so, at used check, there is no way it can be true.

Hence you can elimate that check.

I think the int r = i + rand() % (N - i); is equal to the original expression, but I just drink a glass of wine. :) so maybe someone else can double check too.
0

LVL 53

Expert Comment

ID: 19545553
>> This
>> int r = i + rand() / (RAND_MAX / (N - i) + 1);
>> Versus
>>
>> int r = i + rand() % (N - i);

These are slightly different.

Suppose N = 5, and RAND_MAX = 15 (just for the sake of argument). So, the loop would generate these random numbers :

(a) for the first formula :

i = 0 : 0 -> 3
i = 1 : 1 -> 4
i = 2 : 2 -> 4
i = 3 : 3 -> 4
i = 4 : 4 -> 4

(b) for the second formula :

i = 0 : 0 -> 4
i = 1 : 1 -> 4
i = 2 : 2 -> 4
i = 3 : 3 -> 4
i = 4 : 4 -> 4

In other words - there's a problem if RAND_MAX / N is a small value. The RAND_MAX on my system is equal to 32767, so :

32767 = 7 * 31 * 151

So, it's a multiple of 4681 (= 31 * 151). So, let's take for example N = 4682 and i = 1 :

(a) first formula : 1 -> 4096
(b) second formula : 1 -> 4681

As you see - there's a real problem there !!

And if that's not enough, there's another problem with the first formula : it doesn't have the same chances for every value. Neither does the second formula, but it's a lot closer to perfect. With the first formula, the highest value that can be generated by it has potentially a lot less chance (even 0 sometimes) than the other values.
With the second formula, this difference in chance is more equally divided over all the values, because we're working with a modulo operation, which is a lot more fair.

So, I don't know where you found that first formula, but it doesn't look very well thought out (can you provide a link to where you found it ? Maybe there was a good reason ...). The second formula is far superior.
0

LVL 39

Expert Comment

ID: 19546321
>>>> The second formula is far superior.
That might be true but to get a random number between 0 and N-1 you should use

r = rand()%N;

All other gives different probabilities for lower and higher indices.

Regards, Alex
0

LVL 39

Expert Comment

ID: 19546438
>>>> string deck[N];
why are you using an array of strings rather than suitable classes or structs like that:

enum Suit   { Diamond, Heart, Spade, Club, MAX_SUIT };
enum Rank { A = 1, N2, N3, N4, N5, N6, N7, N8, N9, NT, J, Q, K, MAX_RANK = K };

struct Card
{
Suit  s;
Rank r;
};

struct Deck
{
Card deck[52];
};

You easily could add member functions that would make things easier and safer, e. g. a constructor and function like that:

Deck::Deck()
{
for (int i = 0; i < MAX_SUIT; ++i)
for (int j = 0; j < MAX_RANK; ++j)
{
Card c = { i, j};
deck[i*MAX_RANK + j] = Card
}
shuffle();
};

void Deck::shuffle()
{
std::vector<Card>  pack(&deck[0], &deck[52]);
for (int i = 0; i < 52: ++i)
{
int idx = rand()%(52-i);
deck[i] = pack[idx];
pack.erase[pack.begin()+idx];  // removes card from pack
}
}

Regards, Alex
0

LVL 53

Expert Comment

ID: 19546547
>> >>>> The second formula is far superior.
>> That might be true but to get a random number between 0 and N-1 you should use
>>
>>     r = rand()%N;

That's what the second formula does ;)

Or are you referring to the odd way of adding to i ? I assume there's a reason for that ...
0

LVL 39

Expert Comment

ID: 19546905
>>>> int r = i + rand() % (N - i);
>>>> That's what the second formula does ;)
Actually
r = rand()%N;
is much different to the above formula (named 'second'). But I agree that for shuffling purposes it doesn't make much differences whether the higher indices were shuffled more often than the lower ones.

0

LVL 53

Expert Comment

ID: 19546941
>> Actually
>>          r = rand()%N;
>> is much different to the above formula

I was referring to the rand() % (N - i) part only ... the + i is just to set the lowest value for the random number ... Using r = rand() % N would generate entirely different values, so it's not the same.

I assume there's a reason to do it this way (although I don't see it ...).
0

LVL 22

Expert Comment

ID: 19550931
Assuming rand() returns uniformly random values, this routine will return a uniformly random value between 0 and (modulus - 1).  Somtimes multiple calls to rand() will be made, and this cannot be avoided if you want to have a proper uniform distribution.

int rand_mod(int modulus) {
int rv;
int rvlimit = (RAND_MAX / modulus) * modulus;
do {
rv = rand();
} while (rv >= rvlimit);
return rv % modulus;
}
0

LVL 22

Expert Comment

ID: 19551041
ahhh, OCD won't let me leave that alone.  0 <= rand() <= RAND_MAX, not 0 <= rand() < RAND_MAX.  Can't put an integer overflow in there either.  so:

int rand_mod(int modulus) {
int RMmod = RAND_MAX % modulus;
if (RMmod == (modulus-1)) {
return rand() % modulus;
} else {
int rvlimit = RAND_MAX - RMmod;
int rv;
do {
rv = rand();
} while (rv >= rvlimit);
return rv % modulus;
}
}
0

LVL 3

Author Comment

ID: 19552468
To those that wanted to see where I got the formula
int r = i + rand() / (RAND_MAX / (N - i) + 1);

http://www.stanford.edu/~blp/writings/clc/shuffle.html

I found the link by doing a seach on this EE page...

0

LVL 53

Accepted Solution

Infinity08 earned 150 total points
ID: 19554060
>> http://www.stanford.edu/~blp/writings/clc/shuffle.html

They missed one thing with that shuffle algorithm (like I already explained) : the last element of the array has a lot less chance of getting picked for shuffling, so the chance that it stays in the back of the array after the shuffle algorithm is pretty big !
0

LVL 39

Expert Comment

ID: 19554985
>>>>  the last element of the array has a lot less
>>>> chance of getting picked for shuffling,
I don't know whether the fact that higher numbers don't change positions as often than lower numbers has any impact on the quality (or randomness) of the shuffling. If we have a deck of 52 cards the last card has a chance for each 1 ... 51 to be chosen and therefore to change place. Ok, it won't chance its place after that but the probability to change its position with any other position including the last is equal as the chance to get picked increases but the chance that it was picked already increases with the same probability. Going on we have, the last-before-last might be choosen 1 ... 50 times with equal chance and last - if not picked by any of the 50 - it is a fifty-fifty chance to change with last. Hence we have equal probabilties as well and that can be proved decrementally down to card 0.

I made a little prog that records the shuffling and got these results:

0-26 |  1- 3 |  2-50 |  3-31 |  4-17 |  5-27 |  6-47 |  7-12 |  8-46 |  9-12 |
10-16 | 11-27 | 12-20 | 13-26 | 14-24 | 15-26 | 16-45 | 17-28 | 18-36 | 19-24 |
20-35 | 21-51 | 22-35 | 23-46 | 24-35 | 25-51 | 26-30 | 27-51 | 28-50 | 29-36 |
30-48 | 31-47 | 32-40 | 33-49 | 34-36 | 35-40 | 36-44 | 37-39 | 38-42 | 39-40 |
40-49 | 41-50 | 42-50 | 43-48 | 44-47 | 45-49 | 46-51 | 47-49 | 48-51 | 49-49 |
50-51 | 51-51 |

0->26->13->15->30 (4)
1-> 3->31 (2)
2->50->28->41->42->51 (5)
3-> 1->47 (2)
4->17->28 (2)
5->27->11->51 (3)
6->47->31->44->49 (4)
7->12-> 9->20 (3)
8->46->23->51 (3)
9->12 (1)
10->16->45 (2)
11->27 (1)
12-> 7->35 (2)
13->26 (1)
14->24->19->35 (3)
15->26 (1)
16->10->49 (2)
17-> 4->50 (2)
18->36->29->34->44 (4)
19->24 (1)
20->12->22->24->40 (4)
21->51->25->27->46->48->50 (6)
22->35 (1)
23->46 (1)
24->14 (1)
25->51 (1)
26-> 0->48 (2)
27-> 5 (1)
28->17 (1)
29->36 (1)
30->26->43->51 (3)
31-> 3 (1)
32->40->35->39->49 (4)
33->49->40->45->47 (4)
34->36 (1)
35->20 (1)
36->18->47 (2)
37->39->40 (2)
38->42->50 (2)
39->37 (1)
40->32 (1)
41->50 (1)
42->38 (1)
43->48 (1)
44->36 (1)
45->16 (1)
46-> 8 (1)
47-> 6 (1)
48->30 (1)
49->33 (1)
50-> 2 (1)
51->21 (1)

The first output is  i and r  (using the first algorithm) and of course at end of the loop there are some loop cycles where there is no shuffling *but* actually any of the cards was shuffled at least once.

Using the second algorithm the results were different but you can't say it is significant better or worse.

0- 2 |  1-24 |  2- 2 |  3-13 |  4-48 |  5-19 |  6-44 |  7-33 |  8-51 |  9-13 |
10-49 | 11-14 | 12-51 | 13-38 | 14-15 | 15-19 | 16-35 | 17-50 | 18-48 | 19-49 |
20-23 | 21-47 | 22-22 | 23-42 | 24-28 | 25-41 | 26-41 | 27-45 | 28-44 | 29-34 |
30-47 | 31-46 | 32-46 | 33-36 | 34-43 | 35-48 | 36-38 | 37-42 | 38-50 | 39-41 |
40-41 | 41-45 | 42-45 | 43-50 | 44-44 | 45-50 | 46-49 | 47-47 | 48-49 | 49-49 |
50-50 | 51-51 |

0-> 2 (1)
1->24->28 (2)
2-> 0 (1)
3->13-> 9->38 (3)
4->48->18->35->49 (4)
5->19->15->49 (3)
6->44->28 (2)
7->33->36 (2)
8->51->12 (2)
9->13 (1)
10->49->19->46->48 (4)
11->14->15 (2)
12->51 (1)
13-> 3->36->50 (3)
14->11->19 (2)
15->14 (1)
16->35->48 (2)
17->50->38->43->45 (4)
18->48 (1)
19-> 5 (1)
20->23->42 (2)
21->47->30 (2)
22 (0)
23->20->37->45 (3)
24-> 1->44 (2)
25->41->26->39->40->45 (5)
26->41 (1)
27->45->41->42->50 (4)
28->24 (1)
29->34->43 (2)
30->47 (1)
31->46->32->49 (3)
32->46 (1)
33-> 7->38 (2)
34->29->50 (2)
35->16 (1)
36->33 (1)
37->42 (1)
38->13 (1)
39->41 (1)
40->41 (1)
41->25 (1)
42->23 (1)
43->34 (1)
44-> 6 (1)
45->27 (1)
46->31 (1)
47->21 (1)
48-> 4 (1)
49->10 (1)
50->17 (1)
51-> 8 (1)

>>>> we're working with a modulo operation, which is a lot more fair.
As NovaDenizen showed both algorithms were of the same value. BTW, modulo makes a divison as well.

Regards, Alex
0

LVL 53

Expert Comment

ID: 19555406
>> I don't know whether the fact that higher numbers don't change positions as often than lower numbers has any impact on the quality (or randomness) of the shuffling.

It does. (and btw, the problem is only with the highest number - the others have equal probability)

In a perfect shuffle, every shuffled deck has the same probability of occurring. With this shuffle algorithm, shuffled decks with the highest card at the end have a higher probability.

Suppose this is used for a casino. If I know this, then I'd always depend on the fact that the highest card is very likely at the end, and possibly make a lot of money off of that.

Try to run the shuffle algorithm a few million times, and check the last card - see if there's one that jumps out ... I ran this loop 100 million times :

for (cnt = 0; cnt < 100000000; ++cnt) {
for (i = 0; i < 52; ++i) arr[i] = i;
shuffle(arr, 52);
++(freq[arr[51]]);
}

And the result confirms what I said : the frequency table shows that the value 51 has a higher chance than the other values of being in the last position :

1923897
1924524
1922013
1922925
1923418
1922640
1925007
1922569
1922467
1924377
1922166
1923572
1918385
1921655
1922154
1921977
1923848
1919474
1922583
1923365
1920898
1921411
1925884
1922631
1921128
1920852
1921518
1920736
1920570
1921953
1922471
1922740
1923313
1920180
1923806
1921827
1923146
1921230
1921974
1921736
1922486
1924857
1920184
1922164
1922500
1923390
1920192
1920543
1920390
1920959
1920507
1968808

If you're not convinced, put the above values in a graph, and see the nice peak at the end. Or just watch the first 3 digits ...

Note that 52 is a relatively small value. The higher the value (say 200 or so), the better noticeable the effect would be. So, feel free to try it again with a higher value, and you'll see that the peak will get even bigger.

Btw, the writers of the algorithm do agree with me : "Only effective if N is much smaller than RAND_MAX;" which was basically what I said : "In other words - there's a problem if RAND_MAX / N is a small value".
And 52 is not that small compared to 32767 (confirmed by the test results). With a bigger value than 52, it would be even worse !!
0

LVL 39

Expert Comment

ID: 19555737
>>>> If I know this, then I'd always depend on the fact
>>>> that the highest card is very likely at the end, and
>>>> possibly make a lot of money off of that.
If your results are correct and assuming the casino also has a deck of 52, then it is only a 1.5 percent higher probability that the highest card of a new unshuffled pack of cards was shuffled at the very top of the closed deck. If we assume 4 players and a new pack any hour, your chance comes once in 4 hours. Maybe you should set yourself a limit ;-)

As far as I understand the argument, it is a rounding issue only. If so, the algorithm could be improved to

int rn = rand();
int n_i = (N-i);
int div = (RAND_MAX + n_i/2) / n_i  + 1;
int r = i + (rn + div/2)/ div;

I'll check whether that solves the problem.

Regards, Alex

0

LVL 53

Expert Comment

ID: 19555912
>> Maybe you should set yourself a limit ;-)

Hehe ... well ... it's the principle that counts ... it's not a perfect shuffle.

>> As far as I understand the argument, it is a rounding issue only.

The problem is that RAND_MAX is divided in 52 unequal partitions, of which the first 51 are equal in size, but the last one is (potentially) a lot smaller.

When using modulo instead, you also divide RAND_MAX in 52 unequal partitions, but the "unequality" is divided up over a lot more partitions, with a maximum difference of 1 between any two.

An example with RAND_MAX = 15 and N = 5. Now there are 16 different rand() results (0-15), and here's what the final "random" values are for each of them with the two algorithms :

(a) for the first algorithm (rand() / (RAND_MAX / N + 1)) :

0, 1, 2, 3 : 0           (25.00% chance)
4, 5, 6, 7 : 1           (25.00% chance)
8, 9, 10, 11 : 2       (25.00% chance)
12, 13, 14, 15 : 3   (25.00% chance)
: 4                          (  0.00% chance)

(b) for the second algorithm (rand() % N) :

0, 5, 10, 15 : 0       (25.00% chance)
1, 6, 11 : 1             (18.75% chance)
2, 7, 12 : 2             (18.75% chance)
3, 8, 13 : 3             (18.75% chance)
4, 9, 14 : 4             (18.75% chance)

Neither of them is perfect, but the second is better for most situations.

And yes, I know this is an extreme example, but the same holds true for most other values. The smaller RAND_MAX / N is, the worse the results are.

I would prefer the modulo algorithm over the other ... It not only gives better results, it's also simpler ... :)
0

LVL 53

Expert Comment

ID: 19555968
And actually, to make it even more "perfect", you can make sure that (RAND_MAX + 1) is a multiple of N. In that case, both algorithms would perform OK.

You can easily implement that by doing something like :

int myRand(int N) {
int limit = RAND_MAX - ((RAND_MAX + 1) % N);
int r = 0;
do {
r = rand() % N;
} while (r > limit);
return r;
}

in other words : if the rand value falls into the "excess" values, try again ...
0

LVL 22

Expert Comment

ID: 19556363
Infinity08, your code will attempt an infinite loop when RAND_MAX == MAX_INT, as it is in the gnu rand() implementation.  My code follows the same principal as yours except it is immune to integer overflow.
0

LVL 22

Expert Comment

ID: 19556387
Hmm.  Maybe not an infinite loop, but you will have an integer overflow which is to be avoided.
0

LVL 53

Expert Comment

ID: 19556600
>> Hmm.  Maybe not an infinite loop, but you will have an integer overflow which is to be avoided.

Well, I had my system in mind, where RAND_MAX is 32767, but you're right of course - I should have used unsigned ints instead of just plain ints ... The code was just intended as an illustration though - it hasn't been optimized and debugged, not even tested ;)

My apologies for repeating you - I didn't realize (this thread has been going on for a while - it's hard to keep track of what's been said already and what's not).

btw, what if RAND_MAX is defined as MAX_UINT ? I think it's better to use unsigned ints in your code. Or even unsigned longs just to make sure ...
0

LVL 22

Expert Comment

ID: 19556727
It's very unlikely that rand() will ever be unsigned.  It's been an int forever, and it's an int in the C99 spec (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf).
0

LVL 53

Expert Comment

ID: 19556909
Well, if you allow me to be pedantic, it doesn't state whether it's signed or unsigned - just that it's an integer value. The only limitation is that it has to be at least 32767. So, I interpret that as an intentionally vague description that leaves things open for the future. On the other hand, the rand() function returns an int, so that tips the balance to your side ... lol

Anyway, that's not the issue here lol ... If needed, you can easily check the RAND_MAX value in your code (a few #if's should do) and choose the datatype that fits it ...
0

LVL 22

Expert Comment

ID: 19557122
The spec defines rand() as having an int return type.  It's true that the default char type may be signed or unsigned, but the default int type is always signed.  INT_MIN must be -32767 or less.
0

LVL 39

Expert Comment

ID: 19557673
>>>> if the rand value falls into the "excess" values, try again ...
Another way to make perfect rand with modulo N is to multiply all rand numbers so that they were equally distributed between 0 and (RAND_MAX - N - 1 + RAND_MAX%N)

void shuffle(std::vector<int>& v)
{
static double drm = (double)RAND_MAX - N + RAND_MAX%N;
static double df  = drm/RAND_MAX;
for (int i = 0; i < N; ++i)
{
int        rno = rand();
double drn = df * rno +  0.4999999;
int    rn  = (int)drn;
int r = rn%N;
int t = v[r];
v[r] = v[i];
v[i] = t;
}
}

Regards, Alex

0

LVL 22

Expert Comment

ID: 19557829
Alex:
As long as rand() is returning ints, you're going to get a nonuniform distribution of shuffle indices.  It doesn't matter if you convert it to a double or how you mess with it.  As long as the shuffle index is some function of a single invocation of rand() you aren't going to get a uniform distribution (at least when RAND_MAX % N != N-1)).

Your shuffling algorithm is also non-ideal because even if you use it with a perfectly uniform rng function, the resulting permutations will not be uniformly distributed.  There are N! possible permutations, and N^N possible swap sequences in your algorithm, and N! does not divide N^N when N>2, so it follows that some permutations must be more likely than others.
0

LVL 53

Expert Comment

ID: 19588286
Just fyi : if the answers are unsatisfactory, you can always ask for clarification ... and I'll be glad to elaborate ! I like to give the best help possible, and it seems that it was not the case here for some reason ?
0

Featured Post

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generatâ€¦
This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a Gâ€¦
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relatâ€¦
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.
Suggested Courses
Course of the Month18 days, 22 hours left to enroll