Link to home
Start Free TrialLog in
Avatar of WaterStreet
WaterStreetFlag for United States of America

asked on

Generate a table of biased "random" integer cells for a given range of numbers with 4 subsets to each have a different weight for insertion in the table

For background and context, please see the following EE Question: http://www.experts-
exchange.com/Software/Office_Productivity/Office_Suites/MS_Office/Excel/Q_28020212.html#a38854251 where the Accepted Solution is =INT(RAND()*(122-65+1)+65)
 
I've now expanded the range to be from 48 to 122 as in =INT(RAND()*(122-48+1)+48).

How I would use the Accepted Solution to generate a table of integer random numbers in the range of 48 to 122, like in my previous EE question, but where there are (let's say
3 or 4) sub ranges within -- and that each sub range has a specified weight/bias (Bias) for appearing in the table.   I'm thinking of: (1) Using the Accepted Solution to generate
random integers in the range of 0 to 9, =INT(RAND()*(9-0+1)+0, for each column (in a new Exterior Row above the table) and for each row (in a new Exterior Column left of the table); AND (2) Then using for each cell in the main table an IF Function with (a) with its Test for the Bias being some range of combinations of the cell's Exterior row and Exterior cell (random integers), and (b) using the Accepted Solution in the resulting True/False results (for the 4 sub ranges in the main table).

The sub ranges could be 33 to 47, 48 to 57, 65 to 90 and 91 to 122.  The respective bias
are 10, 20, 30 and 40% approximately.

I only have Excel 2003 and I'd rather not use =RANDBETWEEN(65,122), because I don't want to bother digging up my Microsoft CD's (still unpacked from moving), in order to load the Analysis ToolPak add-in.

How might this be done easily?

Thanks
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Please see the attached file:

Q-28021494.xls

I use a random value in Col A, and a table of values in G2:I5 to hold the biases:

0      33      47
0.1      48      57
0.3      65      90
0.6      91      122

Based on that, the formula to draw a random integer is:

=INT(RAND()*(VLOOKUP(A2,$G$2:$I$5,3)-VLOOKUP(A2,$G$2:$I$5,2))+VLOOKUP(A2,$G$2:$I$5,2))
Avatar of WaterStreet

ASKER

Thanks mp,

I just got back and am starting to look at what you posted.
I spent a lot of time tinkering with it and some fun too.  My sense is that I can get it to work for me if I understood the logic of the biases and how they are applied (they are not percentages of the distributions throughout the table ).

1.  How can the biases work to relate them to percentages?  Do they need to add up to 1.0?

2.  The High range values never appear in the distribution, so I fixed this by adding one to each.

3.  I understand VLOOKUP, how do I apply this solution when I only have three ranges instead of 4?


Thanks


Thanks
The attached file differs only in that I have more values being generated, and I added formulas in Columns K & L to show how many items there are in each band.

The numbers will fluctuate a bit on each recalc, but they are coming in to the targeted spread.

Q-28021494.xls

As for having three bands, or five, or nineteen, just adjust the lookup table accordingly, and the VLOOKUP expressions where the random integers are generated.

The biasing works because the random number in Col A is a value >= 0 and < 1.  So, the the first band is for (Col A) >=0 and (Col A) < 0.1.  That spread is 10%.
Thanks for the expanded Bias table with the calculated distributions and percentages for each.

I'm tinkering with it now.  

1. When adjusting the biases, do I try to lock in the largest distribution first, 2nd largest, etc., or do it in reverse order?

2. If I add more ranges or delete one, is the last VLOOKUP always preceded with a minus sign?

Thanks in advance.
1) Which band you list first does not matter.  Just make sure that what you use in the first column of that table starts with zero, and that the index values are such that you get your (targeted) distributions correct.

2) The formula should be structured:

=INT(RAND()*( #vlookup to grab upper bound# - #vlookup to grab lower bound# ) + #vlookup to grab lower bound# )
It looks like I cannot do what I wanted, because the sequences of biases need to be in ascending sequence and the corresponding ranges have to be in ascending sequence.  That constraint doesn't seem to fit my business case.  For example, for one business case I want numbers in the range 48 to 57 to have around 50%, range 65 to 90 to be around 32%, range 91 to 122 to be around 16% and range 33 to 47 to be around 2%.

It is a masterful technique but it seems to have constraints that can't even approximately fit what I want to do.

I hope I'm just doing this wrong and you can show how, or does the technique not work for what I want to do; is there another solution?

Ultimately, I need to populate a 64 X 64 table with the population distributions of my business case example, above.   Maybe one solution would be to show me how to easily fully populate the table first with my largest distribution.  Then successively populate it with the 2nd largest distribution, 3rd largest, etc.

In any case, I need to populate a 64 X 64 table with approximately the right percent distribution for each range -- and the ranges need to randomly distributed throughout the table.


Thanks in advance
mp,

I can cut the number of bands from four to three and I can adjust the span of the bands so that the lowest (45 to 57) would have the largest biased distribution % in the main table, the next highest band (65 to 96) would have 2nd largest % distribution, and the highest band (97 to 122) would have smallest distribution % in the table.

The technique you provided requires the highest band to have the largest distribution percentage;  the mid-range band(s) would have successively lower distribution percentages; and the lowest band, the least distribution

In other words, the the current technique seems to require the largest bias be given to the highest band and the lowest bias be given to the lowest band.   That just happens to be the reverse of what I need.

So, (please refer to the last worksheet you provided, above in ID: 38860481) can you modify the following algorithm so that the highest bias be given to the lowest of my (now) three bands, and that the lowest Bias be given to the highest band?

=INT(RAND()*( #vlookup to grab upper bound# - #vlookup to grab lower bound# ) + #vlookup to grab lower bound# )

Or just show me and I'll do it.

Thanks
ASKER CERTIFIED SOLUTION
Avatar of Patrick Matthews
Patrick Matthews
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Well, I'll be ...

Tweaked it to get what I wanted.

Thanks a lot, and I learned a lot of good technique too.

Adjusting the Bias was tricky.  Then had to recalculate a number of times until I got the desired ratios.

When you get a chance, any theory you can pass on for adjusting the biases.  It seems to relate the proportions between the adjacent biases as well as their selected values.  Other that that, I am clueless.

Michael