Avatar of ugeb
ugebFlag for United States of America

asked on 

'randomly' distributing elements into buckets

Hi,

I have the task of placing users of a game into buckets based on their user ID.  The number of buckets is set for a given trial, but could vary from trial to trial.

For example, let's say I have 1000 users of a game, and I would like to place them randomly but evenly into 5 buckets, 1 through 5. However, whenever I compute the bucket for a given user, I always need to place them in the exact same bucket, so I need a deterministic way of 'randomly' placing a user in one of N buckets.  I would like each bucket to be representational of the whole data set.

Another follow on issue is that I have a certain % chance that they won't be placed in ANY bucket, but be left out of the experiment.  For example, maybe there's a 70% chance they will be placed in a bucket, and 30% chance they won't. If they ARE placed in a bucket, then I have to know which bucket.  It has to be repeatable for a given user.

Any suggestions on how I can approach this problem or on solutions already available?

Thanks!





Math / Science

Avatar of undefined
Last Comment
TommySzalapski
Avatar of ozo
ozo
Flag of United States of America image

you might use a hash function
Avatar of ugeb
ugeb
Flag of United States of America image

ASKER

that's not really a possibility in my situation
Avatar of ozo
ozo
Flag of United States of America image

How does your situation prevent that possibility?
If you want an exact number in each bucket (say 150 in each bucket and 250 not in any), then the best way I can think of is to shuffle the list of users. Then just put the first 150 in the first bucket, etc. If you manually choose a random number to seed the generator with, then it will shuffle it the same way every time or you can maintain the list of shuffled users.

Quick reminder of the O(n) shuffle algorithm:
Shuffle
for i = 2 to 1000
  j = rand(from 1 to i)
  swap user i with user j

If the number of users in each bucket shoud be an approximate, not fixed, number, then the hash codes will work just fine.
Avatar of ugeb
ugeb
Flag of United States of America image

ASKER

Hi,

Sorry, I read "hash function" as "hash table".  Yes, a hash function (or algorithm) is precisely what I am searching for.   I know hashes of the sort "a mod n" are often used, but I have two components.  The first component is determining whether something will be placed in the bucket or not, and that is an uneven distribution.  The second component determines which bucket.  Do you know of hash functions that can assign based on uneven distributions?

Tommy:  Unfortunately I don't think that solution will work as I do not have the entire data set available at a given time.  Numbers (buckets) need to be assigned dynamically.

Thanks!
ASKER CERTIFIED SOLUTION
Avatar of ozo
ozo
Flag of United States of America image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
See Pricing Options
Start Free Trial
SOLUTION
Avatar of deighton
deighton
Flag of United Kingdom of Great Britain and Northern Ireland image

Blurred text
THIS SOLUTION IS ONLY AVAILABLE TO MEMBERS.
View this solution by signing up for a free trial.
Members can start a 7-Day free trial and enjoy unlimited access to the platform.
You know that the random number generator requires a seed to work and that if you give it the same seed every time, it gives you the same 'random' numbers.
So for each item, take the item's unique ID (or whatever) and seed the random number generator with it. Then it can generate the same random number for the same item each time and it doesn't matter how the data is distributed.

I would just do it with one random number using Ozo's idea of the non-bucket as a sixth bucket but if it makes it easier, you could also get two random numbers. One for in/out of the buckets and one for which bucket if in.
Math / Science
Math / Science

The Math / Science topic primarily includes discussions of mathematics, physics, statistics and economic analysis, but also biology, chemistry and other sciences.

10K
Questions
--
Followers
--
Top Experts
Get a personalized solution from industry experts
Ask the experts
Read over 600 more reviews

TRUSTED BY

IBM logoIntel logoMicrosoft logoUbisoft logoSAP logo
Qualcomm logoCitrix Systems logoWorkday logoErnst & Young logo
High performer badgeUsers love us badge
LinkedIn logoFacebook logoX logoInstagram logoTikTok logoYouTube logo