Weighted results but with accurate scaling

Hi experts,


   I am looking for a system where i pick up some values from the database based on weight but they must shuffle well and not sticking to fixed values or giving  certain sets of value everytime . it should be a balanced selection and my total number of records is around 300 and growing. To make you clear letme put forward an example. Suppose i have 5 objects say a, b ,c ,d ,e with weight 1 , 2 ,3 ,4 ,5 respectively. Lets assume i m selecting 2 at a time. So chances are that 5 and 4 gets selected the most . But i want that 1, 2, 3 also comes up during the shuffle and not just 5 and 4 like static output everytime. I want to do this by pulling out directly from the mysql and not shuffling arrays from mysql query as my record set is growing day by day and wont b a good idea.Tried mysql rand , using offset , pulling out a id just greater than random generated id, but all it sux  So please help experts.
pvsbandiAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Steve BinkCommented:
I am interested in seeing what other experts have to offer, mainly because I do not think this is possible in MySQL without duplicating the coding that would normally be in PHP.  That coding will likely not run nearly as efficiently as it would in PHP, because this is not something MySQL is inherently meant to do.  MySQL does not understand weights - it understand data and related data.  MySQL is egalitarian, meaning all data is equal.  The concept that some data is "more important" than other data is foreign to a database.

That said, I have two ideas - one for MySQL and one for PHP.

1) In MySQL, you would need to create a stored procedure that selects all of your base data with weights.  That would return a set like ( ('a',1),('b',2),('c',3),('d',4),('e',5) ).  Use a cursor to traverse through that set, inserting each into a temporary table.  'a' would be inserted once, 'b' would be inserted twice, and so on.  Finally, select a LIMITed set from that temporary table, using a random ORDER BY directive.

2) Same idea, but using a PHP array.  In PHP, query the base data with weights.  Insert each into an array, just as described with the temporary table above:

<?
$randarr=array();
while ($row=mysql_fetch_assoc($result)) {
  for($x=0;$x<$row['weight'];$x++){
    $randarr[]=$row['value'];
  }
}
?>

Then use shuffle() or array_rand() to pull your final list.
0
pvsbandiAuthor Commented:
Thanx routinet for your valuable reply, i also feel the same that mysql should not be put any extra load to manipulate the weighted results. For this php is always preferrrable. Till shuffling and getting weighted arrays is all ok but my problem lies where i had to pull out  all the values from mysql and then shuffle the arrays. I am going to implement this system into my small  self designed ad server. So each request would mean a shufflling of all the records and returning the weighted values. If you experts can suggest me any sort of algorithm that can select some particular set of data consisting of weighted values + random values that can be inserted into a temporaray table and shuffle the sets every one minute , this way php will be shuffling definite sets of data and not the records of whole database. I am bit nervous here to explain what is going in my mind and cant put into words , so putting a block diagram below.




Mysql --> Get all the Records(php) --> Shuffle and get x limit weighted results(php) --> Insert into a temporary table(mysql)  ---> Shuffle the record set after some minute with dynamic shuffling and not selecting identical sets everytime. (php)


This is what i am planning and the gapped and filterd results have worsened the condition .
0
Steve BinkCommented:
In that context, you can set up stored procedures to query your main ad tables and create a "weighted" table for you.  You would call this procedure once a day or on whatever schedule you need.  All the procedure would do is the temporary table from option 1, but it would not be temporary.  Your normal query to pull ads would still use the randomized ORDER BY/LIMIT, but on this other surrogate table.  When you need to modify your ad circulation, simply edit the original table to change weights or add/remove new ads.

Basically, if you know you are going to use a particular set of weighted ads for a length of time, do the weighting at the beginning of that timeframe and use its results the rest of the time.
0
Microsoft Azure 2017

Azure has a changed a lot since it was originally introduce by adding new services and features. Do you know everything you need to about Azure? This course will teach you about the Azure App Service, monitoring and application insights, DevOps, and Team Services.

pvsbandiAuthor Commented:
ok buddy , its getting closer to achieve what i am aiming for but let me clear my doubt before i reach a definite point. Upto the weighted thing and i am convinced  but do you think its a good idea to fetch all the values and shuffle in php  or its a good idea to fetch the weighted values in some order ?. if i would have to fetch some weighted values from al the qualified values which system i go for that it picks up different sets of data everytime with some limit say 5 or 10 and then shuffle in php according to weight. This way i will have less load and i dont want to use order by RAND as currently i am ok with RAND but i am not going to hold it  when i have the table increasing day by day , not in a great number though but its increasing +20 records everyday . So please suggest a system that picks up some sets of data dynamically with a limit of x number everytime which shuffles accordingly so that the lowest weighted value also comes up and not all the ascending or descending weighted values serially.
0
pvsbandiAuthor Commented:
http://adsalvo.com/xvtrqd/ads.php?uid=1     for reference check this sorting , this is what i am trying to achieve , it shuffles  those ads based on user points , those who have more points shows more frequent while keeping a dynamic environment by not picking most weighted values each and everytime and all have a respective weighted chance to get selected. He too have a table that has many records and only the qualified records gets shuffled with some sort of rule, I cannot understand what is it, i will be glad if you observer his system.
0
Steve BinkCommented:
To be clear, I am not sure it is statistically equivalent to a real weighted system, but it should just fine for your purposes.  The reason I say use ORDER BY RAND() is because it fits the bill.  The alternative is to select the every row and let PHP create and shuffle the array, which I think would take significantly more resources.  BUT, the RAND() method does come with its own problem - you are basically setting a query on every row to get the RAND() before the LIMIT takes effect.  Absent any other WHERE clauses to reduce your return, you *will* be querying the entire table.  They key here is exactly how many records you expect to have.

You say this is for an ad system.  Being liberal, let's say you have 1000 different ads meeting whatever criteria for display (i.e., post-WHERE).  From anecdotal accounts, this strategy begins to really show its deficiencies once you are up into the couple-thousand-row range, so 1000 should still be the blink of an eye.  Benchmarking this should be relatively easy.  First, create a new, disposable table.  For simplicity's sake, just use three fields: id (PK,unique index), descript (varchar(20) should do), and active (int).  Load 1000 rows of sample `descript`.  For each 100 rows, assign a new `active` value - the first 100 get 1, next 100 get 2, and so on.  Now see the code below.  That's the rough equivalent of what I ran.  I started with 1000 rows, but there was no real difference at 2 decimal places, so I kept adding more.  I went up to 15000 rows (evenly distributed for `active`), which looked like the table at the bottom of the code.  Realistically, still not much difference, though it would no doubt have more impact with a couple thousands hits per hour.  I was surprised by the results, but there you go - ORDER BY RAND() wins 2- or 3-to-1.
<?
$query = "SELECT descript from NewTable WHERE Active<";
$times=array(0=>array(),1=>array(),2=>array());

// benchmark RAND()
$times[0][]=microtime(true);
for($x=2;$x<11;$x++){
  $result=mysql_query("$query{$x} ORDER BY RAND() LIMIT 10");
  $times[0][]=microtime(true);
}

// benchmark shuffle()/array_rand()
$times[1][]=microtime(true);
for($x=2;$x<11;$x++){
  $result=mysql_query("$query{$x}");
 //read result into array
 $ret=array();
 while ($ret[]=mysql_fetch_assoc($result)) { }
 shuffle($ret);
 $ret2=array_slice($ret,0,10);
 $times[1][]=microtime(true);
}

$times[2][]=microtime(true);
for($x=2;$x<11;$x++){
  $result=mysql_query("$query{$x}");
 //read result into array
 $ret=array();
 while ($ret[]=mysql_fetch_assoc($result)) { }
 $ret2=array_rand($ret,10);
 $times[2][]=microtime(true);
}

echo "<table>",
     "<tr><td>&nbsp;</td><td>RAND()</td><td>shuffle()</td><td>array_rand()</td></tr>";
for($x=0;$x<count($times[0]);$x++) {
  echo "<tr><td>$x</td><td>",$times[0][$x],"</td><td>",$times[1][$x],"</td><td>",$times[2][$x],"</td></tr>";
}
echo "<tr><td>&nbsp;</td><td>",
      ($times[0][9]-$times[0][0]),
      "</td><td>",
      ($times[1][9]-$times[1][0]),
      "</td><td>",
      ($times[2][9]-$times[2][0]),
      "</td></tr>";
echo "</table>";
}

/*
 	RAND()	        shuffle()       array_rand()
0	1271335143.41	1271335143.5	1271335143.78
1	1271335143.41	1271335143.51	1271335143.81
2	1271335143.42	1271335143.52	1271335143.81
3	1271335143.43	1271335143.54	1271335143.83
4	1271335143.44	1271335143.57	1271335143.84
5	1271335143.45	1271335143.6	1271335143.86
6	1271335143.46	1271335143.64	1271335143.89
7	1271335143.47	1271335143.68	1271335143.92
8	1271335143.48	1271335143.73	1271335143.95
9	1271335143.5	1271335143.78	1271335143.98
 	0.089212179184	0.285773038864	0.200662136078
*/
?>

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
PHP

From novice to tech pro — start learning today.