Solved

Popularity algorithm

Posted on 2004-08-26
8
417 Views
Last Modified: 2008-03-03
Hi all,

I have created a small web page for my benefit that will list shortcuts to my favourite websites.  So far so good!  It has a database backend which stores various data including the date and time each time I access a site listed on the page.  The reason for this is that I want to alter the order of the links in each category so that my most popular links appear higher up the list.  Of course I could just count the number of times I have accessed each site but this has two flaws that I can see (perhaps more!):

1.  If I add a new site that I access a lot, it would still appear lower down than some more established, less-popular sites

2. A certain site may become less popular over time but remain nearer the top of the list despite me not accessing so often.

Obviously I could simply count the number of times the sites are accessed in a given period, e.g. over the last 4 weeks, but I was wondering if there is a more efficient algorithm?  I am sure a similar algorithm must exist as this could be used in a wide variety of circumstances, e.g. most played songs in a personal juke-box.

So if you know of any algorithm that solves this, and perhaps gives added weight to more recent 'accesses', and if there are any other issues I should consider, please let me know,

Cheers,

Nev
0
Comment
Question by:nevillestyles
  • 3
  • 3
  • 2
8 Comments
 
LVL 27

Expert Comment

by:d-glitch
ID: 11903317
You could increment the popularity score by one every time you visit a site.

And then let the score decay over time:  Once a day (or once  a week) multiply all the scores by 0.95 (or some other fraction).

That would give you a weighted average with a memory of 20 days (or 20 weeks).      0.95  =  1 - 1/20
0
 

Author Comment

by:nevillestyles
ID: 11903550
Thanks for that, it makes a lot of sense.  But can I run an example by you.  There are 2 sites initially, namely S1 and S2.  After one day their scores are as follows:

S1 - 20
S2 - 40

They are multiplied by 0.95 to get:
S1 - 19
S2 - 38

This continues but the scores are still cumulative so although this seems fair for sites that are added at the same time, it does not allow for newly added sites.  

This is certainly along the right lines but I think it needs a little modification.

Cheers,

Nev
0
 
LVL 27

Accepted Solution

by:
d-glitch earned 75 total points
ID: 11903796
You have to let things develop over a longer period of time.

Say the initial scores for sites (A,  B,  C)  on Day_1 are  (20, 10, 0)

You never visit A again.  You isit B once  a day.  You visit C twice a day.

The scores will evolve like this.  It takes 8 days for the algorithm to realize that A isn't your favorite any more.
You can make it faster or slower by changeing the .95 up or down.

Day      A      B      C

1      20      10      0
2      19.00      10.50      2.00
3      18.05      10.98      3.90
4      17.15      11.43      5.71
5      16.29      11.85      7.42
6      15.48      12.26      9.05
7      14.70      12.65      10.60
8      13.97      13.02      12.07
9      13.27      13.37      13.46
10      12.60      13.70      14.79
11      11.97      14.01      16.05
12      11.38      14.31      17.25
13      10.81      14.60      18.39
14      10.27      14.87      19.47
15      9.75      15.12      20.49
16      9.27      15.37      21.47
17      8.80      15.60      22.39
18      8.36      15.82      23.28
19      7.94      16.03      24.11
20      7.55      16.23      24.91
21      7.17      16.42      25.66
22      6.81      16.59      26.38
23      6.47      16.76      27.06
24      6.15      16.93      27.71
25      5.84      17.08      28.32
26      5.55      17.23      28.90
27      5.27      17.36      29.46
28      5.01      17.50      29.99
29      4.76      17.62      30.49
30      4.52      17.74      30.96
0
 

Author Comment

by:nevillestyles
ID: 11903917
Thanks for that, I was being lazy by not going further with my example!  That solution will work a treat!
0
Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

 
LVL 13

Expert Comment

by:SteH
ID: 11904208
Already finished but got another solution you could consider:

Add not 1 for each visit but
exp (- (days since visit)).
The exponent will be ~1 for recently visited sites and going towards 0 for visits long ago.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 11904728
SteH's solution is pretty good.       For one thing it updates automatically.  So you don't have to do the daily decay adjustment.

                                                 It also saves information long term.  With my solution all the sites you stop visiting eventually
                                                 windup tied at zero.

                                                 Try simulating the algorithm in Excel.  Some combination of the two might work out best.


                                           
                                           

0
 

Author Comment

by:nevillestyles
ID: 11912096
Thanks to you both, I have now got a good system that seems to work a treat.  

Is it possible to give SteH some points for your contribution?
0
 
LVL 13

Expert Comment

by:SteH
ID: 11912158
You can post a new question "points for SteH" were you refer to this one. Only requirement of EE for this is that the points for this Q and the new Q added are 500 or less.
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Excel Stat for Dummies - What's the difference between STD.P and STD.S 4 56
Rounding Values 11 55
wordsWithoutList  challenge 24 90
x2 – x2 = x2 – x2 conundrum 12 66
Introduction On a scale of 1 to 10, how would you rate our Product? Many of us have answered that question time and time again. But only a few of us have had the pleasure of receiving a stack of the filled out surveys and being asked to do somethi…
This article seeks to propel the full implementation of geothermal power plants in Mexico as a renewable energy source.
This is a video describing the growing solar energy use in Utah. This is a topic that greatly interests me and so I decided to produce a video about it.
With Secure Portal Encryption, the recipient is sent a link to their email address directing them to the email laundry delivery page. From there, the recipient will be required to enter a user name and password to enter the page. Once the recipient …

932 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now