zzynx (Belgium) asked:

How to honestly grant points to entries of music charts containing 20, 30, 40 and 50 tracks?

Hi,

Given all weekly music charts (of the most popular music tracks in my country) from 1966 till today,
I want to write a program that can generate all kinds of reports from that chart data.
E.g. the list of the xx most popular hits over a certain period (and why not the complete period).

To find the most popular songs, I give each song points based on its position in the charts.

E.g. # points = (# of items in the chart + 1 - position)
That way, in a top 50, the track in first position gets 50 points, while the last track gets 1 point.

However, I have a problem with that.
I discovered that in the sixties I only have charts of 20 tracks.
In May 1970, a switch was made from a top 20 to a top 30.
In the 80's it went to 40 tracks, and later on they switched once again, to a chart containing 50 tracks.
So, for the period 1966 till today I have weekly charts containing 20, 30, 40 and 50 tracks.

It's clear that with the points system I explained above, the older music can never beat the newer music,
since
1) a top position in 1966 generates 20 points, while a top track of today gets 50 points
2) in 1966 a track could only score points while it held one of 20 positions (which limits its number of weeks in the chart), while today there are 50 such positions.
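
The basic scheme and its bias can be sketched as follows. This is only an illustration of the rule described above (first place earns as many points as the chart has entries, last place earns 1); the function name `basic_points` is made up for the example.

```python
def basic_points(position: int, chart_size: int) -> int:
    """Points for one chart week: first place earns chart_size, last place earns 1."""
    if not 1 <= position <= chart_size:
        raise ValueError("position must lie between 1 and chart_size")
    return chart_size + 1 - position


# Best weekly score, worst weekly score and total points on offer per week.
for size in (20, 30, 40, 50):
    total = sum(basic_points(p, size) for p in range(1, size + 1))
    print(size, basic_points(1, size), basic_points(size, size), total)
```

A top 20 hands out 210 points per week and a top 50 hands out 1275, so both the best weekly score and the weekly total grow with the chart size.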

Question: Can anyone think of a way to give points to music tracks that eliminates the above-mentioned disadvantages?


PS.
I know that one solution is to only consider the top 20 tracks of all charts.
But I wonder if it's possible to use all information AND yet keep it "honest".
Avatar of Uros Gaber
Uros Gaber

Why not add a special factor to the top 20 and top 30 charts?
Avatar of zzynx

ASKER

Could you explain your idea in more detail, please?
Maybe with an example?
Uros Gaber

Or maybe a simple lookup table, i.e. you could simply say that for a top 20:
1. position == 50
2. position =~ 47.36842
3. position =~ 44.73684
4. position =~ 42.10526
5. position =~ 39.47368
6. position =~ 36.8421
7. position =~ 34.21052
8. position =~ 31.57894
9. position =~ 28.94736
10. position =~ 26.31578
11. position =~ 23.6842
12. position =~ 21.05262
13. position =~ 18.42104
14. position =~ 15.78946
15. position =~ 13.15788
16. position =~ 10.5263
17. position =~ 7.89472
18. position =~ 5.26314
19. position =~ 2.63156
20. position =~ 0

=~ means "approximately": with this table (a step of 2.63158 per position) the 20th position actually comes out at -0.00002.
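
Such a lookup table does not have to be typed in by hand. A minimal sketch, assuming a straight-line scale from 50 points for first place down to 0 for last place (the name `scaled_points` and the `top` parameter are invented for the example):

```python
def scaled_points(position: int, chart_size: int, top: float = 50.0) -> float:
    """Linear scale: position 1 earns `top`, the last position earns 0."""
    step = top / (chart_size - 1)  # 50 / 19 for a top 20
    return top - (position - 1) * step


# Lookup table for a top 20, rounded to five decimals.
table = [round(scaled_points(p, 20), 5) for p in range(1, 21)]
for position, points in enumerate(table, start=1):
    print(position, points)
```

Computing the step as an exact fraction avoids the small drift that comes from repeatedly subtracting a rounded step. Note that on this scale the last position earns nothing, which may or may not be what is wanted.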
Avatar of zzynx

ASKER

I already thought of something like that, however:

               Points in top 50   Points in top 20
1. position    50                 50
2. position    49                 47.5
3. position    48                 45
...
19. position   32                 5
20. position   31                 2.5
21. position   30
...
49. position   2
50. position   1

Total points   1275               525

I feel like the total points to earn should be the same for each chart, which isn't the case...
Or is that condition wrong?
Robert Schutt

If you want to approach it (semi-)scientifically, I see several options:

1) You say the oldies must have a chance to beat the newer music, so why not try giving the #1 100 points, #2 99 and so on for any top x (so in a top 20 the last track gets 81)? Choose an incredibly popular song from every decade (like the one with the longest time at #1; if you have everything in a database that should be easy to determine) and see if they make it into the overall top 10, for example. See what the effect is of starting at 50 or at 500.

2) Probably not an option, but if you have actual sales figures instead of just chart positions, it would be more honest to use those and make them relative to total sales; otherwise recent figures are still much higher, as they may include downloads. It might be taking it too far, but there could be national statistics available for overall record and CD sales/spending. That could be a way to go, but that's a lot of ifs.

3) For every type of chart (20/30/40/50), determine the average time a single stays in the chart (or per year). From this, determine a 'handicap' of extra points for each single, so that singles from the top 20 era get extra points for the time they would have spent (on average) in the bottom part of a longer list.

4) A variant on the above, maybe easier and more accurate: for all the top 50's, determine the points a single would have missed (on average) if it had only been a top 40. This is the 'handicap' described above, so you can correct all totals for the top 40 period with that figure. Then some more thought has to go into whether to use these new numbers or the originals when using top 40 scores to correct top 30 scores, and finally top 20 scores. Or you could use only top 50 scores to determine the handicap for all older charts.

Afterthought: maybe the handicap should be a function of the number of weeks in the chart and/or the max position...
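
Option 4 could be prototyped along these lines. This is a rough sketch under one possible reading of the idea: a track's score is the sum of (chart size + 1 - position) over its weeks, and the handicap is the average difference between scoring the same runs as a top 50 and as a top 40. The function names and the three runs are invented for illustration and are not real chart data.

```python
from statistics import mean


def score(positions, chart_size):
    """Sum of (chart_size + 1 - position) over the weeks spent inside the chart."""
    return sum(chart_size + 1 - p for p in positions if p <= chart_size)


def handicap(runs, full_size=50, cut_size=40):
    """Average points a track would have missed had the chart stopped at cut_size."""
    return mean(score(run, full_size) - score(run, cut_size) for run in runs)


# Made-up weekly positions of three tracks in a top 50 (illustration only).
runs = [
    [45, 30, 12, 5, 3, 8, 20, 38, 47],
    [50, 44, 41, 39, 42, 49],
    [22, 10, 10, 15, 28, 41],
]
print(handicap(runs))
```

The three made-up runs lose 80, 39 and 60 points respectively, which already hints at the afterthought above: the loss depends strongly on how long a track stayed in the chart, so a single average may be too coarse.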
Avatar of zzynx

ASKER

Hi Robert,
First of all, thanks for your input.

Option 1)
I don't think this can work: 60's tracks and recent tracks get equal points for the same position, but 60's tracks only get points for 20 positions, while recent tracks get points for 50 positions. So the 60's tracks won't make it.
That's also what I saw when I applied this approach: the top 100 (over all years, from '66 till today) didn't contain any track from the 60's, 70's or even 80's. Which proves to me that's not the way to go.

Option 2)
That would probably result in a more accurate overall top list.
Unfortunately, I don't have actual sales figures.
If anyone knows of any website where I could find those, please let me know.

Option [3) and] 4)
So you would give
- each track in a top 50 the # points according to its position (= 51-position)
- each track in a top 40 the # points according to its position (= 41-position) + extra points p(40)
- each track in a top 30 the # points according to its position (= 31-position) + extra points p(30)
- each track in a top 20 the # points according to its position (= 21-position) + extra points p(20)

I think the fact that p(x) is just an average is not completely "honest".
And indeed, making p() also dependent on the number of weeks in the chart and the max position would be necessary.
But how to define that function?


This is what I came up with, so far (based on the previous idea):

Points for an entry in the top 50 = (51 - position)
Points for an entry in the top 40 = (41 - position) * 5/4 * 1275/1025
Points for an entry in the top 30 = (31 - position) * 5/3 * 1275/775
Points for an entry in the top 20 = (21 - position) * 5/2 * 1275/525

Which gives for each top X a total number of points to gain = 1275

How did I come to the values 525, 775 and 1025?

Sum of the numbers [1..20] = (21*20)/2 = 210
Sum of the numbers [1..30] = (31*30)/2 = 465
Sum of the numbers [1..40] = (41*40)/2 = 820
Sum of the numbers [1..50] = (51*50)/2 = 1275

Sum of the numbers [1..20] * 5/2 = 525
Sum of the numbers [1..30] * 5/3 = 775
Sum of the numbers [1..40] * 5/4 = 1025

So in a table (if that comes out well - something I really miss here at EE: being able to create tables):

Position   Top 50   Top 40   Top 30   Top 20

1          50       62.20    82.26    121.43
2          49       60.64    79.52    115.36
3          48       59.09    76.77    109.29
4          47       57.53    74.03    103.21
5          46       55.98    71.29    97.14
6          45       54.42    68.55    91.07
7          44       52.87    65.81    85.00
8          43       51.31    63.06    78.93
9          42       49.76    60.32    72.86
10         41       48.20    57.58    66.79
11         40       46.65    54.84    60.71
12         39       45.09    52.10    54.64
13         38       43.54    49.35    48.57
14         37       41.98    46.61    42.50
15         36       40.43    43.87    36.43
16         35       38.87    41.13    30.36
17         34       37.32    38.39    24.29
18         33       35.76    35.65    18.21
19         32       34.21    32.90    12.14
20         31       32.65    30.16    6.07
21         30       31.10    27.42
22         29       29.54    24.68
23         28       27.99    21.94
24         27       26.43    19.19
25         26       24.88    16.45
26         25       23.32    13.71
27         24       21.77    10.97
28         23       20.21    8.23
29         22       18.66    5.48
30         21       17.10    2.74
31         20       15.55
32         19       13.99
33         18       12.44
34         17       10.88
35         16       9.33
36         15       7.77
37         14       6.22
38         13       4.66
39         12       3.11
40         11       1.55
41         10
42         9
43         8
44         7
45         6
46         5
47         4
48         3
49         2
50         1

Total      1275     1275     1275     1275
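
The normalised points can be computed rather than tabulated. A minimal sketch: the two factors in each formula combine into a single one (for a top 20, 5/2 * 1275/525 = 1275/210), so each weekly score is simply scaled by "top 50 total / own chart total". The name `normalised_points` and the `reference_size` parameter are invented for the example.

```python
def normalised_points(position: int, chart_size: int, reference_size: int = 50) -> float:
    """Scale (chart_size + 1 - position) so every chart size hands out the same weekly total."""
    reference_total = reference_size * (reference_size + 1) // 2  # 1275 for a top 50
    chart_total = chart_size * (chart_size + 1) // 2              # 210 for a top 20
    return (chart_size + 1 - position) * reference_total / chart_total


for size in (50, 40, 30, 20):
    first = normalised_points(1, size)
    total = sum(normalised_points(p, size) for p in range(1, size + 1))
    print(f"top {size}: first place {first:.2f}, weekly total {total:.2f}")
```

This equalises the weekly total across chart sizes, but a first place in a top 20 is then worth well over twice a first place in a top 50, which is a side effect worth keeping in mind when judging the resulting rankings.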


If I apply this to my data I get the corrected top 100.
The results are as follows:
In the first column: the position in the corrected top 100
In the second column: the position in the uncorrected top 100 (taking only the top 20 into account)
In the third column (between square brackets []): the position in the top 100 of the sixties (from 1966-Jan-01 to 1969-Dec-31)

Corrected   Uncorrected   60's
1           2
2           1
3           22            [1]
4           3
5           76            [2]
6           4
7           -             [3]
8           -             [4]
9           -             [5]
10          79            [6]
11          -             [7]
12          -             [18]
13          -             [8]
14          67            [9]
15          -             [10]
16          -             [11]
17          10

...         ...
34          6
36          5
41          7
58          8
23          9

As you can see, the top 4 is rather stable.
And the 60's top 10 is nicely represented at the top too.
However, I'm wondering why the first track of the 70's is only at position 35.

What do you think of this result?
Robert Schutt

Well, that looks pretty impressive already. I must admit it's hard to say anything really about 'anomalies' without some more info, which I'm not sure you can share without "giving it all away".

For me the key would be coming up with that function "p(x)" but I would really need some sample data to avoid working with data that doesn't match your structure.
Avatar of zzynx

ASKER

"Well that looks pretty impressive already."

Although I have my doubts about the correctness of the 70's hits.

"I must admit it's hard to say anything really about 'anomalies' without some more info which I'm not sure you can share without 'giving it all away'."
What exactly do you mean?
All the information can be found online.
I just wrote a program that scraped the information from a website.
Do you want the (csv) files? Let me know and I'll share them with you.
It's a zip of 2MB.
Each file contains 20, 30, 40 or 50 lines.
And each line contains
<Artist name>,<Track name>,<position>,<position previous week>,<number of weeks>,<date of first entry if available>

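Reading such a file could look roughly like the sketch below. It follows the field order given above; everything else is an assumption: the names `ChartEntry` and `parse_chart`, the handling of an empty "position previous week" field, and the sample rows, which are made up and not taken from the real files.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChartEntry:
    artist: str
    track: str
    position: int
    previous_position: Optional[int]  # None when no previous position is given
    weeks: int
    first_entry: Optional[str]        # kept as text; the date format is not specified


def parse_chart(text: str) -> List[ChartEntry]:
    """Parse one weekly chart file; the number of entries is the chart size."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        artist, track, position, previous, weeks = row[:5]
        first_entry = row[5].strip() if len(row) > 5 and row[5].strip() else None
        entries.append(ChartEntry(
            artist=artist.strip(),
            track=track.strip(),
            position=int(position),
            # Assumption: a missing previous position is empty or non-numeric.
            previous_position=int(previous) if previous.strip().isdigit() else None,
            weeks=int(weeks),
            first_entry=first_entry,
        ))
    return entries


# Made-up sample rows (illustration only).
sample = 'Some Artist,Some Track,1,3,7,1967-05-06\n"Artist, The",Another Track,2,,1,\n'
chart = parse_chart(sample)
print(len(chart), chart[0].position, chart[1].previous_position)
```

Using the csv module rather than splitting on commas keeps artist or track names that contain a comma intact, provided the files quote such fields; whether they do is not stated in the thread.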

Robert Schutt

Ah, I see. Well, although it is hard to avoid sometimes, not all information on the web can be freely redistributed. But I think it would help to have a small selection, maybe a year for each type? If you feel the scraped data is yours to distribute, then by all means upload the whole file (if it is too big you can use ee-stuff.com), but maybe at least an attribution is appropriate?
Avatar of zzynx

ASKER

"maybe at least an attribution is appropriate"
Here you go. (1967, 1978, 1985, 1995 & 2009)

(Uploading as attachment didn't work. Error: The archive could not be scanned: Stream closed)
Avatar of zzynx

ASKER

Is there something wrong with the way of working I explained above, since:

Position in the top 100 of the 70's (1970-1979) vs. Position in the "all time" top 100:
1 - not in
2 - 61
3 - 93
4 - 62
5 - 76
6 - 35
7 - not in
8 - 53
9 - 85
10 - not in

Where the top 10 of the 60's was (maybe too?) nicely represented in the overall top 100, the top 10 of the 70's isn't.
It's difficult to believe that three tracks of the 70's top 10 (including no. 1) didn't make it into the overall top 100 list.

On the other hand, this track (the no. 1 of the 70's top 100) spent 25 weeks in the top 30 at these positions:
28, 21, 16, 11, 5, 4, 6, 6, 5, 4, 5, 6, 6, 9, 10, 10, 13, 13, 20, 20, 18, 24, 26, 26, 29.
Compare that to the highest 70's track in the overall top (at position 35): it spent 21 weeks in the top 30 at these positions:
12, 5, 5, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 11, 12, 12, 18, 23
It is clear that the latter was more popular.

But then, how come it is only at position 6 of the 70's top 100, while the other is at position 1?
*Confused*
Robert Schutt

Well, if you want higher positions to count more than a longer stay, you should adjust the score, for example to "101 - 2 * position" for starters.

Oh, by the way, when you say "the 70's top 100", do you mean one you generate with the previous algorithm or an actual one done by a radio station or website?
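
The two runs quoted by zzynx can be scored under the different rules mentioned in the thread. This is a sketch that assumes every week is scored with the same rule and without any chart-size multiplier; the names `run_score`, `long_run` and `peak_run` are made up for the example.

```python
def run_score(positions, offset, slope=1):
    """Sum of (offset - slope * position) over all charted weeks."""
    return sum(offset - slope * p for p in positions)


# 25 weeks, peak position 4 (the no. 1 of zzynx's 70's list).
long_run = [28, 21, 16, 11, 5, 4, 6, 6, 5, 4, 5, 6, 6, 9, 10, 10, 13, 13,
            20, 20, 18, 24, 26, 26, 29]
# 21 weeks, nine of them at no. 1 (position 35 in the all-time list).
peak_run = [12, 5, 5, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 6, 11, 12, 12, 18, 23]

for label, offset, slope in (("31 - position", 31, 1),
                             ("101 - position", 101, 1),
                             ("101 - 2 * position", 101, 2)):
    print(label, run_score(long_run, offset, slope), run_score(peak_run, offset, slope))
```

Under these assumptions the totals are 434 vs 529 for "31 - position", 2184 vs 1999 for "101 - position" and 1843 vs 1877 for "101 - 2 * position". The larger the offset is relative to the slope, the more each extra week in the chart is worth, so the ranking of these two tracks flips depending on the rule. If two lists order them differently, it may be worth checking whether both lists really apply the same rule to the same weeks.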
SOLUTION
Robert Schutt

[This solution is only available to members of Experts Exchange.]
Avatar of zzynx

ASKER

"Oh by the way, when you say 'the 70's top 100' do you mean one you generate with the previous algorithm or an actual one done by a radio station or website?"
The one I generated myself for the period 1-Jan-1970 to 31-Dec-1979
(btw, on 02-May-1970 the list of tracks was extended from 20 to 30)

That's why it confuses me that
the no. 1 of my 70's top 100 is not present in my all-time top 100, while
the no. 35 of my all-time top 100 is (only) no. 6 of my 70's top 100...

"Well if you want higher positions to count more than a longer stay, you should adjust the score for example '101 - 2 * position' for starters."
My current method of assigning points looks OK in that regard, since the track with 21 weeks (9 of them at no. 1) made it into my overall top 100, while the one with 25 weeks (at lower positions) did not.
But when I look at my 70's top 100, the one with 25 weeks is number one, while the other one is only number six.

It's confusing, since the same method of assigning points seems to produce different results (overall versus 70's).
Avatar of zzynx

ASKER

First of all, thanks again for trying to help.
Much appreciated to have someone to share thoughts with!

"When I add both adjustments to the top 20/30/40 songs, still 7 songs of the top 10 are from recent years."
Mmm... doesn't look balanced indeed.

"Looking at the ultratop website it looks like they're dealing with a certain formula they keep a secret or maybe they just want it to appear that way."
What makes you think that?
(btw, that's the website the info comes from, but you probably already guessed that)

"how to determine if the combined list will ever be honest without relating it to some objective external factor (like sales)."
Admittedly, that won't be easy...

"But I believe it should at least be possible to get closer to a solution just using the bare numbers."
That's what I hope.

"I'm thinking along the lines of calculating the overall scores for all top 20's etc"
Sounds reasonable at first sight.
Maybe the number of different tracks that lead to that total is also important?
Robert Schutt

I haven't forgotten about this, but it's on the back burner for a bit. Will get back to you later this week.
Avatar of zzynx

ASKER

Sure. No problem. No pressure.
ASKER CERTIFIED SOLUTION
[This solution is only available to members of Experts Exchange.]
Avatar of zzynx

ASKER

A remark about the graph:
Each position is shown with the score it got.
So x = position and y = score?
Then how come that e.g. the max x for a top 20 is 180, and the max x for a top 30 is 230?
I don't understand.

"I have also looked at a simpler way and it shows promise!"
That sounds as if it could work out right.

I'm not sure if I understand what you want to say in your last paragraph...
Robert Schutt

That's data from the zip you posted, with a limited number of years. Here it is, hopefully a bit more clearly:
User generated image
That last paragraph tries to explain why I think a multiplier instead of linear shift (or even some function) is better in the sense that it will not have the problem where top numbers from the 60s show in a different order in the overall top chart.
Avatar of zzynx

ASKER

Just a post to let the EE admin know that I don't consider this question as abandoned.
We're thinking... And we're taking our time to do so ;-)
Robert Schutt

Well, I finally found the time (and mindset) to put a new calculation in my program that multiplies each score in top 20s, 30s and 40s based on the highest position reached (compared to top 50 songs reaching the same position). Unfortunately (and probably to be expected) it doesn't help much, because top 50 songs that reach a high position also spend more weeks in the list, so that does indeed need to be factored in, as discussed earlier.

I also noted another problem: a few old songs ended up very high, but that's because I multiplied their total score by the 20s multiplier while they had a comeback in later years, which should be multiplied separately. So there's a bit more work in it, but I think I'm on the right track at last.
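
One way around the comeback problem is to apply the multiplier per chart week rather than to a track's total. The sketch below does not use Robert's multiplier (based on the highest position reached), which is not shown in the thread; it swaps in the total-equalising factor from zzynx's earlier post as a stand-in. The names `week_points` and `track_score` and the sample run are invented.

```python
REFERENCE_TOTAL = 50 * 51 // 2  # 1275, the weekly total of a top 50


def week_points(position, chart_size):
    """Points for one week, scaled by the size of the chart in that week."""
    chart_total = chart_size * (chart_size + 1) // 2
    return (chart_size + 1 - position) * REFERENCE_TOTAL / chart_total


def track_score(weeks):
    """weeks: iterable of (position, chart_size) pairs, one per charted week."""
    return sum(week_points(position, size) for position, size in weeks)


# Made-up track: three weeks in a top 20, then a two-week comeback in a top 50.
weeks = [(5, 20), (2, 20), (9, 20), (40, 50), (35, 50)]

per_week = track_score(weeks)
# The problematic variant: raw total multiplied by the top 20 factor throughout.
raw_total = sum(size + 1 - position for position, size in weeks)
whole_run = raw_total * REFERENCE_TOTAL / (20 * 21 // 2)
print(round(per_week, 2), round(whole_run, 2))
```

Because each week carries its own chart size, a re-release is scaled with the factor of the era in which those weeks actually occurred, and the comeback weeks no longer inherit the larger top 20 factor.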
SOLUTION
[This solution is only available to members of Experts Exchange.]
Avatar of zzynx

ASKER

"it doesn't help much because top 50 songs that reach a high position also spend more weeks in the list so that does indeed need to be factored in as discussed earlier."
I see.

"I also noted another problem: a few old songs got very high but that's because I multiplied the total score with the 20s multiplier and they had a comeback in later years which should be separately multiplied so there's a bit more work in it but I think I'm on the right track at last."
Really? Same song and same artist? A re-release, in other words.
That's an extra difficulty I wasn't aware of.
So, a song that was introduced in a top x and reappears in a top y should be treated in some special way. Well, well, well, the more you dig in, the more you find extra difficulties...

"scoring adjustment based on number of weeks in the chart but now the top 20 songs end up taking over the top."
Same problem I saw with my first, simple approach.

Thanks again for your thoughts and time.
Robert Schutt

It seems an endless story and I'm sorry for that. It's again a question of not simply making time for this but, more importantly, getting in the right state of mind and getting anchored in with an idea to apply some mathematical formula (it doesn't even have to be complex, but still...), and then trying to present results in a way that can convince me (and ultimately you, of course) that the results are getting nearer to an honest distribution than anything so far. Long story short: I'm not there yet, but I will be able to apply some more time to this soon.
Avatar of zzynx

ASKER

Don't feel sorry, Robert.
You owe me nothing. :-)

Nevertheless,
"I'm not there yet but will be able to apply some more time to this soon."
That's great to hear.
And thank you.

Btw, most of the time I also don't have/take the time, or am not in the right mood, to think and rethink the problem. So, no real progress from my side either. (If that's even possible...?)
Robert Schutt

Just to keep you up to speed: what I'm doing at the moment is building a new form in my little test application where I can move some sliders to get a feel for the range of the multipliers needed to get a better mix at the top. Of course, the downside of solving it like this is that I don't really know what the criteria are to call it 'honest', because it's just a visual determination. Anyway, I hope to post some intermediate results soon!
Avatar of zzynx

ASKER

"... what the criteria are to call it 'honest'"
That's indeed a problem. I guess
"a more or less equal representation of each decade in the top 100"
is the best criterion I can come up with.

Remark: since the information we have only starts in 1966 and it's 2016 now,
"equal" should probably be interpreted as

60's: ~10 tracks
70's: ~20 tracks
80's: ~20 tracks
90's: ~20 tracks
00's: ~20 tracks
10's: ~10 tracks
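
This criterion can be turned into a number, so that different scoring rules can be compared without eyeballing the list. A minimal sketch, assuming each track in the top 100 is tagged with a year (for instance the year of its first chart entry); the names `TARGET`, `decade_counts` and `imbalance` are invented, and the target values are the ones listed above.

```python
from collections import Counter

# Target number of top-100 tracks per decade, as proposed above.
TARGET = {1960: 10, 1970: 20, 1980: 20, 1990: 20, 2000: 20, 2010: 10}


def decade_counts(entry_years):
    """Count tracks per decade, e.g. 1967 -> 1960."""
    return Counter(year // 10 * 10 for year in entry_years)


def imbalance(entry_years, target=TARGET):
    """Sum of absolute deviations from the target counts; 0 means a perfect match."""
    counts = decade_counts(entry_years)
    return sum(abs(counts.get(decade, 0) - wanted) for decade, wanted in target.items())


# Made-up extreme case: a top 100 consisting only of recent tracks.
recent_only = [2012] * 100
print(decade_counts(recent_only), imbalance(recent_only))
```

A lower value means a more even spread over the decades. It says nothing about whether the right tracks are in the list, so it can only complement, not replace, a sanity check against well-known hits.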
Robert Schutt

Well, time has gotten me in a crunch again. I had to put other tasks first, but I try to keep this question in the back of my mind at all times and hope to come up with a brainwave at some point ;-)
Avatar of zzynx

ASKER

Take your time. It's not a FWP  :P
Robert Schutt

I know, but still, I was expecting to have come up with some plan by now. I actually left my previous plan behind for a while for something I thought might get us somewhere better, but I was wrong... So I'm back to trying to make a 'playground' tool.
Avatar of zzynx

ASKER

Well, thanks for the update.
I must admit that I'm pretty much out of ideas...
Whatever I tried over the last couple of months, it all quickly led to "wrong" results  :(
Avatar of zzynx

ASKER

Thanks for thinking along, Robert.
I'm going to close this question for now.