Data Structure and approach for recommendations system

Posted on 2015-01-26
Last Modified: 2015-01-27

I am working on a simple recommendations system. Conceptually the bare bones structure (using relational database terms because thats what I am familiar with) is:


I then want to, for a given recommender_id rank all other recommender_ids by their similarity (e.g. if recommender_id 1 scored item_id 1 as 10 and item_id 100 as 0 then another recommender ranking 1 as 9 and 100 as 1 will rank higher than a recommender scoring 10 for 1 and 8 for 100).

I expect this to be a quite large sparse matrix, possibly tens of thousands on each axis.

I would like to get the approach right from the start and would like recommendations for how to best structure the data and which language / tools to use.
Question by:monoceros
LVL 27

Accepted Solution

dpearson earned 500 total points
ID: 40572125
I think language is pretty much wide open.  I'll suggest some specifics using Java, but all modern languages have the lists and  maps I'd recommend.

// I think you want some way to store (score+recommender) info together
public class ScoreInfo {
    public final int recommender_id ;
    public final int score ;
    public ScoreInfo(int recommender, int score) {
        this.recommender_id = recommender ;
        this.score = score ;

Then I think you can read your entire fact_table into a datastructure like this:

Map<ItemID, List<ScoreInfo>> scoreMap = new HashMap<ItemID, List<ScoreInfo>>() ;

which if you don't speak Java is a map (a hashtable).
The keys in the hashtable are ItemIDs - perhaps integers - you didn't define any types.
The values in the hashtable are List<ScoreInfo> - lists of (score+recommender) pairs.

I hope it's clear that you can read the entire fact_table from a database and build a map like this with no duplication - each entry in the database would have a single entry in the map.  You do this one time - before you start computing similarities.

Then given a specific recommender which you are trying to find similarities to the algorithm becomes:
a) Get the list of (score+item) pairs for this recommender (SELECT score, item_id FROM fact_table WHERE recommender_id=?)
b) Go through each item in that list
c) For each item - retrieve the List<ScoreInfo> from the map.  This is the list of all recommenders that included that item id in their recommendations.  This lookup is constant time for each item so very efficient (since it's a hashtable).
d) Compute the similarity score for this item (for all recommenders that include that item) by walking this list of ScoreInfo objects.  Record this in another map (hashtable) like Map<RecommenderID, Similarity>.  if there's already an entry in the map, you add to the similarity score (if no entry, you just set an initial similarity score).
e) At the end of going through each item (from step c) you have a Map<RecommenderID, Similarity>
f) Now just sort the entries in that final map by similarity and you have the values you need.

The time to build the original map is O(n) where n is the number of entries in the database.  So no wasted time/space.
The time to compute the recommendations for any specific recommender is
        proportional to the (number of items in the original recommender) * (the number of other recommenders for exactly those items)

so you're only touching the sparse entries in the matrix - you're not scanning empty space.

The sorting at the end is O(n log n) where n is the number of other recommenders that overlap with the list of items in the original recommender.

I hope that helps,


Author Comment

ID: 40572216
Hey, Doug,
Thanks - thats eloquent and looks straightforward (and happens to be in Java, which I do know)! For some reason I was blinkered with a view that "this is math, its big(ish) data - therefore I need one of the fancier newer languages" (partly driven by a desire to learn them!) but this is clean and can get me to a MVP without any steep language learning curve.

Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Computer science students often experience many of the same frustrations when going through their engineering courses. This article presents seven tips I found useful when completing a bachelors and masters degree in computing which I believe may he…
When there is a disconnect between the intentions of their creator and the recipient, when algorithms go awry, they can have disastrous consequences.
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below.…

730 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question