Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Data Structure and approach for recommendations system

Posted on 2015-01-26
2
Medium Priority
?
201 Views
Last Modified: 2015-01-27
Hi,

I am working on a simple recommendations system. Conceptually the bare bones structure (using relational database terms because thats what I am familiar with) is:

fact_table:
  score
  recommender_id
  item_id

I then want to, for a given recommender_id rank all other recommender_ids by their similarity (e.g. if recommender_id 1 scored item_id 1 as 10 and item_id 100 as 0 then another recommender ranking 1 as 9 and 100 as 1 will rank higher than a recommender scoring 10 for 1 and 8 for 100).

I expect this to be a quite large sparse matrix, possibly tens of thousands on each axis.

I would like to get the approach right from the start and would like recommendations for how to best structure the data and which language / tools to use.
0
Comment
Question by:monoceros
2 Comments
 
LVL 28

Accepted Solution

by:
dpearson earned 2000 total points
ID: 40572125
I think language is pretty much wide open.  I'll suggest some specifics using Java, but all modern languages have the lists and  maps I'd recommend.

// I think you want some way to store (score+recommender) info together
public class ScoreInfo {
    public final int recommender_id ;
    public final int score ;
    public ScoreInfo(int recommender, int score) {
        this.recommender_id = recommender ;
        this.score = score ;
    }
}

Then I think you can read your entire fact_table into a datastructure like this:

Map<ItemID, List<ScoreInfo>> scoreMap = new HashMap<ItemID, List<ScoreInfo>>() ;

which if you don't speak Java is a map (a hashtable).
The keys in the hashtable are ItemIDs - perhaps integers - you didn't define any types.
The values in the hashtable are List<ScoreInfo> - lists of (score+recommender) pairs.

I hope it's clear that you can read the entire fact_table from a database and build a map like this with no duplication - each entry in the database would have a single entry in the map.  You do this one time - before you start computing similarities.

Then given a specific recommender which you are trying to find similarities to the algorithm becomes:
a) Get the list of (score+item) pairs for this recommender (SELECT score, item_id FROM fact_table WHERE recommender_id=?)
b) Go through each item in that list
c) For each item - retrieve the List<ScoreInfo> from the map.  This is the list of all recommenders that included that item id in their recommendations.  This lookup is constant time for each item so very efficient (since it's a hashtable).
d) Compute the similarity score for this item (for all recommenders that include that item) by walking this list of ScoreInfo objects.  Record this in another map (hashtable) like Map<RecommenderID, Similarity>.  if there's already an entry in the map, you add to the similarity score (if no entry, you just set an initial similarity score).
e) At the end of going through each item (from step c) you have a Map<RecommenderID, Similarity>
f) Now just sort the entries in that final map by similarity and you have the values you need.

The time to build the original map is O(n) where n is the number of entries in the database.  So no wasted time/space.
The time to compute the recommendations for any specific recommender is
        proportional to the (number of items in the original recommender) * (the number of other recommenders for exactly those items)

so you're only touching the sparse entries in the matrix - you're not scanning empty space.

The sorting at the end is O(n log n) where n is the number of other recommenders that overlap with the list of items in the original recommender.

I hope that helps,

Doug
0
 
LVL 1

Author Comment

by:monoceros
ID: 40572216
Hey, Doug,
Thanks - thats eloquent and looks straightforward (and happens to be in Java, which I do know)! For some reason I was blinkered with a view that "this is math, its big(ish) data - therefore I need one of the fancier newer languages" (partly driven by a desire to learn them!) but this is clean and can get me to a MVP without any steep language learning curve.
Cheers,
Mono.
0

Featured Post

Hire Technology Freelancers with Gigs

Work with freelancers specializing in everything from database administration to programming, who have proven themselves as experts in their field. Hire the best, collaborate easily, pay securely, and get projects done right.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Q&A with Course Creator, Mark Lassoff, on the importance of HTML5 in the career of a modern-day developer.
The SignAloud Glove is capable of translating American Sign Language signs into text and audio.
Viewers will learn how to properly install Eclipse with the necessary JDK, and will take a look at an introductory Java program. Download Eclipse installation zip file: Extract files from zip file: Download and install JDK 8: Open Eclipse and …
I've attached the XLSM Excel spreadsheet I used in the video and also text files containing the macros used below. https://filedb.experts-exchange.com/incoming/2017/03_w12/1151775/Permutations.txt https://filedb.experts-exchange.com/incoming/201…

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question