This blog post is authored by Thore Graepel, Principal Researcher at Microsoft.
Good recommendations are needed everywhere. Whether you are looking for a movie you might enjoy watching or a book that you might enjoy reading or even suggestions for people with similar interests who you could connect with on Facebook or LinkedIn, automatic recommender systems are the solution.
Such systems were previously available primarily to the largest online players. With the upcoming release of Microsoft Azure ML, we will make a recommender system available to a much wider audience of individuals and businesses, who can use it for the benefit of their own customers.
Typically, there are two types of entities involved in a recommender system (RS); let’s call them users and items. Users are the people to whom you would like to make recommendations. Items are the things you would like to recommend to them, such as movies, books, web pages, recipes, or even other people.
Suppose we would like to recommend, say, a restaurant to a given user based on the 5-star ratings that this user and other users have provided for some of the restaurants in our universe. We can break down the recommendation task into two steps: first, predict how the user would rate each restaurant they have not yet rated; second, recommend the restaurants with the highest predicted ratings.
But how can we predict how that particular user would rate all of the restaurants they have not actually rated? This is where machine learning (ML) comes into play.
To build an ML model that can predict, for a given user/item combination, how the user would rate the item, we need to collect data of the form (userID, itemID, rating). You can think of this as a large matrix, with users as rows, items as columns, and ratings as entries.
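To make the data format concrete, here is a minimal sketch in Python. The user names, restaurant names, and helper functions are all made up for illustration; this is not how Azure ML stores data internally, just one simple way to hold (userID, itemID, rating) triples as a matrix that only records the entries we actually observed.

```python
from collections import defaultdict

# Hypothetical training data in the (userID, itemID, rating) form.
ratings = [
    ("alice", "pizzeria", 5),
    ("alice", "sushi_bar", 3),
    ("bob", "pizzeria", 4),
    ("carol", "taqueria", 2),
]

def build_rating_matrix(triples):
    """Store only observed entries: matrix[user][item] = rating."""
    matrix = defaultdict(dict)
    for user, item, rating in triples:
        matrix[user][item] = rating
    return dict(matrix)

def density(matrix):
    """Fraction of user/item cells that actually hold a rating."""
    items = {item for row in matrix.values() for item in row}
    observed = sum(len(row) for row in matrix.values())
    return observed / (len(matrix) * len(items))

matrix = build_rating_matrix(ratings)
print(matrix["alice"])   # {'pizzeria': 5, 'sushi_bar': 3}
print(density(matrix))   # 4 observed ratings out of 3 users * 3 items
```

Even in this toy example more than half of the cells are empty; with real catalogs the fraction of observed entries is typically far smaller.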
This will be a sparse matrix (with many missing entries) because typical users will only rate a small subset of items. The Bayesian RS implemented in Azure ML takes this training data, trains a model, and essentially returns a function that predicts, for a given user/item pair, how the user would rate that item. These ratings are not restricted to 5-star ratings. Other signals such as purchases, clicks, or time spent can be equally if not more informative for making good recommendations.
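Once we have such a prediction function, turning it into recommendations is straightforward. The sketch below assumes a hypothetical `predict(user, item)` function, faked here with a lookup table of made-up numbers in place of a trained model; the `recommend` helper is likewise illustrative and not part of Azure ML.

```python
def recommend(user, items, already_rated, predict, n=2):
    """Predict a rating for every unrated item, then return the n best."""
    candidates = [item for item in items if item not in already_rated]
    scored = [(predict(user, item), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:n]]

# Stand-in for the trained model: a hypothetical table of predicted ratings.
PREDICTED = {
    ("alice", "taqueria"): 4.6,
    ("alice", "diner"): 2.1,
    ("alice", "noodle_shop"): 3.9,
}

def predict(user, item):
    return PREDICTED[(user, item)]

items = ["pizzeria", "taqueria", "diner", "noodle_shop"]
top = recommend("alice", items, already_rated={"pizzeria"}, predict=predict)
print(top)  # ['taqueria', 'noodle_shop']
```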
So how does this work? The RS learns an embedding of users and items into what we call a latent trait space (see image below). A user (blue dot) rates an item (red dot) positively if the user's vector is aligned with the item's vector, and negatively if the two vectors point in opposite directions. Similar users and similar items will be placed close together in trait space, thus making it possible to infer ratings even for user/item combinations for which no ratings are available in the training data. While the image below shows a two-dimensional trait space for illustration purposes, we use 20 to 100 dimensions in our deployed systems. Sometimes, we can even find interpretable axes ("traits") in trait space. For example, below, the North-South trait could be "grown-ups" vs "kids", and the West-East trait could be "mainstream" vs "cult".
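The idea of "aligned" versus "opposite" vectors can be illustrated with a plain inner product. This is a deliberate simplification: the vectors below are invented two-dimensional examples, and the actual Bayesian model involves more than a raw dot product. Still, the sign of the inner product captures the geometric intuition.

```python
def affinity(user_vec, item_vec):
    """Inner product: positive when the vectors align, negative when opposed."""
    return sum(u * v for u, v in zip(user_vec, item_vec))

# Hypothetical 2-D trait vectors (a real system would learn these from data).
user = (0.9, 0.8)
items = {
    "cult_drama": (1.0, 0.7),      # points roughly the same way as the user
    "kids_cartoon": (-0.8, -0.9),  # points roughly the opposite way
}

for name, vec in items.items():
    score = affinity(user, vec)
    verdict = "likely to enjoy" if score > 0 else "likely to dislike"
    print(name, verdict)
```

Because the score depends only on the two vectors, it can be computed for any user/item pair, including pairs that never appeared in the training data.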
One key problem for an RS is cold-start. New users may not have rated enough items, and new items may not have been rated by enough users, to make good predictions. To mitigate this problem, the Azure ML RS makes it possible to represent users and items not just by their ID, but by a feature vector constructed from meta-data. For users, this may include any profile information such as age or geo-location; for items such as movies, it may include information such as genre, actors, director, year of production, etc. As a consequence, the system can generalize across users and items by making use of common attributes in the meta-data.
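Here is one simple way to picture how meta-data helps with cold-start. In this sketch, which is an assumption for illustration and not a description of the Azure ML implementation, every feature (including the ID itself) contributes a small vector in trait space, and a user's trait vector is the sum of the contributions of their features. The feature names and weights are made up.

```python
def trait_vector(features, weights, dims=2):
    """Sum the trait-space contributions of all features we have weights for."""
    vec = [0.0] * dims
    for name, value in features.items():
        contribution = weights.get(name)
        if contribution is None:
            continue  # unseen feature, e.g. a brand-new user ID
        for d in range(dims):
            vec[d] += value * contribution[d]
    return vec

# Hypothetical learned weights: one trait-space contribution per feature.
weights = {
    "id:alice": [0.3, -0.1],
    "age:18-25": [0.5, 0.2],
    "region:EU": [0.1, 0.4],
}

known_user = {"id:alice": 1.0, "age:18-25": 1.0, "region:EU": 1.0}
new_user = {"id:dave": 1.0, "age:18-25": 1.0, "region:EU": 1.0}

print(trait_vector(known_user, weights))
print(trait_vector(new_user, weights))  # still informative despite the unknown ID
```

The new user has no ratings and no learned ID weight, yet still lands at a sensible position in trait space because the age and region features are shared with users the system has already seen.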
If you are curious about the mathematical underpinnings of recommender systems, take a look at the paper Matchbox: Large Scale Bayesian Recommendations.
If you are eager to build your very own recommender system pipeline, try your hand at doing so using Azure ML once it becomes available (very soon). Azure ML Studio, shown in the picture below, has a recommender system module and a powerful browser-based graphical user interface including drag/drop capabilities, making your task relatively easy.
In fact, the Azure ML recommender system combines two of the most powerful paradigms for predicting ratings – content-based filtering and collaborative filtering – and, by making this widely available, our hope is that it will result in a much broader use of automatic recommendation systems and in many more cool scenarios that will benefit customers everywhere.
Thore Graepel
Learn more about my research. Follow me on Twitter.