Slope-one Recommender - Exclusive Article from Mahout in Action
This article is taken from Mahout in Action by Sean Owen, Robin Anil, Ted Dunning, and Ellen Friedman. The authors discuss the slope-one recommender, an approach to recommendation based on differences in preference values between items. They explain the two types of weighting offered by the slope-one recommender, which are based on count and standard deviation. The article also gives an overview of storage options and of how the computation can be distributed.
Did you like the movie “Carlito's Way”? Most people who liked this movie, it seems, also liked another film starring Al Pacino: “Scarface”. But people tend to like Scarface a bit more. We'd imagine most people who think of Carlito's Way as a four-star movie would give Scarface five stars. So if you told me you thought Carlito's Way was a three-star movie, I might guess you'd give Scarface four stars, one more than the other film.
If you agree with this sort of reasoning, you will like the slope-one recommender (http://en.wikipedia.org/wiki/Slope_One). It estimates preferences for new items based on the average difference in preference value (“diffs”) between a new item and the other items the user prefers.
For example, let's say that we know that, on average, people rate Scarface higher by 1.0 than Carlito's Way. Let's also say we find everyone rates Scarface the same as The Godfather, on average. And now, we are presented with a user who rates Carlito's Way 2.0, and The Godfather 4.0. What do we estimate his preference for Scarface would be?
Based on Carlito's Way, we'd guess 2.0 + 1.0 = 3.0. Based on The Godfather, we'd guess 4.0 + 0.0 = 4.0. Taking a simple average of the two, we'd guess 3.5. This is the essence of the slope-one recommender approach.
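The arithmetic above can be written out as a minimal Java sketch. The class and method names are hypothetical and are not part of Mahout; the ratings and diffs are the ones from the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SlopeOneWorkedExample {

    // Estimate a preference for the target item as the simple average of
    // (user's rating for a known item + average diff from that item to the target).
    static double estimate(Map<String, Double> userRatings,
                           Map<String, Double> diffToTarget) {
        double sum = 0.0;
        int n = 0;
        for (Map.Entry<String, Double> e : userRatings.entrySet()) {
            Double diff = diffToTarget.get(e.getKey());
            if (diff == null) {
                continue; // no diff known for this item; skip it
            }
            sum += e.getValue() + diff;
            n++;
        }
        return n == 0 ? Double.NaN : sum / n;
    }

    public static void main(String[] args) {
        Map<String, Double> userRatings = new LinkedHashMap<>();
        userRatings.put("Carlito's Way", 2.0);
        userRatings.put("The Godfather", 4.0);

        // Average amount by which Scarface is rated higher than each item.
        Map<String, Double> diffToScarface = new LinkedHashMap<>();
        diffToScarface.put("Carlito's Way", 1.0);
        diffToScarface.put("The Godfather", 0.0);

        System.out.println(estimate(userRatings, diffToScarface)); // prints 3.5
    }
}
```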
Its name comes from the fact that the recommender algorithm starts with the assumption that there is some linear relationship between the preference values for one item and another and that we can in general estimate the preferences for some item y based on the preferences for item x, via some linear function like y = mx + b. Then, the slope-one recommender makes the additional simplifying assumption that m = 1: “slope one”. We're left attempting to find b = y – x, the (average) difference in preference value, for every pair of items.
So, the algorithm consists of a significant preprocessing phase, in which all item-item preference value differences are computed:
for every item i
  for every other item j
    for every user u expressing preference for both i and j
      add the difference in u’s preference for i and j to an average
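As an illustration only, the preprocessing phase might be sketched in plain Java as follows. This is not Mahout's implementation: the class, the Diff holder, and the sample preferences are hypothetical. The sketch loops over users and then over item pairs, which visits the same (user, i, j) combinations as the pseudocode, and it stores each pair once (for i < j) as the average of "preference for j minus preference for i".

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class DiffPrecomputationSketch {

    /** Running sum and count for one item pair; average = sum / count. */
    static final class Diff {
        double sum;
        int count;

        double average() {
            return sum / count;
        }
    }

    // diffs.get(i).get(j), stored only for i < j, averages
    // (preference for j - preference for i) over users who rated both.
    static Map<Long, Map<Long, Diff>> computeDiffs(
            Map<Long, Map<Long, Double>> prefsByUser) {
        Map<Long, Map<Long, Diff>> diffs = new TreeMap<>();
        for (Map<Long, Double> prefs : prefsByUser.values()) {
            for (Map.Entry<Long, Double> a : prefs.entrySet()) {
                for (Map.Entry<Long, Double> b : prefs.entrySet()) {
                    long i = a.getKey();
                    long j = b.getKey();
                    if (i >= j) {
                        continue; // keep each unordered pair once
                    }
                    Diff d = diffs
                        .computeIfAbsent(i, k -> new TreeMap<>())
                        .computeIfAbsent(j, k -> new Diff());
                    d.sum += b.getValue() - a.getValue();
                    d.count++;
                }
            }
        }
        return diffs;
    }

    // Hypothetical helper: builds one user's item -> preference map.
    static Map<Long, Double> prefs(long[] ids, double[] values) {
        Map<Long, Double> m = new HashMap<>();
        for (int n = 0; n < ids.length; n++) {
            m.put(ids[n], values[n]);
        }
        return m;
    }

    public static void main(String[] args) {
        // Made-up preferences for two users.
        Map<Long, Map<Long, Double>> data = new HashMap<>();
        data.put(1L, prefs(new long[] {101, 102}, new double[] {5.0, 3.0}));
        data.put(2L, prefs(new long[] {101, 102, 103}, new double[] {2.0, 2.5, 5.0}));

        Diff d = computeDiffs(data).get(101L).get(102L);
        // Users 1 and 2 contribute -2.0 and 0.5, so the average is -0.75.
        System.out.println(d.average() + " over " + d.count + " users");
    }
}
```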
And then, the recommendation algorithm becomes:
for every item i the user u expresses no preference for
  for every item j that user u expresses a preference for
    find the average preference difference between j and i
    add this diff to u’s preference value for j
    add this to a running average
return the top items, ranked by these averages
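The recommendation loop can be sketched the same way. The code below is a hypothetical, unweighted illustration rather than Mahout's SlopeOneRecommender: it assumes diffs are already available in a nested map, stored once per pair (for from < to) as the average of "preference for to minus preference for from", and the item IDs and values are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SlopeOneRecommendSketch {

    // Average of (pref for 'to' - pref for 'from'). Only stored for the
    // smaller ID first; the reverse direction is the negation.
    // Returns null if no user rated both items.
    static Double diff(Map<Long, Map<Long, Double>> diffs, long from, long to) {
        long lo = Math.min(from, to);
        long hi = Math.max(from, to);
        Map<Long, Double> row = diffs.get(lo);
        Double d = (row == null) ? null : row.get(hi);
        if (d == null) {
            return null;
        }
        return from < to ? d : -d;
    }

    static List<Long> recommend(Map<Long, Double> userPrefs, Set<Long> allItems,
                                Map<Long, Map<Long, Double>> diffs, int howMany) {
        Map<Long, Double> estimates = new HashMap<>();
        for (long i : allItems) {
            if (userPrefs.containsKey(i)) {
                continue; // only estimate items the user has not rated
            }
            double sum = 0.0;
            int n = 0;
            for (Map.Entry<Long, Double> p : userPrefs.entrySet()) {
                Double d = diff(diffs, p.getKey(), i);
                if (d == null) {
                    continue;
                }
                sum += p.getValue() + d; // user's pref for j plus diff from j to i
                n++;
            }
            if (n > 0) {
                estimates.put(i, sum / n);
            }
        }
        List<Long> ranked = new ArrayList<>(estimates.keySet());
        ranked.sort((x, y) -> Double.compare(estimates.get(y), estimates.get(x)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        // Hypothetical IDs: 1 = Carlito's Way, 2 = The Godfather, 3 = Scarface, 4 = another film.
        Map<Long, Map<Long, Double>> diffs = new HashMap<>();
        diffs.computeIfAbsent(1L, k -> new HashMap<>()).put(3L, 1.0);
        diffs.computeIfAbsent(2L, k -> new HashMap<>()).put(3L, 0.0);
        diffs.computeIfAbsent(1L, k -> new HashMap<>()).put(4L, -1.0);

        Map<Long, Double> userPrefs = new HashMap<>();
        userPrefs.put(1L, 2.0);
        userPrefs.put(2L, 4.0);

        // Item 3 is estimated at 3.5 and item 4 at 1.0, so item 3 ranks first.
        System.out.println(recommend(userPrefs, Set.copyOf(List.of(1L, 2L, 3L, 4L)), diffs, 2));
    }
}
```

Set.copyOf and List.of require Java 10 or later; on older versions any Set<Long> works equally well.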
The average diffs over the small sample recommender input we have been using throughout the book are shown in Table 1.
Table 1 Average difference in preference value between all pairs of items. Cells along the diagonal are 0.0. Cells in the bottom left are simply the negative of their counterparts across the diagonal. Hence these are not represented explicitly. Some diffs don’t exist, such as the diff between 102 and 107, since no user expressed a preference for both 102 and 107.
Slope-one is attractive because the on-line portion of the algorithm is fast. Like an item-based recommender, its performance does not depend upon the number of users in the data model. It depends only upon the average preference difference between every pair of items, which can be pre-computed. Further, its underlying data structure can be efficiently updated: when a preference changes, it’s simple to update relevant diff values. In contexts where preferences may change quickly, this is an asset.
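To see why updates are cheap, consider a running average that stores only a count and the current mean. The class below is a hypothetical sketch, not Mahout's own data structure: adding a new user's difference, or adjusting an existing one when that user changes a rating, takes constant time and does not need the original data.

```java
class RunningDiffSketch {

    private int count;
    private double average;

    // A new user has rated both items: fold their difference into the mean.
    void addDatum(double value) {
        count++;
        average += (value - average) / count;
    }

    // An existing user's difference changed by 'delta' (for example because
    // they re-rated one of the two items); the count stays the same.
    void changeDatum(double delta) {
        if (count == 0) {
            throw new IllegalStateException("no data to change");
        }
        average += delta / count;
    }

    int getCount() {
        return count;
    }

    double getAverage() {
        return average;
    }

    public static void main(String[] args) {
        RunningDiffSketch diff = new RunningDiffSketch();
        diff.addDatum(1.0);
        diff.addDatum(2.0);
        diff.addDatum(0.0);   // mean of 1.0, 2.0, 0.0 is 1.0
        diff.changeDatum(1.5); // the third user's difference moves from 0.0 to 1.5
        System.out.println(diff.getAverage()); // mean of 1.0, 2.0, 1.5
    }
}
```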
Note that the memory requirements necessary to store all of these item-item differences in preference value grow as the square of the number of items. Twice as many items means four times the memory!
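A quick calculation makes the growth concrete. Assuming each unordered pair of distinct items is stored once, n items yield at most n(n - 1)/2 diffs; the item counts below are arbitrary examples, and the class name is hypothetical.

```java
class DiffCountSketch {

    // Maximum number of item-item diffs when each unordered pair is stored once.
    static long pairCount(long items) {
        return items * (items - 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(pairCount(1000)); // prints 499500
        System.out.println(pairCount(2000)); // prints 1999000, roughly four times as many
    }
}
```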
Slope-one in practice
We can easily try the slope-one recommender with a single constructor call: new SlopeOneRecommender(model). Note that the slope-one recommender takes no similarity metric as an argument.
After running a standard evaluation using the GroupLens 10M ratings data set (go to http://grouplens.org and check out the 10 million rating data set), you’ll get a result near 0.65. That’s the best yet. Indeed, the simple slope-one approach works well in many cases. This algorithm does not make use of a similarity metric, unlike the other approaches we have looked at. It has relatively few “knobs” to twiddle.
The simplest form of the slope-one algorithm has a vulnerability: item-item diffs are given equal weighting regardless of how “reliable” they are and how much data they are based upon. Let’s say only one user in the history of movie watching has rated both Carlito’s Way and The Notebook. It’s possible; they’re quite different films. We could compute a diff for these two films. Would it be as useful as the diff we compute between Carlito’s Way and The Godfather, averaged over thousands of users? It sounds unlikely. The latter diff is probably more reliable since it is an average over a higher count of users.
Again, we can employ some form of weighting to improve recommendations by taking this into account. SlopeOneRecommender offers two types of weighting: weighting based on count and on standard deviation. Slope-one estimates preference values by adding diffs to all of the user’s current preference values and then averaging all of those results together to form an estimate. Count weighting gives more weight to the terms whose diffs are based on more data, that is, on more users who have expressed a preference for both items in question. In particular, the average becomes a weighted average, where the weight is the diff “count”: the number of users on which the diff is based.
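A minimal sketch of count weighting follows. The class name, the array-based signature, and the counts are hypothetical; the ratings and diffs reuse the Carlito's Way and The Godfather example. Each candidate estimate (preference plus diff) is weighted by the number of users behind its diff.

```java
class CountWeightingSketch {

    // Weighted average of (pref[k] + diff[k]) with weight count[k].
    static double estimate(double[] prefs, double[] diffs, int[] counts) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < prefs.length; k++) {
            weightedSum += counts[k] * (prefs[k] + diffs[k]);
            totalWeight += counts[k];
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        double[] prefs = {2.0, 4.0};  // Carlito's Way, The Godfather
        double[] diffs = {1.0, 0.0};  // average diffs to Scarface
        int[] counts = {3, 1};        // made-up numbers of users behind each diff

        // (3 * 3.0 + 1 * 4.0) / 4 = 3.25, versus 3.5 for the unweighted average.
        System.out.println(estimate(prefs, diffs, counts));
    }
}
```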
Similarly, standard deviation weighting will weight according to the standard deviation of difference in preference value. Lower standard deviation means higher weighting. If the difference in preference value between two films is very consistent across many users, it seems more reliable and should be given more weight. If it varies considerably from user to user, then it should be deemphasized.
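The text does not give the exact formula, so the sketch below assumes one plausible choice: a weight of 1 / (1 + standard deviation), which equals 1 for a perfectly consistent diff and shrinks as the diff becomes noisier. The class name and the sample standard deviations are hypothetical.

```java
class StdDevWeightingSketch {

    // Weighted average of (pref[k] + diff[k]) with the ASSUMED weight
    // 1 / (1 + stdDev[k]); lower standard deviation means higher weight.
    static double estimate(double[] prefs, double[] diffs, double[] stdDevs) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (int k = 0; k < prefs.length; k++) {
            double weight = 1.0 / (1.0 + stdDevs[k]);
            weightedSum += weight * (prefs[k] + diffs[k]);
            totalWeight += weight;
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        double[] prefs = {2.0, 4.0};   // Carlito's Way, The Godfather
        double[] diffs = {1.0, 0.0};   // average diffs to Scarface
        double[] stdDevs = {0.0, 1.0}; // made-up: the first diff is far more consistent

        // Weights are 1.0 and 0.5, so the estimate is pulled toward 3.0.
        System.out.println(estimate(prefs, diffs, stdDevs));
    }
}
```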
These variants turn out to be enough of a good idea that they are enabled by default. We could disable them to see the effect, as seen in Listing 1.
Listing 1 Selecting no weighting with a SlopeOneRecommender
DiffStorage diffStorage = new MemoryDiffStorage(
    model, Weighting.UNWEIGHTED, false, Long.MAX_VALUE);
return new SlopeOneRecommender(
    model, Weighting.UNWEIGHTED, Weighting.UNWEIGHTED, diffStorage);
The result is 0.67 -- only slightly worse on this data set.
DiffStorage and memory considerations
Slope-one does have its price, as we noted: memory consumption. In fact, if you tweak the evaluation to use even 10% of all data (about 100,000 ratings), even a 1 gigabyte heap won’t be enough. The diffs are used so frequently, and are expensive enough to compute, that they need to be computed and stored ahead of time. But keeping them all in memory can get expensive.
Storage of diffs is encapsulated separately in implementations of DiffStorage. We’ve been using, by default, MemoryDiffStorage so far. Not surprisingly, this implementation keeps diffs in memory. It offers one constructor parameter that can trade off some accuracy for slightly less memory consumption: compactAverages. This will cause the implementation to use smaller primitive data types to store count, average, and standard deviation. It’s worth a try if pressed for memory, but by that point you will want to look into storing the diffs externally, such as in a database. Fortunately, implementations like MySQLJDBCDiffStorage exist for this purpose. MySQLJDBCDiffStorage must be used in conjunction with a JDBC-backed DataModel implementation like MySQLJDBCDataModel, as seen in Listing 2.
Listing 2 Creating a JDBC-backed DiffStorage
AbstractJDBCDataModel model = new MySQLJDBCDataModel();
DiffStorage diffStorage = new MySQLJDBCDiffStorage(model);
Recommender recommender = new SlopeOneRecommender(
    model, Weighting.WEIGHTED, Weighting.WEIGHTED, diffStorage);
As with MySQLJDBCDataModel, the table name and column names used by MySQLJDBCDiffStorage can be customized via constructor parameters.
Distributing the precomputation
Precomputing the item-item diffs is significant work. While it is more likely that the size of your data will cause problems with memory requirements before the time required to compute these diffs becomes problematic, you might be wondering if there are ways to distribute this computation so that it completes faster. Diffs can be updated easily at runtime in response to new information, so a relatively infrequent offline precomputation process is feasible in this model. Distributing the diff computation via Hadoop is supported.
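The diff computation splits naturally into a map step (each user's preferences independently emit one difference per item pair) and a reduce step (the differences for each pair are averaged). The sketch below illustrates that shape on a single machine with Java parallel streams. It is not Mahout's Hadoop job, and the class name, the "i:j" string key, and the sample data are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class DiffMapReduceSketch {

    // "Map" step: one user's preferences emit (pair key, pref_j - pref_i) for i < j.
    static Stream<Map.Entry<String, Double>> emitPairDiffs(Map<Long, Double> prefs) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (Map.Entry<Long, Double> a : prefs.entrySet()) {
            for (Map.Entry<Long, Double> b : prefs.entrySet()) {
                long i = a.getKey();
                long j = b.getKey();
                if (i < j) {
                    out.add(new AbstractMap.SimpleEntry<String, Double>(
                        i + ":" + j, b.getValue() - a.getValue()));
                }
            }
        }
        return out.stream();
    }

    // "Reduce" step: group the emitted differences by pair and average them.
    static Map<String, Double> averageDiffs(Map<Long, Map<Long, Double>> prefsByUser) {
        return prefsByUser.values().parallelStream()
            .flatMap(DiffMapReduceSketch::emitPairDiffs)
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,
                Collectors.averagingDouble(Map.Entry::getValue)));
    }

    public static void main(String[] args) {
        // Made-up preferences for two users.
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(101L, 5.0);
        user1.put(102L, 3.0);
        Map<Long, Double> user2 = new HashMap<>();
        user2.put(101L, 2.0);
        user2.put(102L, 2.5);
        user2.put(103L, 5.0);

        Map<Long, Map<Long, Double>> data = new HashMap<>();
        data.put(1L, user1);
        data.put(2L, user2);

        // Users 1 and 2 contribute -2.0 and 0.5 for the pair 101:102.
        System.out.println(averageDiffs(data).get("101:102")); // prints -0.75
    }
}
```

Because each user's pairs are emitted independently, the map step needs no shared state, which is what makes this kind of computation a good fit for a framework like Hadoop.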
In this article, we examined a slope-one recommender, a unique and relatively simple approach to recommendation based on average differences in preference values between items. It requires significant precomputation and storage for these diffs, and so we explored how to store these both in memory and in a database.