Using Mahout Recommenders in Grails
Apache Mahout is a scalable machine learning framework that can be used to create intelligent applications. In this article we’ll see how Mahout can be used to create personalised recommendations within a Grails application.
This article originally appeared in the February 2012 edition of GroovyMag.
About Mahout
Mahout started off as a sub-project of Apache Lucene, and the name is a Hindi word referring to an elephant driver; portions of Mahout are built on top of Hadoop, which was named after a stuffed toy elephant owned by the son of Doug Cutting, who started that project.
Hadoop was extracted from the Nutch crawler Lucene sub-project and provides a scalable data processing framework using MapReduce on top of a distributed file system (HDFS). The use of Hadoop is beyond the scope of this introductory article.
Mahout has three primary areas of machine learning functionality: classification, clustering and recommendations.
Classification can be used to determine how similar an item is to previously encountered items or patterns and whether it belongs to the same category.
For instance, Mahout supports naive Bayes classification, with the ‘hello world’ sample training a classifier and then classifying posts from 20 newsgroups (a common reference data set for machine learning research).
Clustering groups items together based on their similarity; an example of this can be observed in Google search results.
We’re going to focus on recommendations, sometimes referred to as collaborative filtering. Amazon’s suggested "other people also bought" products are a well-known example.
For further information on Mahout and clustering or classification I suggest you read Mahout in Action (see what I did there?).
Recommendations
Recommendations provide a discovery mechanism to introduce users to new items that may be of interest to them; they are usually associated with cross-selling tactics (e.g. ‘also bought’) on retail e-commerce sites.
Mahout provides a set of components to enable the construction of customised recommendation engines.
Firstly there is the Recommender, which produces recommendations based on a DataModel.
A user-based recommender will look for similar preferences or behavior patterns between users (UserSimilarity), and then group these into neighbourhoods (UserNeighborhood), e.g. the nearest 10 users to the specified user. The chosen algorithm will then select the recommendations of new items from within the neighbourhood.
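To make the similarity, neighbourhood and recommendation steps concrete, here is a minimal plain-Java sketch of a user-based recommender. It is an illustration only and not Mahout's implementation: the class name, method names and sample ratings are all hypothetical, similarity is a simple Pearson correlation over co-rated items, and neighbours with a non-positive similarity are ignored as a simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of user-based recommendation; illustrative only, not Mahout code. */
final class UserBasedSketch {

    /** Pearson correlation over the items both users have rated; NaN if undefined. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        List<Long> common = new ArrayList<>();
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) common.add(item);
        }
        int n = common.size();
        if (n < 2) return Double.NaN;
        double meanA = 0, meanB = 0;
        for (Long item : common) { meanA += a.get(item); meanB += b.get(item); }
        meanA /= n;
        meanB /= n;
        double cov = 0, varA = 0, varB = 0;
        for (Long item : common) {
            double da = a.get(item) - meanA, db = b.get(item) - meanB;
            cov += da * db; varA += da * da; varB += db * db;
        }
        double denom = Math.sqrt(varA * varB);
        return denom == 0 ? Double.NaN : cov / denom;
    }

    /** The n users most similar to the given user: its neighbourhood. */
    static List<Long> neighbourhood(Map<Long, Map<Long, Double>> prefs, long user, int n) {
        Map<Long, Double> sims = new HashMap<>();
        for (Long other : prefs.keySet()) {
            if (other == user) continue;
            double s = pearson(prefs.get(user), prefs.get(other));
            if (!Double.isNaN(s)) sims.put(other, s);
        }
        List<Long> ids = new ArrayList<>(sims.keySet());
        ids.sort((x, y) -> Double.compare(sims.get(y), sims.get(x))); // most similar first
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    /** Scores items the user has not rated by the similarity-weighted mean of neighbours' ratings. */
    static Map<Long, Double> recommend(Map<Long, Map<Long, Double>> prefs, long user, int n) {
        Map<Long, Double> weightedSum = new TreeMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Long other : neighbourhood(prefs, user, n)) {
            double s = pearson(prefs.get(user), prefs.get(other));
            if (s <= 0) continue; // simplification: ignore dissimilar neighbours
            for (Map.Entry<Long, Double> e : prefs.get(other).entrySet()) {
                if (prefs.get(user).containsKey(e.getKey())) continue; // only new items
                weightedSum.merge(e.getKey(), s * e.getValue(), Double::sum);
                weightTotal.merge(e.getKey(), s, Double::sum);
            }
        }
        weightedSum.replaceAll((item, sum) -> sum / weightTotal.get(item));
        return weightedSum;
    }

    /** Hypothetical ratings: user 2 agrees with user 1, user 3 disagrees. */
    static Map<Long, Map<Long, Double>> sampleData() {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(101L, 5.0, 102L, 3.0, 103L, 1.0));
        prefs.put(2L, Map.of(101L, 4.0, 102L, 3.0, 103L, 2.0, 104L, 5.0));
        prefs.put(3L, Map.of(101L, 1.0, 102L, 3.0, 103L, 5.0, 105L, 4.0));
        return prefs;
    }

    public static void main(String[] args) {
        System.out.println(recommend(sampleData(), 1L, 2));
    }
}
```

With this made-up data, user 2 correlates perfectly with user 1 and user 3 is perfectly anti-correlated, so only item 104 (rated by user 2) is suggested for user 1.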
An item-based recommender works on similarity between items (ItemSimilarity).
The DataModel has a very simple representation of the preference data to work with: a long user ID, a long item ID and a float preference value. This is limited to a tuple to reduce memory consumption and therefore increase scalability. DataModel implementations are available for files, MySQL, generic JDBC and Postgres.
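The tuple described above can be pictured with a small plain-Java sketch. This is not a Mahout class: the names are hypothetical, the comma-separated "userId,itemId,value" line format is an assumption made for the example, and so is treating a missing value as a plain association with strength 1.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch of the (user ID, item ID, value) preference tuple; names are illustrative. */
final class PreferenceTupleSketch {

    /** One preference: two longs and a float, and nothing else. */
    static final class Pref {
        final long userId;
        final long itemId;
        final float value;

        Pref(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Parses "userId,itemId,value"; assumes a missing value means a plain association (1.0). */
    static Pref parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected userId,itemId[,value] but got: " + line);
        }
        long userId = Long.parseLong(parts[0].trim());
        long itemId = Long.parseLong(parts[1].trim());
        float value = parts.length > 2 ? Float.parseFloat(parts[2].trim()) : 1.0f;
        return new Pref(userId, itemId, value);
    }

    /** Groups the parsed tuples by user ID, the shape a user-based recommender works from. */
    static Map<Long, List<Pref>> byUser(List<String> lines) {
        Map<Long, List<Pref>> grouped = new TreeMap<>();
        for (String line : lines) {
            Pref p = parse(line);
            grouped.computeIfAbsent(p.userId, id -> new ArrayList<>()).add(p);
        }
        return grouped;
    }

    public static void main(String[] args) {
        Map<Long, List<Pref>> grouped = byUser(List.of("1,101,5.0", "1,102,3.0", "2,101,2.0"));
        System.out.println(grouped.get(1L).size() + " preferences for user 1");
    }
}
```

Keeping each preference to two longs and a float is what makes it practical to hold millions of ratings in memory.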
https://cwiki.apache.org/mahout/recommender-documentation.html has lots more useful information and further links.
Grails recommendations
Lim Chee Kin has been hard at work and released a Grails mahout-recommender plugin in December 2011. The plugin is intended to let you evaluate different recommenders without needing to write any code; once you have selected a recommender, it can be enabled using configuration.
We’ll start by looking at the packaged Libimseti sample application, based on code from chapter 5 of Mahout in Action, which uses 17+ million profile ratings from a Czech dating site to recommend compatible user profiles.
Libimseti
Getting set up
I created a clean Grails 2.0.0 application and installed the plugin (version 0.5.1 at time of writing) using the commands in listing 1. The plugin adds some new scripts, including the ability to install a sample application that we can run using the grails install-libimseti-sample-app command.
grails create-app groovymagmahout
cd groovymagmahout
grails install-plugin mahout-recommender
Listing 1: Application creation and plugin installation
Sample data
The Libimseti sample data is not redistributable, so you’ll need to download the zip from http://www.occamslab.com/petricek/data/libimseti-complete.zip and then extract the .dat files to the grails-app/conf directory.
Configuration
There are two more things we need to do before we can run the Grails application. Firstly, we need to adjust the plugin configuration in Config.groovy to add the settings from listing 2. Secondly, we need to give Grails some more memory for the hungry algorithm to prevent an "OutOfMemoryError: GC overhead limit exceeded", which is achieved by listing 3.
mahout.recommender.hasPreference = false
mahout.recommender.data.file = 'ratings.dat'
Listing 2: Additional plugin configuration
export GRAILS_OPTS="-XX:MaxPermSize=256m -Xmx1024m -server"
Listing 3: Increasing the memory allocated to Grails
Libimseti sample usage
When we execute grails run-app and browse to http://localhost:8080/groovymagmahout we will see, as per figure 1, a single recommender controller listed.
Selecting the controller will bring us to the settings form shown in figure 2.
Enter user ID ‘133’, submit the form and after some time (the algorithm is tuned for better matching rather than performance) you’ll see the recommendations shown in figure 3, where a higher score means a better match.
What next?
The Libimseti sample application is good for proving that the theory works on a reasonably sized dataset, but not many of us are going to want to curate our data to match an input file. More realistically, we’ll have an application that has associations between users and items or allows users to rate items.
We’ll build a simple Grails implementation; the source code is available on GitHub from https://github.com/rbramley/groovymagmahout
The data model
As a simplification for this exercise we will use a single preference table; this will represent the link (many-to-many join) table between a user and an item, as illustrated in ERD notation in figure 4.
This table will have a composite primary key comprising the user and item IDs, and then a value rating the strength of the preference. For the preference we’ll use a 1 to 5 range, as this may be represented by a rating widget (such as that provided by the Grails RichUI plugin).
We’ll create a domain class (using grails create-domain-class com.rbramley.mahout.Preference) and specify it as per listing 4. Note that we’ve used a composite key to satisfy Mahout’s needs, but this exposes a few minor Grails issues, with the generated default controller and views (grails generate-all com.rbramley.mahout.Preference) not being composite-key aware (this shouldn’t affect you unless you want to edit/delete preference records).
package com.rbramley.mahout

import org.apache.commons.lang.builder.HashCodeBuilder

// Serializable, with equals() and hashCode(), because the class uses a composite key
class Preference implements Serializable {
    long userId
    long itemId
    float prefValue

    static constraints = {
        userId()
        itemId()
        prefValue range: 0.0f..5.0f
    }

    // Two preferences are equal when they link the same user and item
    boolean equals(other) {
        if (!(other instanceof Preference)) {
            return false
        }
        other.userId == userId && other.itemId == itemId
    }

    int hashCode() {
        def builder = new HashCodeBuilder()
        builder.append userId
        builder.append itemId
        builder.toHashCode()
    }

    static mapping = {
        id composite: ['userId', 'itemId']
        version false
    }
}
Listing 4: Domain class
We’ll use MySQL for the database, as that Mahout DataModel provider implementation is supported by the Grails plugin (Mahout also has Postgres and generic JDBC implementations).
Firstly we need to create the target schema in MySQL (listing 5).
mysql -u root -p
mysql> create database recommender;
mysql> grant all on recommender.* to recommender@localhost identified by 'mahoutdemo';
Listing 5: MySQL commands
With that done we can uncomment the runtime mysql-connector-java dependency in BuildConfig.groovy and then configure DataSource.groovy accordingly (listing 6).
development {
    dataSource {
        driverClassName = "com.mysql.jdbc.Driver"
        dbCreate = "create-drop" // one of 'create', 'create-drop', 'update', 'validate'
        url = "jdbc:mysql://localhost:3306/recommender"
        username = "recommender"
        password = "mahoutdemo"
    }
}
Listing 6: Development data source configuration
Reconfiguring the plugin
We now need to reconfigure Config.groovy to tell the plugin which recommender and similarity algorithms to use and where to obtain the data from; this is achieved using the settings in listing 7.
mahout.recommender.mode = 'config' // 'input', 'config' or 'class'
mahout.recommender.hasPreference = true
mahout.recommender.selected = 1 // user-based
mahout.recommender.similarity = 'PearsonCorrelation'
mahout.recommender.withWeighting = false
mahout.recommender.neighborhood = 2
mahout.recommender.data.model = 'mysql'
mahout.recommender.preference.table = 'preference'
mahout.recommender.preference.valueColumn = 'pref_value'
Listing 7: Plugin configuration
Providing data
We can now run the application, select the new com.rbramley.mahout.PreferenceController and enter some values for our data.
If you enter the data set shown in figure 5, then when you use the recommendations controller to obtain recommendations for user ID 1, you should get the recommendations of 104 and 106 as shown in figure 6.
Alternatively, there is a SQL script within the project on GitHub that can be run to seed the preferences table with similar data (based on listing 2.1 from Mahout in Action).
Evaluating recommenders
The Grails plugin features a built-in recommender evaluator based on average difference. In our case we can access it at http://localhost:8080/groovymagmahout/recommender/evaluator and click on the ‘run evaluator’ link; the sample output is shown in figure 7.
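As a rough illustration of an average-difference score, the sketch below computes the mean absolute difference between some held-back actual ratings and the values a recommender estimated for them. It assumes that this is what ‘average difference’ means here; the class name and the numbers are hypothetical, and the plugin’s evaluator may choose its test data differently.

```java
/** Sketch of an average-absolute-difference score; illustrative, not the plugin's evaluator. */
final class AverageDifferenceSketch {

    /** Mean of |actual - estimated| across all held-back ratings. */
    static double averageAbsoluteDifference(double[] actual, double[] estimated) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
        double total = 0;
        for (int i = 0; i < actual.length; i++) {
            total += Math.abs(actual[i] - estimated[i]);
        }
        return total / actual.length;
    }

    public static void main(String[] args) {
        double[] actual = {4.0, 3.0, 5.0};    // ratings hidden from the recommender
        double[] estimated = {3.5, 3.0, 4.0}; // what the recommender predicted for them
        System.out.println(averageAbsoluteDifference(actual, estimated)); // (0.5 + 0.0 + 1.0) / 3 = 0.5
    }
}
```

A score of 0.0 would mean every estimate matched the real rating exactly.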
A lower difference is better, so you may want to experiment with changing the mahout.recommender.similarity property that we set in listing 7; valid values are ‘PearsonCorrelation’, ‘EuclideanDistance’, ‘LogLikelihood’ or ‘TanimotoCoefficient’.
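To give a feel for how two of these measures differ, here is a plain-Java sketch of a Tanimoto coefficient and a Euclidean-distance similarity. The class and method names are hypothetical, and the 1 / (1 + distance) mapping is just one common way to turn a distance into a similarity; Mahout’s own formulas may differ in detail.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch of two alternative similarity measures; illustrative, not Mahout's classes. */
final class SimilaritySketch {

    /** Tanimoto coefficient: shared items divided by items seen by either user; ignores rating values. */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> shared = new HashSet<>(a);
        shared.retainAll(b);
        int union = a.size() + b.size() - shared.size();
        return union == 0 ? Double.NaN : (double) shared.size() / union;
    }

    /** Maps the Euclidean distance over co-rated items into (0, 1] using 1 / (1 + distance). */
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        double sumOfSquares = 0;
        int shared = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other == null) continue; // only items both users rated
            double diff = e.getValue() - other;
            sumOfSquares += diff * diff;
            shared++;
        }
        return shared == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    public static void main(String[] args) {
        System.out.println(tanimoto(Set.of(101L, 102L, 103L), Set.of(101L, 103L, 104L)));
        System.out.println(euclidean(Map.of(101L, 5.0, 102L, 3.0), Map.of(101L, 2.0, 102L, 7.0)));
    }
}
```

The Tanimoto coefficient only looks at which items two users have in common, so it suits data without preference values, whereas the Euclidean measure needs the ratings themselves.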
Likewise, you may want to modify other properties, such as applying weighting or adjusting the size of the neighbourhood. In any case, please refer to the configuration section of the plugin manual at http://limcheekin.github.com/mahout-recommender/docs/manual/guide/configuration.html
Summary
This article has introduced Apache Mahout, an open source scalable machine learning framework, and shown how you can utilise it to provide personal recommendations within a Grails application. We’ve seen custom recommendations for the Libimseti sample data files and recommendations based on user similarity on top of a Grails domain class. In practice these recommenders would ideally be invoked asynchronously, particularly for large data sets; this could be achieved using AJAX techniques.
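One way to picture the server side of such an asynchronous call is the plain-Java sketch below: a request submits the slow computation and gets a job ID back straight away, and the browser can then poll for the result. Everything here is an assumption for illustration (the class, its methods and the fixed pool of two threads); it is not part of the plugin or of Grails.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of running a slow recommender in the background; all names are hypothetical. */
final class AsyncRecommendationSketch {
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<List<Long>>> jobs = new ConcurrentHashMap<>();

    /** Starts the computation and returns at once with a job ID the client can poll. */
    String submit(Callable<List<Long>> slowRecommender) {
        String jobId = UUID.randomUUID().toString();
        jobs.put(jobId, pool.submit(slowRecommender));
        return jobId;
    }

    /** Empty while the job is still running; the recommended item IDs once it has finished. */
    Optional<List<Long>> poll(String jobId) {
        Future<List<Long>> job = find(jobId);
        return job.isDone() ? Optional.of(result(job)) : Optional.empty();
    }

    /** Blocks until the job has finished; handy for tests and batch use. */
    List<Long> await(String jobId) {
        return result(find(jobId));
    }

    void shutdown() {
        pool.shutdown();
    }

    private Future<List<Long>> find(String jobId) {
        Future<List<Long>> job = jobs.get(jobId);
        if (job == null) throw new NoSuchElementException("Unknown job: " + jobId);
        return job;
    }

    private static List<Long> result(Future<List<Long>> job) {
        try {
            return job.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Recommendation failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        AsyncRecommendationSketch service = new AsyncRecommendationSketch();
        String jobId = service.submit(() -> List.of(104L, 106L)); // stand-in for a slow recommender
        System.out.println(service.await(jobId));
        service.shutdown();
    }
}
```

An AJAX front end would call something like submit when the page loads and then poll until the recommendations are ready, so the user is never left waiting on a blocked request.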
Have fun integrating stylised recommendations into your application, but remember it’s good to allow the users to give feedback on the relevancy of the recommendations!
Published at DZone with permission of Robin Bramley, DZone MVB. See the original article here.