
Neo4j on Heroku - Part 2

We are picking up where we left off in Neo4j on Heroku, Part One, so make sure you’ve read it or you’ll be a little lost. So far, we have cloned the Neoflix project, set up our Heroku application and added the Neo4j add-on to our application. We are now ready to populate our graph.

Bring up two browser windows. On one you’ll go to your Neo4j instance running on Heroku,

$ heroku config
NEO4J_URL      => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014


and on the other you’ll go to the create_graph route of your app. So if you named your app neoflix, you’d go to neoflix.herokuapp.com/create_graph.

This will run the create_graph method and you’ll see nodes and relationships being created on the Neo4j Dashboard. It’s just over a million relationships, so it will take a few minutes. There are faster ways to load data into Neo4j (wait for part three of this series), but this will work in our case.
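The NEO4J_URL value shown by heroku config packs the credentials, host and port into a single string. If you ever need those pieces separately, a minimal sketch using Ruby's standard URI library could look like this. The helper name neo4j_settings and the hash keys are illustrative assumptions, not part of Neoflix.

```ruby
require 'uri'

# Split a Heroku-style NEO4J_URL into its parts.
# neo4j_settings is a hypothetical helper used only for this illustration.
def neo4j_settings(url)
  uri = URI.parse(url)
  { :server   => uri.host,
    :port     => uri.port,
    :login    => uri.user,
    :password => uri.password }
end

# Placeholder URL in the same shape as the heroku config output.
settings = neo4j_settings('http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014')
puts "#{settings[:server]} on port #{settings[:port]}"
```

The login and password are what your browser will ask for when you open the Neo4j instance.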



 

The fine folks at themoviedb.org provide an API for any developers that want to integrate movie and cast data along with posters or movie fan art. You can request an API key and they’ll respond very quickly. So let’s add this to our Heroku configs.

$ heroku config:add TMDB_KEY=XXXXXXX
Adding config vars and restarting app... done, vXX
  TMDB_KEY => XXXXXXX


If you want to test locally you can do so by:

export TMDB_KEY=XXXXXXX
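
When testing locally it is easy to forget the export and end up with a nil API key. One way to catch that early is to fail with a clear message when the variable is missing. This is only a sketch: tmdb_key is a hypothetical helper, not something Neoflix or ruby-tmdb provides.

```ruby
# Look up the TMDB key and fail loudly when it has not been set.
# The env argument defaults to the real environment; a plain Hash
# can be passed in instead, which makes the helper easy to try out.
def tmdb_key(env = ENV)
  env.fetch('TMDB_KEY') do
    raise KeyError, 'TMDB_KEY is not set; export it locally or add it with heroku config:add'
  end
end

puts tmdb_key('TMDB_KEY' => 'XXXXXXX')
```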


We can now use this environment variable in our application along with the ruby-tmdb gem by Aaron Gough:

require 'cgi'
require 'ruby-tmdb'

Tmdb.api_key = ENV['TMDB_KEY']
Tmdb.default_language = "en"

# Build the HTML shown for a movie: poster, tagline, ratings and overview.
def get_poster(data)
  movie = TmdbMovie.find(:title => CGI::escape(data["title"] || ""), :limit => 1)
  if movie.empty?
    "No Movie Poster found"
  else
    # Single quotes inside the string keep the attributes from ending it early.
    "<a href='#{movie.url}' target='_blank'>
     <img src='#{movie.posters.first.url}'>
     <h3>#{movie.tagline}</h3>
     <p>Rating: #{movie.rating} <br />
        Rated: #{movie.certification}</p><p>#{movie.overview}</p>"
  end
end


We will visualize the graph like I showed you earlier using Neovigator, but instead of retrieving the properties of our node (since they’re pretty bland), we’ll request a movie poster.


We will not visualize the explicit relationships we created. Instead we will visualize the implicit movie recommendations graph. Let’s take a look at that method now:

def get_recommendations(neo, node_id)
  rec = neo.execute_script("m = [:];
                            x = [] as Set;
                            v = g.v(node_id);

                            v.
                            out('hasGenera').
                            aggregate(x).
                            back(2).
                            inE('rated').
                            filter{it.getProperty('stars') > 3}.
                            outV.
                            outE('rated').
                            filter{it.getProperty('stars') > 3}.
                            inV.
                            filter{it != v}.
                            filter{it.out('hasGenera').toSet().equals(x)}.
                            groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();

                            m.sort{a,b -> b.value <=> a.value}[0..24];",
                            {:node_id => node_id.to_i})

  return [{"id" => node_id,
           "name" => "No Recommendations",
           "values" => [{"id" => "#{node_id}",
                         "name" => "No Recommendations"}]
          }] if rec == "{}"

  values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':')[0].strip,
                                               :name => v.split(':')[1] } }

  [{"id" => node_id ,"name" => "Recommendations","values" => values }]
end


Let’s go through the code. In Groovy, [:] is an empty map (the equivalent of a Ruby Hash). It is ultimately what we want to return, so we create it up front and fill it later. Then we create a Set “x”, which is an unordered collection without duplicates (see Groovy’s List for an ordered collection). We also get our starting vertex and assign it to “v”.

 

m = [:];
x = [] as Set;
v = g.v(node_id);
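
For readers more at home in Ruby, here is a rough equivalent of the first two lines. The genre names are made up for the example. It also shows the property the traversal relies on later: two sets are equal when they have the same members, regardless of order.

```ruby
require 'set'

m = {}        # Ruby counterpart of Groovy's [:]
x = Set.new   # Ruby counterpart of Groovy's [] as Set

# Adding the same genre twice leaves a single copy in the set.
x << 'Comedy' << 'Romance' << 'Comedy'

# Sets compare by membership, not by insertion order.
same = (x == Set.new(['Romance', 'Comedy']))

puts x.size
puts same
```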


We will fill the empty Set we created with the genres of our movie, and we’ll compare the genres of other movies against it later on.

v.
out('hasGenera').
aggregate(x).


We then go back 2 steps, which puts us at our starting movie, and follow the incoming “rated” relationships that carry more than 3 stars. These lead us to the users who liked the movie.

back(2).
inE('rated').
filter{it.getProperty('stars') > 3}.


From these users, we step out to find all the movies they have also rated with more than 3 stars.

outV.
outE('rated').
filter{it.getProperty('stars') > 3}.


We move on to those movies and keep only the ones that are not our starting movie (remember we set it to the variable “v”).

inV.
filter{it != v}.


…and we check that these movies have the same genres as our starting movie (remember we filled the Set “x”).

filter{it.out('hasGenera').toSet().equals(x)}.


groupCount does what it sounds like and stores the counts in the map “m” we created earlier. We want the id, title and count, so we do a little string wrangling to build a key from both id and title (minus commas… I’ll tell you why in a minute) and then call iterate(). The Gremlin shell iterates automatically for you, but a Gremlin script sent over the REST API is not iterated unless you ask for it. One day you’ll be pulling out your hair trying to figure out what’s wrong and you’ll curse “iterate” once you figure it out…

groupCount(m){\"${it.id}:${it.title.replaceAll(',',' ')}\"}.iterate();
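
If the need for iterate() feels odd, Ruby's lazy enumerators behave in a comparable way: building the pipeline does nothing until something forces it. The sketch below is only an analogy to help with intuition, not a description of how Gremlin is implemented, and the titles are sample data.

```ruby
counts = Hash.new(0)
titles = ['Toy Story', 'Heat', 'Toy Story']

# Building the lazy pipeline does no work: the block has not run yet.
pipeline = titles.lazy.map { |title| counts[title] += 1 }
before = counts.dup   # still empty at this point

# Forcing the pipeline plays the role of iterate(): now the side effects happen.
pipeline.force

puts before.inspect
puts counts.inspect
```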


Here we sort our Map in descending order of count (each entry’s value holds the count) and take the top 25 entries.

m.sort{a,b -> b.value <=> a.value}[0..24];",
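
To see the whole traversal in one place, here is a plain Ruby sketch of the same logic over a tiny in-memory data set. Everything in it (the recommend method, the users, ratings and genres) is invented for illustration. The real application runs the Gremlin script above inside Neo4j.

```ruby
require 'set'

# Toy data, invented for illustration only.
genres = {
  1 => Set['Animation', 'Comedy'],
  2 => Set['Animation', 'Comedy'],
  3 => Set['Crime'],
  4 => Set['Animation', 'Comedy']
}
titles = { 1 => 'Toy Story', 2 => 'Toy Story 2', 3 => 'Heat', 4 => "A Bug's Life" }
ratings = {                                  # user => { movie id => stars }
  'ann' => { 1 => 5, 2 => 4, 3 => 5 },
  'bob' => { 1 => 4, 2 => 5, 4 => 4 },
  'cat' => { 1 => 2, 4 => 5 }
}

def recommend(movie_id, genres, titles, ratings, limit = 25)
  wanted = genres[movie_id]                  # out('hasGenera').aggregate(x)
  counts = Hash.new(0)                       # the map m

  ratings.each_value do |rated|
    next unless rated.fetch(movie_id, 0) > 3 # users who gave our movie more than 3 stars
    rated.each do |other_id, stars|
      next unless stars > 3                  # movies they also rated above 3 stars
      next if other_id == movie_id           # filter{it != v}
      next unless genres[other_id] == wanted # same set of genres as our movie
      counts["#{other_id}:#{titles[other_id]}"] += 1   # groupCount(m){...}
    end
  end

  counts.sort_by { |_, count| -count }.first(limit)    # highest count first, top N
end

top = recommend(1, genres, titles, ratings)
top.each { |key, count| puts "#{key} (#{count})" }
```

In this toy data, cat rated the starting movie with only 2 stars, so her ratings never contribute, and Heat is dropped because its genres differ.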


Since Neo4j will be executing this code many times over, you want to parameterize it so that the script is parsed only once.

{:node_id => node_id.to_i})


If we get an empty hash back, we’ll return an unfortunate “No Recommendations” message,

return [{"id" => node_id,
         "name" => "No Recommendations",
         "values" => [{"id" => "#{node_id}",
                       "name" => "No Recommendations"}]
        }] if rec == "{}"


Finally we turn the Groovy Map into an array of hashes, which we use in our visualization like I showed you with Neovigator. Notice I’m splitting the record by commas, which is why we replaced the commas in the titles earlier. This piece won’t be necessary for long, as the final version of Neo4j 1.6 will have JSON support for Groovy Maps.

values = rec[1..-2].split(',').collect{ |v| {:id => v.split(':')[0].strip,
                                             :name => v.split(':')[1] } }
[{"id" => node_id ,"name" => "Recommendations","values" => values }]
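
Here is a small standalone sketch of that parsing step that you can run without Neo4j. The sample string is an assumption about what the stringified map looks like, guessed from the "{}" check above, and parse_recommendations is a hypothetical helper name.

```ruby
# Turn a stringified map such as "{3114:Toy Story 2=2, 4886:Monsters  Inc.=1}"
# into the array of hashes used by the visualization.
def parse_recommendations(rec)
  rec[1..-2].split(',').map do |entry|        # drop both braces, one entry per comma
    id, name = entry.strip.split(':', 2)      # split on the first colon only
    { :id => id, :name => name }
  end
end

# Assumed sample; the title's comma was already replaced by a space.
sample = '{3114:Toy Story 2=2, 4886:Monsters  Inc.=1}'
values = parse_recommendations(sample)
values.each { |v| puts "#{v[:id]} -> #{v[:name]}" }
```

Splitting on the first colon only is a small design choice: it keeps a title that itself contains a colon in one piece.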

 


We save the results of getting a movie poster and its recommendations for 30 days by taking advantage of the Varnish Cache provided to us by Heroku. We then get our starting node either by id or by title.

get '/resources/show' do
  response.headers['Cache-Control'] = 'public, max-age=2592000'
  content_type :json

  if params[:id].is_numeric?
    node = neo.get_node(params[:id])
  else
    node = neo.execute_script("g.idx(Tokens.T.v)[[title:'#{CGI::unescape(params[:id])}']].next();")
  end

  id = node_id(node)

  {:details_html => "<h2>#{get_name(node["data"])}</h2>" + get_poster(node["data"]),
   :data => {:attributes => get_recommendations(neo, id),
             :name => get_name(node["data"]),
             :id => id}
   }.to_json
end
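
The max-age value in the route is simply 30 days expressed in seconds. A quick way to check the arithmetic:

```ruby
seconds_per_day = 24 * 60 * 60
max_age = 30 * seconds_per_day
cache_control = "public, max-age=#{max_age}"

puts cache_control
```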


By title? Yes, we are adding jQuery UI autocomplete to our application, which will pass the name of the movie so we can look it up in the automatic index we created.

node = neo.execute_script("g.idx(Tokens.T.v)[[title:'#{CGI::unescape(params[:id])}']].next();")
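
One thing to watch for: a title containing an apostrophe would end the quoted string inside the script early. A possible alternative, following the same parameterization approach used for get_recommendations, is to pass the title as a script parameter. This is a sketch under assumptions: title_lookup is a hypothetical helper, and I have not verified here that the index lookup accepts the bound title variable on your Neo4j version.

```ruby
require 'cgi'

# Hypothetical helper: returns the script and its parameters separately,
# so the title is never spliced into the Gremlin source.
def title_lookup(raw_title)
  script = 'g.idx(Tokens.T.v)[[title:title]].next();'
  [script, { :title => CGI.unescape(raw_title) }]
end

script, params = title_lookup('Schindler%27s+List')
puts params[:title]

# In the route this would be passed on the same way as in get_recommendations:
#   node = neo.execute_script(script, params)
```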


… and there you have it. Your very own Movie Recommendation website on Heroku. See the complete code at github.com/maxdemarzi/neoflix.

UPDATE: Looks like a few of you dear readers tried to run create_graph multiple times and it made a mess. I will try to fix it and get it back up soon. Note to future self: remove the create_graph route on Heroku before publishing the post.


Source: http://maxdemarzi.com/2012/01/16/neo4j-on-heroku-part-two/
