Neo4j on Heroku - Part 1
In this two-part series, we are going to take his work from the Gremlin shell and put it on the web, using the Heroku Neo4j add-on and altering the Neovigator project for our use case. Heroku has a great article on its Dev Center on how to get an example Neo4j application up and running, and Michael Hunger shows you how to add JRuby extensions and provides sample code using the neo4j.rb gem by Andreas Ronge.
We are going to follow their recipe, but add a little spice. Instead of creating a small graph with 2 nodes and 1 relationship, I am going to show you how to leverage the power of Gremlin and Groovy to build a much larger graph from a set of files.
Let’s start by cloning the Neoflix Sinatra application. Instead of installing and starting Neo4j locally, we are going to create a Heroku application and add Neo4j.
    git clone git@github.com:maxdemarzi/neoflix.git
    cd neoflix
    bundle install
    heroku apps:create neoflix --stack cedar
    heroku addons:add neo4j
    git push heroku master
Let’s make sure that Neo4j was successfully added to our project:
    $ heroku addons
    logging:basic
    neo4j:test
    releases:basic
Great, there it is (if you are reading this in the future, it might say neo4j:basic or neo4j:silver or something like that). So where is our Neo4j database exactly?
    $ heroku config
    GEM_PATH         => vendor/bundle/ruby/1.9.1
    LANG             => en_US.UTF-8
    NEO4J_HOST       => 70825a524.hosted.neo4j.org
    NEO4J_INSTANCE   => 70825a524
    NEO4J_LOGIN      => xxxxxxxx
    NEO4J_PASSWORD   => yyyyyyyy
    NEO4J_PORT       => 7014
    NEO4J_REST_URL   => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014/db/data
    NEO4J_URL        => http://xxxxxxxx:yyyyyyyy@70825a524.hosted.neo4j.org:7014
    PATH             => bin:vendor/bundle/ruby/1.9.1/bin:/usr/local/bin:/usr/bin:/bin
    RACK_ENV         => production
The xs and ys are our username and password. We can use the address given in NEO4J_URL to take a look at the server. For part two, it would be wise to keep an eye on the “Dashboard” as we create new nodes and relationships.
The Neoflix project layout:
    neoflix.rb
    public/movies.dat
    public/users.dat
    public/ratings.dat
Let’s take a look at the source code in neoflix.rb. We require our gems and use the NEO4J_URL variable to tell Neography how to reach the Neo4j server, falling back to a local server when the variable is not set.
    require 'rubygems'
    require 'neography'
    require 'sinatra'
    neo = Neography::Rest.new(ENV['NEO4J_URL'] || "http://localhost:7474")
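To see what Neography is being handed, the connection string can be taken apart with Ruby's standard URI library. This is only a sketch and not part of neoflix.rb: the `neo4j_parts` helper is a hypothetical name introduced here for illustration.

```ruby
require 'uri'

# Hypothetical helper (not in neoflix.rb): split a Neo4j connection URL
# into the pieces that `heroku config` also lists separately.
def neo4j_parts(url)
  uri = URI.parse(url)
  {
    host:     uri.host,
    port:     uri.port,
    login:    uri.user,
    password: uri.password
  }
end

# Use the same local fallback as the application when NEO4J_URL is not set.
url = ENV['NEO4J_URL'] || "http://localhost:7474"
parts = neo4j_parts(url)
puts "Neo4j server: #{parts[:host]}:#{parts[:port]}"
```

Run locally without the environment variable, this reports the localhost fallback; on Heroku it would report the hosted instance.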
Then we create a route in Sinatra that will clear and populate the graph when we visit it.
    get '/create_graph' do
      neo.execute_script("g.clear();")
      create_graph(neo)
    end
We use a Gremlin shortcut to delete the graph before creating it.
    g.clear();
The backup and restore feature of the Heroku add-on lets you reload your graph as well, but the Neo4j instance will be down temporarily during the exchange.
If you want to permanently delete the Neo4j instance (once you are done with this example application), you can simply remove the Heroku add-on.
    $ heroku addons:remove neo4j:test
    Removing neo4j:test from neoflix...done.
Let’s see part of the create_graph method.
We do not want to create the graph if it already exists, so we check whether there are any movie nodes before starting.
    def create_graph(neo)
      return if neo.execute_script("g.idx('vertices')[[type:'movie']].count();").to_i > 0
Since we wiped everything clean, we set up automatic indexing on all vertices and all properties.
      if neo.execute_script("g.indices;").empty?
        neo.execute_script("g.createAutomaticIndex('vertices', Vertex.class, null);")
      end
We are going to create a lot of data, so we set our graph to commit every 1000 changes in an automatic transaction.
    g.setMaxBufferSize(1000);
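The effect of a buffer size is easiest to see in isolation. The class below is a hypothetical Ruby stand-in, not Blueprints code: it only counts changes and "commits" whenever the buffer fills, which is the behaviour the setting above asks for. Anything left over at the end needs one final explicit commit.

```ruby
# Hypothetical stand-in for an automatic-transaction buffer.
# It mimics the idea of committing every `max_buffer_size` changes;
# it does not talk to Neo4j.
class BufferedWriter
  attr_reader :commits, :pending

  def initialize(max_buffer_size)
    @max_buffer_size = max_buffer_size
    @pending = 0   # changes made since the last commit
    @commits = 0   # number of commits so far
  end

  # Record one change and commit automatically once the buffer is full.
  def change!
    @pending += 1
    commit! if @pending >= @max_buffer_size
  end

  # Commit whatever is pending (used for the leftovers at the end).
  def commit!
    return if @pending.zero?
    @commits += 1
    @pending = 0
  end
end

writer = BufferedWriter.new(1000)
2500.times { writer.change! }
puts "commits: #{writer.commits}, pending: #{writer.pending}"

writer.commit!
puts "commits after final flush: #{writer.commits}"
```

With 2500 changes and a buffer of 1000, two commits happen automatically and 500 changes stay pending until the final flush.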
Here comes some magic. We do not have access to the file system of the server running our Neo4j instance, but since we have the full power of Groovy at our disposal, we simply grab the file from Sinatra instead. Anything you put in the public directory will be automatically served for you. The fields of movies.dat are delimited by “::” and the genres are delimited by “|”.
    1::Toy Story (1995)::Animation|Children's|Comedy
    2::Jumanji (1995)::Adventure|Children's|Fantasy
    3::Grumpier Old Men (1995)::Comedy|Romance
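To make the record layout concrete, here is a small Ruby sketch that splits one such line into its fields. The `parse_movie` helper is a hypothetical name used only for illustration; the application itself does this parsing in Groovy on the server.

```ruby
# Hypothetical helper: split one movies.dat record into its fields.
def parse_movie(line)
  movie_id, title, genres = line.chomp.split('::')
  {
    movie_id: movie_id.to_i,
    title:    title,
    # String#split with a String argument matches it literally,
    # so the pipe needs no escaping here.
    genres:   genres.split('|')
  }
end

movie = parse_movie("1::Toy Story (1995)::Animation|Children's|Comedy")
puts movie.inspect
```

Note that Ruby's split treats a String separator literally, whereas the Groovy version has to treat the pipe as a regular expression.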
So for each line in our file, we are going to create a movie vertex and link it to one or more genres. We are sending this Gremlin script inside a Ruby string, so we must escape the backslashes that escape the | in the final script. As we go along, we also create vertices for the genres if they don’t already exist.
    'http://neoflix.heroku.com/movies.dat'.toURL().eachLine { def line ->
      def components = line.split('::');
      def movieVertex = g.addVertex(['type':'movie', 'movieid':components[0].toInteger(), 'title':components[1]]);
      components[2].split('\\\\|').each { def genera ->
        def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
        def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'genera', 'genera':genera]);
        g.addEdge(movieVertex, generaVertex, 'hasgenera');
      }
    };
If you are a Rubyist, you should be able to read that Groovy code, but let me point out a few things. In Groovy variable definitions, it is mandatory to either provide a type name explicitly or to use “def” instead.
This funky piece of code is an unfortunate chain of escapes: the pipe character is escaped by a backslash, which itself needs to be escaped, and because both backslashes sit inside our Ruby string, they must be escaped as well.
components[2].split('\\\\|').each { def genera ->
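The layers can be checked from plain Ruby. The sketch below assumes the script is written in a double-quoted Ruby string: Ruby collapses the four backslashes to two, Groovy's single-quoted string then collapses those two to one, and the regular expression that split receives is a backslash followed by a pipe, which matches a literal pipe. The variable names are illustrative, and the heredoc variant is an alternative shown for comparison, not what the application uses.

```ruby
# The fragment as it is written inside a double-quoted Ruby string.
in_ruby_source = "components[2].split('\\\\|')"

# Ruby collapses each pair of backslashes into one, so the script that is
# actually sent to the server has two backslashes before the pipe.
puts in_ruby_source

# A single-quoted heredoc leaves backslashes untouched, so the script can
# be written exactly the way Groovy expects to read it.
via_heredoc = <<~'GREMLIN'.strip
  components[2].split('\\|')
GREMLIN

puts via_heredoc == in_ruby_source
```

Writing longer Gremlin scripts in a single-quoted heredoc removes one level of escaping, at the cost of losing Ruby string interpolation inside the script.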
This next bit of code looks up the genre in our index, and if it doesn’t exist, creates it.
    def hits = g.idx(Tokens.T.v)[[genera:genera]].iterator();
    def generaVertex = hits.hasNext() ? hits.next() : g.addVertex(['type':'genera', 'genera':genera]);
This hash-inside-an-array-inside-an-array-looking construct is Gremlin’s way of querying the index. We are telling it to return a node if it has a property genera that matches the genera variable we parsed after splitting the components[2] field.
    g.idx(Tokens.T.v)[[genera:genera]].iterator();
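The lookup-then-create pattern itself is independent of Gremlin. As a rough Ruby analogy, with a plain Hash standing in for the vertex index, it could look like the sketch below; `GenreIndex` and `find_or_create` are hypothetical names and nothing here touches Neo4j.

```ruby
# Hypothetical in-memory stand-in for the automatic vertex index:
# maps a genre name to the "vertex" (here just a Hash) that represents it.
class GenreIndex
  def initialize
    @by_name = {}
  end

  # Return the existing vertex for `name`, or create and remember a new one.
  def find_or_create(name)
    @by_name[name] ||= { type: 'genera', genera: name }
  end

  def size
    @by_name.size
  end
end

index = GenreIndex.new
%w[Animation Comedy Comedy Romance].each { |name| index.find_or_create(name) }
puts index.size
```

Because repeated names resolve to the same entry, each genre ends up with a single vertex no matter how many movies reference it.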
We do this a few more times to load the users and ratings into our graph and end with this:
    g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);")
which commits any leftover items in our transaction buffer.
In part two, we’ll bring up our Heroku app, load the data, possibly add movie posters from a third-party API, and visualize some of the implicit relationships in the graph as outlined in the original blog post. I’ll probably also do a part three, which will use the fresh-off-the-presses CSV file importer and reload the graph with a bigger set of movie data using Heroku. In between, however, I think it’s time we looked at Neo4j Spatial. You’ll know when new posts are published by following me on Twitter.
source:
http://maxdemarzi.com/2012/01/13/neo4j-on-heroku-part-one/