
Facebook Graph Search with Cypher and Neo4j



Facebook Graph Search has given the Graph Database community a simpler way to explain what it is we do and why it matters. I wanted to drive the point home by building a proof of concept of how you could do this with Neo4j. However, I don’t have six months or much experience with NLP (natural language processing). What I do have is Cypher. Cypher is Neo4j’s graph language and it makes it easy to express what we are looking for in the graph. I needed a way to take “natural language” and create Cypher from it. This was going to be a problem.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

It’s an old programmer joke, but that is what came to mind: some kind of fuzzy regular expressions. In the iPhone world, we usually hear people say “There’s an App for that”. In the Ruby world, we go with “There’s a Gem for that”… so I asked Google for some help and came upon Semr.

Semr is the gateway drug framework to supporting natural language processing in your application. Its goal is to follow the 80/20 rule where 80% of what you want to express in a DSL is possible in a way familiar to how developers normally solve problems. (Note: There are other more flexible solutions, but they also come with a higher learning curve, e.g. Treetop.)

Awesome, a ray of light to solve my problem… but the Gem is 4 years old and I could not get it to install. Bummer… Wait, what was that about Treetop?

Treetop is a language for describing languages. Combining the elegance of Ruby with cutting-edge parsing expression grammars, it helps you analyze syntax with revolutionary ease.

Score! Now, I had no idea how to write a proper language grammar, but that’s never stopped anyone before. Someone who has more than a couple of hours of experience with Treetop is going to laugh at this, but I’ll show you part of what I did:

rule friends
  "friends" <Friends>
end

rule likes
  "who like" <Likes>
end

rule likeand
  likes space thing space "and" space thing <LikeAnd>
end

rule thing
  [a-zA-Z0-9]+ <Thing>
end

I am creating rules for things, for the likes relationship, and for the idea of “likes this and that”. The “natural language” input is run through these rules, and a syntax tree is generated from the rules that match. The nodes of that tree are then turned into hashes representing pieces of Cypher. Looking at the code above and below, you can see how “friends who like Neo4j” gets parsed into Friends, Likes, Thing.

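class Friends < Treetop::Runtime::SyntaxNode
  # Start at the current user and follow the friends relationship.
  def to_cypher
    return {:start  => "me = node({me})",
            :match  => "me -[:friends]-> people",
            :return => "people",
            :params => {"me" => nil }}
  end
end

class Likes < Treetop::Runtime::SyntaxNode
  # Only contributes a MATCH pattern from people to the thing they like.
  def to_cypher
    return {:match => "people -[:likes]-> thing"}
  end
end

class Thing < Treetop::Runtime::SyntaxNode
  # Looks the thing up in the "things" index using the matched text.
  def to_cypher
    return {:start  => "thing = node:things({thing})",
            :params => {"thing" => "name: " + self.text_value } }
  end
end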
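To make the walk from phrase to Friends, Likes, Thing a little more concrete, here is a minimal sketch that does not use Treetop at all. It is a hand-rolled stand-in built on Ruby's StringScanner, and the RULES table and match_rules method are hypothetical names made up for this illustration. It only mimics the three simple rules shown above (there is no likeand handling), so treat it as a picture of the idea rather than how the real grammar works.

```ruby
require "strscan"

# Hypothetical stand-in for the grammar: each entry pairs a node name with
# a pattern that mirrors one of the simple rules shown earlier.
RULES = [
  [:Friends, /friends/],
  [:Likes,   /who like/],
  [:Thing,   /[a-zA-Z0-9]+/]
].freeze

# Walks the phrase left to right and records which rule matched each chunk.
# Rules are tried in order, so the keyword rules win over the generic Thing.
def match_rules(phrase)
  scanner = StringScanner.new(phrase.strip)
  matches = []
  until scanner.eos?
    name = nil
    text = nil
    RULES.each do |rule_name, pattern|
      text = scanner.scan(pattern)
      if text
        name = rule_name
        break
      end
    end
    raise ArgumentError, "cannot parse: #{scanner.rest.inspect}" unless name
    matches << [name, text]
    scanner.skip(/\s+/)
  end
  matches
end

p match_rules("friends who like Neo4j")
```

Each matched pair corresponds to one syntax node whose to_cypher method would supply a hash fragment.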

Then these hashes are combined and turned into a proper Cypher string:

class Expression < Treetop::Runtime::SyntaxNode
  def to_cypher
    # The first child returns the combined hash; each value is an array of fragments.
    cypher_hash =  self.elements[0].to_cypher
    cypher_string = ""
    cypher_string << "START "   + cypher_hash[:start].uniq.join(", ")
    cypher_string << " MATCH "  + cypher_hash[:match].uniq.join(", ") unless cypher_hash[:match].empty?
    cypher_string << " RETURN DISTINCT " + cypher_hash[:return].uniq.join(", ")
    # Merge the per-fragment parameter hashes into a single hash.
    params = cypher_hash[:params].empty? ? {} : cypher_hash[:params].uniq.inject {|a,h| a.merge(h)}
    return [cypher_string, params].compact
  end
end
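The snippet above expects every value in the combined hash to be an array, but the step that gathers the individual fragments is not shown. Here is one way that step could look, as a sketch under stated assumptions: combine_fragments and build_cypher are hypothetical helper names, and the three fragments are copied from the to_cypher methods shown earlier for the phrase “friends who like Neo4j”. The real code on GitHub may well do this differently.

```ruby
# Fragments shaped like the hashes returned by Friends, Likes and Thing.
FRAGMENTS = [
  { start: "me = node({me})", match: "me -[:friends]-> people",
    return: "people", params: { "me" => nil } },
  { match: "people -[:likes]-> thing" },
  { start: "thing = node:things({thing})", params: { "thing" => "name: Neo4j" } }
].freeze

# Hypothetical helper: collects each fragment's values into one array per key.
def combine_fragments(fragments)
  combined = { start: [], match: [], return: [], params: [] }
  fragments.each do |fragment|
    fragment.each { |key, value| combined[key] << value }
  end
  combined
end

# Hypothetical helper: joins the collected pieces into a Cypher string plus
# a merged parameter hash, following the same steps as Expression#to_cypher.
def build_cypher(combined)
  clauses = ["START " + combined[:start].uniq.join(", ")]
  clauses << "MATCH " + combined[:match].uniq.join(", ") unless combined[:match].empty?
  clauses << "RETURN DISTINCT " + combined[:return].uniq.join(", ")
  params = combined[:params].reduce({}) { |merged, hash| merged.merge(hash) }
  [clauses.join(" "), params]
end

cypher, params = build_cypher(combine_fragments(FRAGMENTS))
puts cypher
p params
```

The "me" parameter is still nil at this point, as in the Friends class above; presumably the web application fills it in with the current user's node id before running the query.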

Finally, I built a Sinatra web application that imports your data from Facebook, along with a search page so you can try this out for yourself. As always, the code is available on GitHub, and the application is hosted on Heroku.

While reproducing a “kinda” Facebook Graph Search is interesting, what would be more interesting is seeing other people use this idea on their own data. If you would like to know more about this proof of concept, contact me or come to the Neo4j Meetups in Virginia (Feb 26th), Boston (Feb 28th), Chicago (TBD), or somewhere near you.
