Neo4j for PHP
Join the DZone community and get the full member experience.
Join For Free
Note: I've put a lot of effort into Neo4jPHP
since this post was written. It's pretty full-featured and covers
almost 100% of the Neo4j REST interface. Anyone interested in playing
with Neo4j from PHP should definitely check it out. I would love some
feedback!
Lately, I've been playing around with the graph database Neo4j
and its application to certain classes of problems. Graph databases
are meant to solve problems in domains where data relationships can be
multiple levels deep. For example, in a relational database, it's very
easy to answer the question "Give me a list of all actors who have been
in a movie with Kevin Bacon":
> desc roles; +-------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------------+--------------+------+-----+---------+----------------+ | id | int(11) | NO | PRI | NULL | auto_increment | | actor_name | varchar(100) | YES | | NULL | | | role_name | varchar(100) | YES | | NULL | | | movie_title | varchar(100) | YES | | NULL | | +-------------+--------------+------+-----+---------+----------------+ > SELECT actor_name, role_name FROM roles WHERE movie_title IN (SELECT DISTINCT movie_title FROM roles WHERE actor_name='Kevin Bacon')
Excuse my use of sub-queries (re-write it to a JOIN in your head if you wish.)
But suppose you want to get the names of all the actors who have been in
a movie with someone who has been in a movie with Kevin Bacon.
Suddenly, you have yet another JOIN against the same table. Now add a
third degree: someone who has been in a movie with someone who has been
in a movie with someone who has been in a movie with Kevin Bacon. As
you continue to add degrees, the query becomes increasingly unwieldy,
harder to maintain, and less performant.
This is precisely the type of problem graph databases are meant to
solve: finding paths between pieces of data that may be one or more
relationships removed from each other. They solve it very elegantly by
modeling domain objects as graph nodes and edges (relationships in graph
db parlance) and then traversing the graph using well-known and
efficient algorithms.
The above example can be very easily modeled this way: every actor is a
node, every movie is a node, and every role is a relationship going from
the actor to the movie they were in:
Now it becomes very easy to find a path from a given actor to Kevin Bacon.
Neo4j is an open source graph database, with both community and
enterprise licensing structures. It supports transactions and can
handle billions of nodes and relationships in a single instance. It was
originally built to be embedded in Java applications, and most of the
documentation and examples are evidence of that. Unfortunately, there
is no native PHP wrapper for talking to Neo4j.
Luckily, Neo4j also has a built-in REST server and PHP is very good at consuming REST services. There's already a good Neo4j REST PHP library out there, but I decided to write my own to get a better understanding of how the REST interface actually works. You can grab it here and all the code examples below are written using it. The concepts can easily be ported to any Neo4j REST client.
First, we need to initialize a connection to the database. Since this
is a REST interface, there is no persistent connection, and in fact, no
data communication happens until the first time we need to read or write
data:
use Everyman\Neo4j\Client, Everyman\Neo4j\Transport, Everyman\Neo4j\Node, Everyman\Neo4j\Relationship; $client = new Client(new Transport('localhost', 7474));
Now we need to create a node for each of our actors and movies. This is analogous to the INSERT statements used to enter data into a traditional RDBMS:
$keanu = new Node($client); $keanu->setProperty('name', 'Keanu Reeves')->save(); $laurence = new Node($client); $laurence->setProperty('name', 'Laurence Fishburne')->save(); $jennifer = new Node($client); $jennifer->setProperty('name', 'Jennifer Connelly')->save(); $kevin = new Node($client); $kevin->setProperty('name', 'Kevin Bacon')->save(); $matrix = new Node($client); $matrix->setProperty('title', 'The Matrix')->save(); $higherLearning = new Node($client); $higherLearning->setProperty('title', 'Higher Learning')->save(); $mysticRiver = new Node($client); $mysticRiver->setProperty('title', 'Mystic River')->save();
Each node has `setProperty` and `getProperty` methods that allow storing
arbitrary data on the node. No server communication happens until the
`save()` call, which must be called for each node.
Linking an actor to a movie means setting up a relationship between
them. In RDBMS terms, the relationship takes the place of a join table
or a foreign key column. In the example, the relationship always starts
with the actor pointing to the movie, and is tagged as an "acted in"
type relationship:
$keanu->relateTo($matrix, 'IN')->save(); $laurence->relateTo($matrix, 'IN')->save(); $laurence->relateTo($higherLearning, 'IN')->save(); $jennifer->relateTo($higherLearning, 'IN')->save(); $laurence->relateTo($mysticRiver, 'IN')->save(); $kevin->relateTo($mysticRiver, 'IN')->save();
The `relateTo` call returns a Relationship object, which is like a node
in that it can have arbitrary properties stored on it. Each
relationship is also saved to the database.
The direction of the relationship is totally arbitrary; paths can be
found regardless of which direction a relationship points. You can use
whichever semantics make sense for your problem domain. In the example
above, it makes sense that an actor is "in" a movie, but it could just
as easily be phrased that a movie "has" an actor. The same two nodes
can have multiple relationships to each other, with different
directions, types and properties.
The relationships are all set up, and now we are ready to find links
between any actor in our system and Kevin Bacon. Note that the maximum
length of the path is going to be 12 (6 degrees of separation multiplied
by 2 nodes for each degree; an actor node and a movie node.)
$path = $keanu->findPathsTo($kevin) ->setMaxDepth(12) ->getSinglePath(); foreach ($path as $i => $node) { if ($i % 2 == 0) { echo $node->getProperty('name'); if ($i+1 != count($path)) { echo " was in\n"; } } else { echo "\t" . $node->getProperty('title') . " with\n"; } }
A path is an ordered array of nodes. The nodes alternate between being actor and movie nodes.
You can also do the typical "find all the related movies" type queries:
echo $laurence->getProperty('name') . " was in:\n"; $relationships = $laurence->getRelationships('IN'); foreach ($relationships as $relationship) { $movie = $relationship->getEndNode(); echo "\t" . $movie->getProperty('title') . "\n"; }
`getRelationships` returns an array of all relationships that a node
has, optionally limiting to only relationships of a given type. It is
also possible to specify only incoming or outgoing relationships. We've
set up our data so that all 'IN' relationships are from an actor to a
movie, so we know that the end node of any 'IN' relationship is a movie.
There is more available in the REST interface, including node and
relationship indexing, querying, and traversal (which allows more
complicated path finding behaviors.) Transaction/batch operation
support over REST is marked "experimental" for now. I'm hoping to add
wrappers for more of this functionality soon. I'll also be posting more
on the different types of problems that can be solved very elegantly
with graphing databases.
The next post dives into a bit more detail about path-finding and some of the different algorithms available.
Published at DZone with permission of Josh Adell, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments