Storing and querying RDF data in Neo4J through Sail


Recently, I was asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. RDF data is a set of statements about resources in the form of subject-predicate-object expressions (also referred to as triples). Let’s have a look at some simple RDF triples that describe ‘me’, Davy Suvee:

<http://example.org/person/Davy_Suvee> <http://example.org/person/first_name> "Davy" .
<http://example.org/person/Davy_Suvee> <http://example.org/person/last_name> "Suvee" .
<http://example.org/person/Davy_Suvee> <http://example.org/person/age> "31" .
<http://example.org/person/Davy_Suvee> <http://example.org/company> <http://example.org/company/DataBlend> .
<http://example.org/company/DataBlend> <http://example.org/company/name> "DataBlend" .
<http://example.org/company/DataBlend> <http://example.org/company/vat> "BE0894.523.805" .

Each subject is identified through a URI (Uniform Resource Identifier). For instance, I identify myself as http://example.org/person/Davy_Suvee. A predicate, also identified through a URI, points either to a literal value or to another resource (which is again identified through a URI). In the example above, the first_name, last_name and age predicates all point to a literal value, while the company predicate points to http://example.org/company/DataBlend, the company I work for. The DataBlend subject in turn has a number of properties of its own, including a name and a VAT number. Today’s triple stores allow you to save billions of these triples, and information is retrieved through so-called SPARQL queries. For instance, to retrieve my first name and age, I can use the following SPARQL query:

PREFIX person: <http://example.org/person/>
SELECT ?first_name ?age
WHERE {
  person:Davy_Suvee person:first_name ?first_name .
  person:Davy_Suvee person:age ?age .
}
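
To make the triple model concrete without pulling in any RDF library, here is a minimal sketch in plain Java. It parses simple triple lines like the ones shown earlier, distinguishes literal objects from URI objects, and looks up objects by subject and predicate, which is all that the first-name-and-age query needs. The names TripleSketch, Triple, parse and valuesOf are my own illustrations and are not part of Neo4J or openrdf.org. The regular expression is an assumption that only covers the simple line shape used in this article (no language tags, datatypes, escapes or blank nodes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Library-free illustration; every name in this class is hypothetical.
final class TripleSketch {

    /** One subject-predicate-object statement. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean literal; // true when the object is a quoted literal, false for a URI

        Triple(String subject, String predicate, String object, boolean literal) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.literal = literal;
        }
    }

    // Matches "<s> <p> <o> ." or "<s> <p> "literal" ." and nothing more elaborate.
    private static final Pattern LINE = Pattern.compile(
        "^<([^>]*)>\\s+<([^>]*)>\\s+(?:<([^>]*)>|\"([^\"]*)\")\\s*\\.$");

    /** Parses one simple triple line, or throws if the line has another shape. */
    static Triple parse(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a simple triple: " + line);
        }
        boolean literal = m.group(4) != null;
        String object = literal ? m.group(4) : m.group(3);
        return new Triple(m.group(1), m.group(2), object, literal);
    }

    /** Returns the objects of all triples with the given subject and predicate. */
    static List<String> valuesOf(List<Triple> triples, String subject, String predicate) {
        List<String> values = new ArrayList<String>();
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                values.add(t.object);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String me = "http://example.org/person/Davy_Suvee";
        List<Triple> triples = new ArrayList<Triple>();
        triples.add(parse("<" + me + "> <http://example.org/person/first_name> \"Davy\" ."));
        triples.add(parse("<" + me + "> <http://example.org/person/age> \"31\" ."));

        // Equivalent of the two triple patterns in the SPARQL query
        String firstName = valuesOf(triples, me, "http://example.org/person/first_name").get(0);
        String age = valuesOf(triples, me, "http://example.org/person/age").get(0);
        System.out.println(firstName + " " + age); // prints: Davy 31
    }
}
```

A real triple store does the same kind of pattern matching, but over indexed storage rather than a linear scan of a list.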


2. Neo4J as an RDF data store

Similar to SQL, SPARQL provides a set of powerful querying constructs that allow you to declaratively specify what you need. Calculating shortest paths between arbitrary subjects, by contrast, cannot easily be accomplished through SPARQL (unless one encodes the specific path structure in the query, which defeats the point). Being able to quickly calculate shortest paths, which is a requirement for the project I’m implementing, is one of the main selling points of graph databases. As RDF data can be thought of as a graph, it comes as no surprise that many graph databases, including Neo4J, provide native support for storing and querying RDF data. In the case of Neo4J, this is achieved through the neo4j-rdf, neo4j-rdf-sparql and neo4j-rdf-sail components. Unfortunately, I couldn’t find a recent piece of code that details the various steps for automatically importing RDF triple files into Neo4J. Hence this article. The complete source code can be found on the Datablend public GitHub repository.

Start by setting up the Neo4J database connection:

// Create the sail graph database
graphDb = new EmbeddedGraphDatabase("var/flights");
indexService = new LuceneIndexService(graphDb);
fulltextIndex = new SimpleFulltextIndex(graphDb, new File("var/flights/lucene-fulltext"));
rdfStore = new VerboseQuadStore(graphDb, indexService, null, fulltextIndex);
sail = new GraphDatabaseSail(graphDb, rdfStore);

// Initialize the sail store
sail.initialize();

// Get the sail repository connection
connection = new SailRepository(sail).getConnection();


An embedded Neo4J graph database (EmbeddedGraphDatabase) is used for importing 5MB of RDF triples containing airline flight information. (This example data set was found at rdfdata.org, a great resource for open RDF data sets.) To make flight information easy to look up, we full-text index our RDF triples (through Lucene). Next, we wrap the embedded Neo4J graph database as a VerboseQuadStore (one of the internal triple store implementations provided by Neo4J). Finally, we expose our triple store through the Sail interface, which is part of the openrdf.org project. By doing so, we can use the entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. Once we have a sail connection available, we can import the required RDF triples through the add method.

connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML, new Resource[]{});

That’s it! Once the import is finished, you can query your RDF triples by executing a SPARQL query. The query below, for instance, retrieves the flight number, departure city and destination city of all flights that have a duration of 1 hour and 35 minutes.

// Create query
TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL,
    "PREFIX io: <http://www.daml.org/2001/06/itinerary/itinerary-ont#> " +
    "PREFIX fl: <http://www.snee.com/ns/flights#> " +
    "SELECT ?number ?departure ?destination " +
    "WHERE { " +
        "?flight io:flight ?number . " +
        "?flight fl:flightFromCityName ?departure . " +
        "?flight fl:flightToCityName ?destination . " +
        "?flight io:duration \"1:35\" . " +
    "}");

// Evaluate and print results
TupleQueryResult result = durationquery.evaluate();
while (result.hasNext()) {
    BindingSet binding = result.next();
    System.out.println(binding.getBinding("number").getValue() + " " +
                       binding.getBinding("departure").getValue() + " " +
                       binding.getBinding("destination").getValue());
}


3. Shortest path calculation

Through the SimpleFulltextIndex, we can easily look up the Neo4J node that corresponds to a particular RDF subject. Once we have the required nodes, we can use the graph algorithms provided by the neo4j-graph-algo component to calculate (shortest) paths. Very cool!
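
To illustrate what a shortest path over RDF data means, here is a library-free sketch in plain Java. It does not use the neo4j-graph-algo API. Instead, it treats every triple whose object is a resource as an undirected edge and runs an ordinary breadth-first search, which finds a path with the fewest edges. The class ShortestPathSketch, its methods and the Jane_Doe resource are hypothetical names I introduce only for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Plain breadth-first search; a stand-in for illustration, not the neo4j-graph-algo API.
final class ShortestPathSketch {

    // Adjacency list: resource URI -> directly connected resource URIs
    private final Map<String, List<String>> neighbours = new HashMap<String, List<String>>();

    /** Registers an undirected edge for a triple whose object is a resource. */
    void addEdge(String subject, String object) {
        link(subject, object);
        link(object, subject);
    }

    private void link(String from, String to) {
        List<String> list = neighbours.get(from);
        if (list == null) {
            list = new ArrayList<String>();
            neighbours.put(from, list);
        }
        list.add(to);
    }

    /** Returns one path with the fewest edges, or an empty list if none exists. */
    List<String> shortestPath(String start, String goal) {
        // previous maps each visited node to the node it was reached from
        Map<String, String> previous = new HashMap<String, String>();
        Deque<String> queue = new ArrayDeque<String>();
        previous.put(start, null);
        queue.add(start);

        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(goal)) {
                // Walk the predecessor chain back to the start
                LinkedList<String> path = new LinkedList<String>();
                for (String n = goal; n != null; n = previous.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            List<String> next = neighbours.get(current);
            if (next == null) {
                continue;
            }
            for (String n : next) {
                if (!previous.containsKey(n)) {
                    previous.put(n, current);
                    queue.add(n);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        ShortestPathSketch graph = new ShortestPathSketch();
        String davy = "http://example.org/person/Davy_Suvee";
        String jane = "http://example.org/person/Jane_Doe"; // hypothetical colleague
        String company = "http://example.org/company/DataBlend";
        graph.addEdge(davy, company);
        graph.addEdge(jane, company);
        System.out.println(graph.shortestPath(davy, jane));
    }
}
```

In this tiny graph the path runs from Davy through DataBlend to the hypothetical colleague. On a graph database, the same question is answered over stored nodes and relationships rather than an in-memory map.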


Published at DZone with permission of
