DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Spring Data Neo4j: How to Update an Entity
  • Leveraging Neo4j for Effective Identity Access Management
  • Graph Database Pruning for Knowledge Representation in LLMs
  • The Beginner's Guide To Understanding Graph Databases

Trending

  • AI-Driven Test Automation Techniques for Multimodal Systems
  • Monolith: The Good, The Bad and The Ugly
  • The Role of AI in Identity and Access Management for Organizations
  • Agentic AI for Automated Application Security and Vulnerability Management
  1. DZone
  2. Data Engineering
  3. Databases
  4. RDF data in Neo4J - the Tinkerpop story

RDF data in Neo4J - the Tinkerpop story

By 
Davy Suvee user avatar
Davy Suvee
·
Oct. 24, 11 · News
Likes (0)
Comment
Save
Tweet
Share
24.9K Views

Join the DZone community and get the full member experience.

Join For Free

My previous blog post discussed the use of Neo4J as a RDF triple store. Michael Hunger however informed me that the neo-rdf-sail component is no longer under active development and advised me to have a look at Tinkerpop’s Sail implementation.

As mentioned in my previous blog post, I recently got asked to implement a storage and querying platform for biological RDF (Resource Description Framework) data. Traditional RDF stores are not really an option as my solution should also provide the ability to calculate shortest paths between random subjects. Calculating shortest path is however one of the strong selling points of Graph Databases and more specifically Neo4J. Unfortunately, the neo-rdf-sail component, which suits my requirements perfectly, is no longer under active development. Tinkerpop’s Sail implementation however, fills the void with an even better alternative!

 

1. What is Tinkerpop?

Tinkerpop is an open source project that provides an entire stack of technologies within the Graph Database space. At the core of this stack is the Blueprints framework. Blueprints can be considered as the JDBC of Graph Databases. By providing a collection of generic interfaces, it allows to develop graph-based applications, without introducing explicit dependencies on concrete Graph Database implementations. Additionally, Blueprints provides concrete bindings for the Neo4J, OrientDB and Dex Graph Databases. On top of Blueprints, the Tinkerpop team developed an entire range of graph technologies, including Gremlin, a powerful, domain-specific language designed for traversing graphs. Hence, once a Blueprints binding is available for a particular Graph Database, an entire range of technologies can be leveraged.

 

2. Tinkerpop and Sail

Last time, I talked about exposing a Neo4J Graph Database (containing RDF triples) through the Sail interface, which is part of the openrdf.org project. By doing so, we can reuse an entire range of RDF utilities (parsers and query evaluators) that are part of the openrdf.org project. The Blueprints framework provides us with a similar ability: each Graph Database binding that implements the Tinkerpop TransactionalGraph and IndexableGraph interfaces can be exposed as a GraphSail, which is Tinkerpop’s implementation of the Sail interface. Once you have your Sail available, storing and querying RDF is analogous to the piece of code shown in my previous blog article.

// Create the sail graph database
graph = new MyNeo4jGraph("var/flights", 100000);
graph.setTransactionMode(TransactionalGraph.Mode.MANUAL);
sail = new GraphSail(graph);

// Initialize the sail store
sail.initialize();

// Get the sail repository connection
connection = new SailRepository(sail).getConnection();

// Import the data
connection.add(getResource("sneeair.rdf"), null, RDFFormat.RDFXML);

// Execute SPARQL query
TupleQuery durationquery = connection.prepareTupleQuery(QueryLanguage.SPARQL,
    "PREFIX io: <http://www.daml.org/2001/06/itinerary/itinerary-ont#> " +
    "PREFIX fl: <http://www.snee.com/ns/flights#> " +
    "SELECT ?number ?departure ?destination " +
    "WHERE { " +
        "?flight io:flight ?number . " +
        "?flight fl:flightFromCityName ?departure . " +
        "?flight fl:flightToCityName ?destination . " +
        "?flight io:duration \"1:35\" . " +
    "}");
TupleQueryResult result = durationquery.evaluate();

The two first lines of code require some more clarification. A TransactionalGraph can be run in MANUAL or AUTOMATIC transaction mode. In AUTOMATIC mode, transactions are basically ignored, in the sense that each item that gets created is immediately persisted in the underlying Graph Database. Although this fits my needs, AUTOMATIC mode is extremely slow in case of Neo4J because of the continuous IO access. MANUAL mode on the other hand is very fast; a new transaction is created at the moment the import of the RDF data file starts and is only committed to the Neo4J data store once all RDF triples are parsed and created. Unfortunately, MANUAL mode does not scale either in my specific situation; as some of my RDF data files contain over 50 million RDF triples, they can not fit into memory (i.e. Java heap space error). Requiring fast imports, I extended the default Neo4J Blueprints binding to support intermediate commits. I based my implementation on Neo4J’s best practices for big transactions. The idea is rather simple: you specify the maximum number of items that can be kept in memory, before they should be committed to the Neo4J data store. Once this number is reached, the current transaction is committed and a new one is automatically started. Simple, but very effective!

public class MyNeo4jGraph extends Neo4jGraph {

    private long numberOfItems = 0;
    private long maxNumberOfItems = 1;

    public MyNeo4jGraph(final String directory, long maxNumberOfItems) {
        super(directory, null);
        this.maxNumberOfItems = maxNumberOfItems;
    }

    public MyNeo4jGraph(final String directory, final Map<String, String> configuration, long maxNumberOfItems) {
        super(directory, configuration);
        this.maxNumberOfItems = maxNumberOfItems;
    }

    public Vertex addVertex(final Object id) {
        Vertex vertex = super.addVertex(id);
        commitIfRequired();
        return vertex;
    }

    public Edge addEdge(final Object id, final Vertex outVertex, final Vertex inVertex, final String label) {
        Edge edge = super.addEdge(id, outVertex, inVertex, label);
        commitIfRequired();
        return edge;
    }

    private void commitIfRequired() {
        // Check whether commit should be executed
        if (++numberOfItems % maxNumberOfItems == 0) {
            // Stop the transaction
            stopTransaction(Conclusion.SUCCESS);
            // Immediately start a new one
            startTransaction();
        }
    }

}

3. Shortest path calculation

Although Blueprints allows you to abstract away the Neo4J implementation details, it still provides you with access to the raw Neo4J data store if needed. Hence, one can still use the graph algorithms provided in the neo4j-graph-algo component to calculate shortest paths between random subjects. The complete source code can be found on the Datablend public GitHub repository.

Resource Description Framework Neo4j Data file Database Graph (Unix)

Published at DZone with permission of Davy Suvee, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Spring Data Neo4j: How to Update an Entity
  • Leveraging Neo4j for Effective Identity Access Management
  • Graph Database Pruning for Knowledge Representation in LLMs
  • The Beginner's Guide To Understanding Graph Databases

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!