
Visualizing RDF Schema inferencing through Neo4J, Tinkerpop, Sail and Gephi


Last week, the Neo4J plugin for Gephi was released. Gephi is an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs. The graphs themselves can be loaded through a variety of file formats. Thanks to Martin Škurla, it is now possible to load and lazily explore graphs that are stored in a Neo4J data store.

In one of my previous articles, I explained how Neo4J and the Tinkerpop framework can be used to load and query RDF triples. The newly released Neo4J plugin now makes it possible to visually browse these RDF triples and to perform fancier operations, such as finding patterns and executing social network analysis algorithms, from within Gephi itself. Tinkerpop’s Sail ouplementation also supports the notion of RDF Schema inferencing. Inferencing is the process whereby new (RDF) data is automatically deduced from existing (RDF) data through reasoning. Unfortunately, the Sail reasoner cannot easily be integrated within Gephi, as the Gephi plugin grabs a lock on the Neo4J store and no RDF data can be added except through the plugin itself.

Being able to visualize the RDF Schema reasoning process, and to graphically indicate which RDF triples were added manually and which were automatically inferred, would be a nice feature to have. To implement it, we need to be able to push graph changes from Tinkerpop and Neo4J to Gephi. Luckily, the Gephi graph streaming plugin allows us to do just that. In the rest of this article, I will detail how to set up the required Gephi environment and how we can stream (inferred) RDF data from Neo4J to Gephi.

 

1. Adding the (inferred) RDF data

Let’s start by setting up the required Neo4J/Tinkerpop/Sail environment that we will use to store and infer RDF triples. The setup is similar to the one explained in my previous Tinkerpop article. However, instead of wrapping our GraphSail in a SailRepository, we will wrap it in a ForwardChainingRDFSInferencer. This inferencer listens for RDF triples that are added and/or removed and automatically executes RDF Schema inferencing, applying the rules defined by the RDF Semantics Recommendation.

neograph = new Neo4jGraph("var/rdf");
// Let's use manual transaction mode
neograph.setMaxBufferSize(0);
sail = new ForwardChainingRDFSInferencer(new GraphSail(neograph));
sail.initialize();
connection = sail.getConnection();
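For reference, the snippet above assumes a few class fields and imports along the following lines. This is only a sketch: the package names match the Blueprints and Sesame releases current at the time of writing and may differ for other versions.

import com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jGraph;
import com.tinkerpop.blueprints.pgm.oupls.sail.GraphSail;
import org.openrdf.sail.SailConnection;
import org.openrdf.sail.inferencer.fc.ForwardChainingRDFSInferencer;

// Class fields shared by the snippets in this article
private Neo4jGraph neograph;
private ForwardChainingRDFSInferencer sail;
private SailConnection connection;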

 

We are now ready to add RDF triples. Let’s create a simple loop that allows us to read in RDF triples and add them to the Sail store.

InferenceLoop loop = new InferenceLoop();
Scanner in = new Scanner(System.in);
while (true) {
   System.out.println("Provide RDF statement:");
   System.out.print("=> ");
   String input = in.nextLine();
   System.out.println("The following edges were created:");
   loop.inference(input);
}

 

The inference method itself is rather simple. We start by parsing the RDF subject, predicate and object. Next, we start a new transaction, add the statement and commit the transaction. This will not only add the RDF triple to our Neo4J store, but will additionally run the RDF Schema inferencing process and automatically add the inferred RDF triples. Pretty easy!

// Parses the RDF statement (three space-separated URIs) and adds it accordingly
public void inference(String statement) throws SailException, InterruptedException {
   String[] triple = statement.split(" ");
   inference(new URIImpl(triple[0]), new URIImpl(triple[1]), new URIImpl(triple[2]));
}

// Adds the statement; committing the transaction triggers the RDF Schema inferencing
public void inference(URI subject, URI predicate, URI object) throws SailException, InterruptedException {
   neograph.startTransaction();
   connection.addStatement(subject, predicate, object);
   connection.commit();
   neograph.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
}

 

But how do we retrieve the inferred RDF triples that were added through the inference process? Although the ForwardChainingRDFSInferencer allows us to register a listener that is able to detect changes to the graph, it does not provide the required API to distinguish between manually added and inferred RDF triples. Luckily, we can still access the underlying Neo4J store and capture these graph changes by implementing the Neo4J TransactionEventHandler interface. After a transaction is committed, we can fetch the newly created relationships (i.e. RDF triples). For each of these relationships, the start node (i.e. RDF subject), end node (i.e. RDF object) and relationship type (i.e. RDF predicate) can be retrieved. In case an RDF triple was added through inference, the value of its boolean property “inferred” is “true”. We filter the relationships to the ones that are defined within our own domain (as otherwise the full RDFS meta model would be visualized as well). Finally, we push the relevant nodes and edges.

public class PushTransactionEventHandler implements TransactionEventHandler {

   private int id = 1;

   public void afterCommit(TransactionData transactionData, Object o) {
      // Retrieve the created relationships (the relevant nodes will be retrieved through these relationships)
      Iterable<Relationship> relationships = transactionData.createdRelationships();

      // Iterate and add
      for (Relationship relationship : relationships) {
         // Retrieve the labels
         String start = (String)relationship.getStartNode().getProperty("value");
         String end = (String)relationship.getEndNode().getProperty("value");
         String predicate = relationship.getType().toString();

         // Limit the relationships that are shown to our own domain
         if (!start.startsWith("http://www.w3.org") && !end.startsWith("http://www.w3.org")) {
            // Check whether the relationship is inferred or not
            boolean inferred = (Boolean)relationship.getProperty("inferred",false);
            // Retrieve the more meaningful names
            start = getName(start);
            end = getName(end);
            predicate = getName(predicate);
            // Push the start and end nodes (they will only be created once)
            PushUtility.pushNode(start);
            PushUtility.pushNode(end);
            PushUtility.pushEdge(id++, start, end, predicate, inferred);
         }
      }
   }

   ...

}
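Note that the handler still needs to be registered with the Neo4J database that backs the Blueprints graph; the original snippets do not show this step. Below is a minimal sketch, assuming Blueprints’ getRawGraph() accessor, together with a hypothetical version of the getName helper that was elided above (it simply strips the namespace from a URI).

// Register the handler on the raw Neo4J database underneath the Blueprints graph
neograph.getRawGraph().registerTransactionEventHandler(new PushTransactionEventHandler());

// Hypothetical helper: keep only the local name after the last '#' or '/'
private String getName(String uri) {
   int index = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
   return index >= 0 ? uri.substring(index + 1) : uri;
}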

 

2. Pushing the (inferred) RDF data

The Gephi streaming plugin allows reading and visualizing data that is sent to its master server. The master server exposes a REST interface that is able to receive graph data as JSON messages. The PushUtility used in the PushTransactionEventHandler is responsible for generating the required JSON node and edge messages and pushing them to the Gephi master.

public class PushUtility {

   private static final String url = "http://localhost:8080/workspace0?operation=updateGraph";
   private static final String nodejson = "{\"an\":{\"%1$s\":{\"label\":\"%1$s\"}}}";
   private static final String edgejson = "{\"ae\":{\"%1$d\":{\"source\":\"%2$s\",\"target\":\"%3$s\",\"directed\":true,\"label\":\"%4$s\",\"inferred\":\"%5$b\"}}}";

   private static void push(String message) {
      try {
         // Create a connection and push the node or edge json message
         HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
         con.setRequestMethod("POST");
         con.setDoOutput(true);
         con.getOutputStream().write(message.getBytes("UTF-8"));
         con.getInputStream().close(); // reading the response forces the request to be sent
      }
      catch(Exception e) {
         e.printStackTrace();
      }
   }

   // Pushes a node
   public static void pushNode(String label) {
      push(String.format(nodejson, label));
   }

   // Pushes an edge
   public static void pushEdge(int id, String source, String target, String label, boolean inferred) {
      push(String.format(edgejson, id, source, target, label, inferred));
      System.out.println(String.format(edgejson, id, source, target, label, inferred));
   }

}
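Filled in, the two format strings produce messages like the ones below (“an” adds a node, “ae” adds an edge in the graph streaming format); the values are taken from the example in the next section:

{"an":{"Davy":{"label":"Davy"}}}
{"ae":{"1":{"source":"Davy","target":"Bob","directed":true,"label":"teaches","inferred":"false"}}}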

 

3. Visualizing the (inferred) RDF data

Start the Gephi Streaming Master server. This will allow Gephi to receive the (inferred) RDF triples that we send it through its REST interface. Let’s run our Java application and add the following RDF triples:

http://datablend.be/example/teaches http://www.w3.org/2000/01/rdf-schema#domain http://datablend.be/example/teacher
http://datablend.be/example/teaches http://www.w3.org/2000/01/rdf-schema#range http://datablend.be/example/student
http://datablend.be/example/Davy http://datablend.be/example/teaches http://datablend.be/example/Bob

 

The first two RDF triples above define the domain and range of the teaches property: whoever teaches is a teacher, and whoever is taught is a student. The last RDF triple states that Davy teaches Bob. As a result, the RDF Schema inferencer deduces that Davy must be a teacher and that Bob must be a student.
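Concretely, committing the third triple causes the inferencer to add the following two statements (through the rdfs2 and rdfs3 entailment rules), with their inferred property set to true:

http://datablend.be/example/Davy http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://datablend.be/example/teacher
http://datablend.be/example/Bob http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://datablend.be/example/student

Let’s have a look at what Gephi visualized for us.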

[Image: initial Gephi visualization of the streamed RDF graph, unformatted]

Mmm … That doesn’t really look impressive :-) . Let’s use some formatting. First, apply the Force Atlas layout. Afterwards, scale the edges and enable the labels on both the edges and the nodes. Finally, partition the edges by coloring the arrows according to the inferred property. We can now clearly identify the inferred RDF statements (i.e. Davy being a teacher and Bob being a student).

[Image: formatted Gephi visualization; inferred edges are colored differently from the manually added ones]

 

Let’s add some additional RDF triples.

http://datablend.be/example/teacher http://www.w3.org/2000/01/rdf-schema#subClassOf http://datablend.be/example/person
http://datablend.be/example/student http://www.w3.org/2000/01/rdf-schema#subClassOf http://datablend.be/example/person

 

Basically, these RDF triples state that both teacher and student are subclasses of person. As a result, the RDFS inferencer is able to deduce that both Davy and Bob must be persons.
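This time the rdfs9 (subClassOf) entailment rule kicks in, and the inferencer adds type statements such as:

http://datablend.be/example/Davy http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://datablend.be/example/person
http://datablend.be/example/Bob http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://datablend.be/example/person

The Gephi visualization is updated accordingly.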

[Image: updated Gephi visualization including the inferred person type statements]

 

 

4. Conclusion

With just a few lines of code, we are able to stream (inferred) RDF triples to Gephi and make use of its powerful visualization and analysis tools to explore and inspect our datasets. As always, the complete source code can be found on the Datablend public GitHub repository. Make sure to surf the internet for some other nice Gephi streaming examples, the coolest one probably being the visualization of the Egyptian revolution on Twitter.

 

Source: http://datablend.be/?p=1146
