Visualizing RDF Schema inferencing through Neo4j, TinkerPop, Sail and Gephi
Last week, the Neo4j plugin for Gephi was released. Gephi is an open-source visualization and manipulation tool that allows users to interactively browse and explore graphs. The graphs themselves can be loaded through a variety of file formats. Thanks to Martin Škurla, it is now possible to load and lazily explore graphs that are stored in a Neo4j data store.
In one of my previous articles, I explained how Neo4j and the TinkerPop framework can be used to load and query RDF triples. The newly released Neo4j plugin now allows us to visually browse these RDF triples and perform some fancier operations, such as finding patterns and executing social network analysis algorithms, from within Gephi itself. TinkerPop's Sail ouplementation also supports the notion of RDF Schema inferencing. Inferencing is the process where new (RDF) data is automatically deduced from existing (RDF) data through reasoning. Unfortunately, the Sail reasoner cannot easily be integrated within Gephi, as the Gephi plugin grabs a lock on the Neo4j store and no RDF data can be added, except through the plugin itself.
Being able to visualize the RDF Schema reasoning process and graphically indicate which RDF triples were added manually and which were automatically inferred would be a nice feature to have. To implement it, we need to be able to push graph changes from TinkerPop and Neo4j to Gephi. Luckily, the Gephi graph streaming plugin allows us to do just that. In the rest of this article, I will detail how to set up the required Gephi environment and how we can stream (inferred) RDF data from Neo4j to Gephi.
1. Adding the (inferred) RDF data
Let's start by setting up the required Neo4j/TinkerPop/Sail environment that we will use to store and infer RDF triples. The setup is similar to the one explained in my previous TinkerPop article. However, instead of wrapping our GraphSail as a SailRepository, we will wrap it in a ForwardChainingRDFSInferencer. This inferencer listens for RDF triples that are added and/or removed and automatically executes RDF Schema inferencing, applying the rules defined by the RDF Semantics recommendation.
neoGraph = new Neo4jGraph("var/rdf");
// Let's use manual transaction mode
neoGraph.setMaxBufferSize(0);
sail = new ForwardChainingRDFSInferencer(new GraphSail(neoGraph));
sail.initialize();
connection = sail.getConnection();
We are now ready to add RDF triples. Let's create a simple loop that allows us to read in RDF triples and add them to the Sail store.
InferenceLoop loop = new InferenceLoop();
Scanner in = new Scanner(System.in);
while (true) {
    System.out.println("Provide RDF statement:");
    System.out.print("=> ");
    String input = in.nextLine();
    System.out.println("The following edges were created:");
    loop.inference(input);
}
The inference method itself is rather simple. We first parse the RDF subject, predicate and object. Next, we start a new transaction, add the statement and commit the transaction. This will not only add the RDF triple to our Neo4j store, but will additionally run the RDF Schema inferencing process and automatically add the inferred RDF triples. Pretty easy!
// Parses the RDF statement and adds it accordingly
public void inference(String statement) throws SailException, InterruptedException {
    String[] triple = statement.split(" ");
    inference(new URIImpl(triple[0]), new URIImpl(triple[1]), new URIImpl(triple[2]));
}

// Adds the statement and triggers the inferencing
public void inference(URI subject, URI predicate, URI object) throws SailException, InterruptedException {
    neoGraph.startTransaction();
    connection.addStatement(subject, predicate, object);
    connection.commit();
    neoGraph.stopTransaction(TransactionalGraph.Conclusion.SUCCESS);
}
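If you want to double-check what the reasoner produced, the Sail connection can also be queried directly with the includeInferred flag set to true. A minimal sketch, assuming the standard Sesame 2 getStatements signature:

// Minimal sketch: dump all statements, including the inferred ones
// (assumes the Sesame 2 SailConnection.getStatements signature)
CloseableIteration<? extends Statement, SailException> statements =
        connection.getStatements(null, null, null, true);
while (statements.hasNext()) {
    Statement statement = statements.next();
    System.out.println(statement.getSubject() + " " + statement.getPredicate() + " " + statement.getObject());
}
statements.close();

This dumps the whole store in one go, though; for streaming to Gephi we want to be notified of changes as they happen.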
But how do we retrieve the inferred RDF triples that were added through the inference process? Although the ForwardChainingRDFSInferencer allows us to register a listener that is able to detect changes to the graph, it does not provide the API required to distinguish between manually added and inferred RDF triples. Luckily, we can still access the underlying Neo4j store and capture these graph changes by implementing the Neo4j TransactionEventHandler interface. After a transaction is committed, we can fetch the newly created relationships (i.e. RDF triples). For each of these relationships, the start node (i.e. RDF subject), end node (i.e. RDF object) and relationship type (i.e. RDF predicate) can be retrieved. In case an RDF triple was added through inference, the value of its boolean property "inferred" is "true". We filter the relationships to the ones that are defined within our own domain (as otherwise the full RDFS meta model would be visualized as well). Finally, we push the relevant nodes and edges.
public class PushTransactionEventHandler implements TransactionEventHandler {

    private int id = 1;

    public void afterCommit(TransactionData transactionData, Object o) {
        // Retrieve the created relationships (the relevant nodes will be retrieved through these relationships)
        Iterable<Relationship> relationships = transactionData.createdRelationships();
        // Iterate and add
        for (Relationship relationship : relationships) {
            // Retrieve the labels
            String start = (String) relationship.getStartNode().getProperty("value");
            String end = (String) relationship.getEndNode().getProperty("value");
            String predicate = relationship.getType().toString();
            // Limit the relationships that are shown to our own domain
            if (!start.startsWith("http://www.w3.org") && !end.startsWith("http://www.w3.org")) {
                // Check whether the relationship is inferred or not
                boolean inferred = (Boolean) relationship.getProperty("inferred", false);
                // Retrieve the more meaningful names
                start = getName(start);
                end = getName(end);
                predicate = getName(predicate);
                // Push the start and end nodes (they will only be created once)
                PushUtility.pushNode(start);
                PushUtility.pushNode(end);
                PushUtility.pushEdge(id++, start, end, predicate, inferred);
            }
        }
    }

    ...
}
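Two details are worth spelling out. The handler still has to be registered with the underlying Neo4j database, and the getName(...) helper is not shown above. The sketch below is my best guess at both: it assumes that Blueprints' Neo4jGraph exposes the raw GraphDatabaseService through getRawGraph(), and the getName() body is a hypothetical reconstruction that simply strips the namespace from a URI:

// Register the handler so that afterCommit() fires on every committed transaction
// (assumes Blueprints' Neo4jGraph exposes the raw GraphDatabaseService)
GraphDatabaseService rawGraph = neoGraph.getRawGraph();
rawGraph.registerTransactionEventHandler(new PushTransactionEventHandler());

// Hypothetical getName(): keep only the local part of the URI
private String getName(String uri) {
    int index = Math.max(uri.lastIndexOf('/'), uri.lastIndexOf('#'));
    return index >= 0 ? uri.substring(index + 1) : uri;
}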
2. Pushing the (inferred) RDF data
The streaming plugin for Gephi allows reading and visualizing data that is sent to its master server. This master server is a REST interface that is able to receive graph data in a JSON format. The PushUtility used in the PushTransactionEventHandler is responsible for generating the required JSON node and edge messages and pushing them to the Gephi master.
public class PushUtility {

    private static final String MASTER_URL = "http://localhost:8080/workspace0?operation=updateGraph";
    private static final String NODE_JSON = "{\"an\":{\"%1$s\":{\"label\":\"%1$s\"}}}";
    private static final String EDGE_JSON = "{\"ae\":{\"%1$d\":{\"source\":\"%2$s\",\"target\":\"%3$s\",\"directed\":true,\"label\":\"%4$s\",\"inferred\":\"%5$b\"}}}";

    private static void push(String message) {
        try {
            // Create a connection and push the node or edge JSON message
            HttpURLConnection con = (HttpURLConnection) new URL(MASTER_URL).openConnection();
            con.setRequestMethod("POST");
            con.setDoOutput(true);
            con.getOutputStream().write(message.getBytes("UTF-8"));
            con.getInputStream();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Pushes a node
    public static void pushNode(String label) {
        push(String.format(NODE_JSON, label));
    }

    // Pushes an edge
    public static void pushEdge(int id, String source, String target, String label, boolean inferred) {
        push(String.format(EDGE_JSON, id, source, target, label, inferred));
        System.out.println(String.format(EDGE_JSON, id, source, target, label, inferred));
    }
}
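To give an idea of the wire format, pushing the node davy and an inferred type edge would result in messages along these lines being POSTed to the master (the exact labels and ids depend on your data):

{"an":{"davy":{"label":"davy"}}}
{"ae":{"2":{"source":"davy","target":"teacher","directed":true,"label":"type","inferred":"true"}}}

The "an" and "ae" keys are the streaming plugin's "add node" and "add edge" events; the custom inferred attribute is what we will later use for coloring in Gephi.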
3. Visualizing the (inferred) RDF data
Start the Gephi streaming master server. This allows Gephi to receive the (inferred) RDF triples that we send it through its REST interface. Let's run our Java application and add the following RDF triples:
http://datablend.be/example/teaches http://www.w3.org/2000/01/rdf-schema#domain http://datablend.be/example/teacher
http://datablend.be/example/teaches http://www.w3.org/2000/01/rdf-schema#range http://datablend.be/example/student
http://datablend.be/example/davy http://datablend.be/example/teaches http://datablend.be/example/bob
The first two RDF triples above state that a teacher teaches a student. The last RDF triple states that Davy teaches Bob. As a result, the RDF Schema inferencer deduces that Davy must be a teacher and that Bob must be a student. Let's have a look at what Gephi visualized for us.
Mmm … that doesn't really look impressive. Let's use some formatting. First, apply the Force Atlas layout. Afterwards, scale the edges and enable the labels on both the edges and the nodes. Finally, apply partitioning on the edges by coloring the arrows using the inferred property. We can now clearly identify the inferred RDF statements (i.e. Davy being a teacher and Bob being a student).
Let's add some additional RDF triples.
http://datablend.be/example/teacher http://www.w3.org/2000/01/rdf-schema#subClassOf http://datablend.be/example/person
http://datablend.be/example/student http://www.w3.org/2000/01/rdf-schema#subClassOf http://datablend.be/example/person
Basically, these RDF triples state that both teacher and student are subclasses of person. As a result, the RDFS inferencer is able to deduce that both Davy and Bob must be persons. The Gephi visualization is updated accordingly.
4. Conclusion
With just a few lines of code, we are able to stream (inferred) RDF triples to Gephi and make use of its powerful visualization and analysis tools to explore and inspect our datasets. As always, the complete source code can be found on the Datablend public GitHub repository. Make sure to surf the internet to find some other nice Gephi streaming examples, the coolest one probably being the visualization of the Egyptian revolution on Twitter.
Source: http://datablend.be/?p=1146