
Graph Visualization for Neo4j Schemas Using yFiles


Need a way to effectively visualize your Neo4j data schemas? We take a look at how to do just that making use of yFiles. Read on for details.


I have been working with graph visualizations for almost 20 years now, but only recently have I begun looking into graph databases.

Shortly after I was introduced to Neo4j, I found that when looking at existing example datasets, I often felt the need to see and better understand the underlying schema of the data. Although a Neo4j database does not require a schema, most of the time the data will adhere to one, and without knowing it, creating elegant and efficient queries to gain insight into your database becomes rather difficult.

I spoke with other Neo4j users, and they told me that they had come across the same problem. In larger projects, there should be separate documentation of the database schema, but as is very often the case with documentation, either it doesn’t exist or it is out of sync with reality.

Getting the Schema

You don’t need up-to-date documentation to take a look at the schema. There are existing solutions that can help you here:

The built-in Neo4j Browser shows a list of all node labels, relationship types, and property keys currently in use in its sidebar. When you click on one of them, the Neo4j Browser samples a few random nodes and relationships and renders them to the screen. You can then interactively explore the actual graph data and build the schema in your head or using pen and paper.

If you have installed the APOC tools on your server, you can make use of the awesome meta-graph APOC procedure to automate this sampling. The procedure creates results that the Neo4j Browser can display as a graph.

Some people may not have the APOC tools installed or cannot install them on their server. Luckily, starting with Neo4j 3.1, there is similar functionality built right into the database. If you send a CALL db.schema() query to the database, you will get a response that looks like an ordinary query result with nodes and relationships; however, the entities are purely virtual and do not exist in the database.

For very small instances with a simple schema, this may already be enough for you to get a good understanding of the structure of your database. However, while the current implementation of the graph viewer in the Neo4j Browser is fine for displaying smaller result sets, it does not work very well for non-trivial schemata.

[Image: database schema visualization in the Neo4j Browser]

If you start looking more closely at the output of the db.schema() call, you will see that some node labels seem to be connected to a large number of other labels. In the example above, labels like StackOverflow, GitHub, Twitter, Meetup, and User have a very high connectivity.

It is their adjacent relationships that make the diagram almost unusable. Now if you look at the database contents you will quickly see that in reality there is not a single node in the database that is labeled with just one of these labels: 

MATCH (n:StackOverflow) WITH labels(n) AS labels WHERE size(labels) = 1
RETURN collect(distinct(labels))

This will give you an empty list since there is no combination of labels in the whole database that only consists of the “StackOverflow” label! Instead, the label is used as a “tag” in combination with other labels only.
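The same check can be sketched on the client side once the label arrays have been fetched. This is only an illustration and not code from the schema viewer described here: the helper name isPureTag and the sample data (including the "Tag" label) are made up.

```javascript
// Hypothetical helper: a label counts as a pure "tag" if no node carries it
// as its only label. labelSets is an array of label arrays, for example the
// distinct results of labels(n).
function isPureTag(label, labelSets) {
  const setsWithLabel = labelSets.filter(labels => labels.includes(label));
  // A label that never occurs is not considered a tag.
  if (setsWithLabel.length === 0) {
    return false;
  }
  return setsWithLabel.every(labels => labels.length > 1);
}

// Made-up sample data for illustration only.
const sampleLabelSets = [
  ["User", "StackOverflow"],
  ["Content", "Question", "StackOverflow"],
  ["User", "Twitter"],
  ["Tag"]
];

console.log(isPureTag("StackOverflow", sampleLabelSets)); // true
console.log(isPureTag("Tag", sampleLabelSets)); // false
```

Note that a label passing this test is only a candidate for removal from the schema view; whether hiding it loses information still has to be checked separately.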

This Cypher query can be used to find the combinations in use for a certain label in the database:

MATCH (n:StackOverflow) WITH labels(n) AS labels
RETURN collect(distinct(labels))
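To run this query for every label instead of typing it by hand, the query text could be generated. The following is a minimal sketch under my own assumptions: the function name labelCombinationsQuery and the "combinations" column alias are hypothetical. Because Cypher does not accept a label as a query parameter, the label is spliced into the text and quoted with backticks.

```javascript
// Hypothetical helper: builds the label-combination query for an arbitrary label.
// The label is quoted with backticks; a literal backtick inside it is doubled.
function labelCombinationsQuery(label) {
  const quoted = "`" + label.replace(/`/g, "``") + "`";
  return "MATCH (n:" + quoted + ") WITH labels(n) AS labels " +
         "RETURN collect(distinct(labels)) AS combinations";
}

console.log(labelCombinationsQuery("StackOverflow"));
```

The resulting string can then be passed to the Bolt driver in the same way as any other query.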

With this knowledge, in order to better understand the schema, we can actually manually remove those “tagging” nodes from the graph display. We won’t lose any relevant schema information since the relationships are still there for the other labels that the tagging label was used in combination with.

Be careful though, because there can be labels that cannot be removed without destroying information. In this case the User tag appeared in these combinations:

[User, Twitter], [User, StackOverflow], [User, Meetup], [User, GitHub],
[User, Slack]

And if we remove the User label too, we will remove all Users and their interactions and relationships from the schema.
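Hiding a chosen set of labels from the schema view can be sketched as a simple filter. This is an illustration under assumed data shapes, not the viewer's actual code: the plain {id, label} and {source, target, type} objects and the name hideLabels are hypothetical. Which labels are safe to hide remains a manual decision, as the User example shows.

```javascript
// Hypothetical plain-object schema: nodes are {id, label},
// edges are {source, target, type}.
function hideLabels(schema, labelsToHide) {
  const hidden = new Set(labelsToHide);
  const nodes = schema.nodes.filter(node => !hidden.has(node.label));
  const keptIds = new Set(nodes.map(node => node.id));
  // Drop every edge that touches a hidden node.
  const edges = schema.edges.filter(
    edge => keptIds.has(edge.source) && keptIds.has(edge.target)
  );
  return { nodes, edges };
}

// Made-up miniature schema for illustration.
const schema = {
  nodes: [
    { id: "1", label: "User" },
    { id: "2", label: "Question" },
    { id: "3", label: "StackOverflow" }
  ],
  edges: [
    { source: "1", target: "2", type: "POSTED" },
    { source: "3", target: "2", type: "POSTED" }
  ]
};

// Hide the tagging label but keep User, which carries real information.
const cleaned = hideLabels(schema, ["StackOverflow"]);
console.log(cleaned.nodes.map(node => node.label).join(", ")); // User, Question
console.log(cleaned.edges.length); // 1
```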

Actually, the schema should show a different structure. It currently shows a single type of “User” who posts questions on Stack Overflow, tweets about his or her work on Twitter, and meets fellow developers at Meetups, so we see one User that participates in all of these relationships. However, if we look at the actual data, we will see that only Twitter Users tweet, and only Stack Overflow Users post questions and answers on Stack Overflow.

Thus, to reflect what is really in the database, the schema should be drawn with separate types of Users: one for each “tagging” label combination that they appear in.
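Turning the label combinations into separate schema nodes could look roughly like this. It is a sketch under my own assumptions: the function name and the idea of using the sorted, colon-joined labels as a node key are hypothetical choices, not something prescribed by Neo4j or yFiles.

```javascript
// Hypothetical helper: creates one schema node per distinct label combination.
function splitIntoCombinations(combinations) {
  const seen = new Set();
  const result = [];
  for (const labels of combinations) {
    // Sort a copy so that [User, Twitter] and [Twitter, User] share one key.
    const key = [...labels].sort().join(":");
    if (!seen.has(key)) {
      seen.add(key);
      result.push({ id: key, labels: [...labels] });
    }
  }
  return result;
}

// The User combinations listed above.
const userCombinations = [
  ["User", "Twitter"],
  ["User", "StackOverflow"],
  ["User", "Meetup"],
  ["User", "GitHub"],
  ["User", "Slack"]
];

const userNodes = splitIntoCombinations(userCombinations);
console.log(userNodes.length); // 5
console.log(userNodes[0].id); // Twitter:User
```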

At this point, it becomes clear that the current implementation of the graph visualization in the Neo4j Browser does not suffice for rendering more complex database schemata.

Graph Visualization to the Rescue!

As a programmer, I quickly became annoyed by manually entering the above Cypher queries into the Neo4j Browser. Immediately after I found out about the great JavaScript Bolt driver for Neo4j, I decided to use yFiles for HTML to build my own schema viewer that will perform all of the above manual tasks automatically.

“yFiles” is a generic graph visualization, drawing and editing library for programmers that comes with the most complete suite of automatic layout algorithms. It also features extensive customization options, and as such, can be used to create completely new applications that exactly suit one’s requirements. Therefore, I was positive that I could easily build an application that allows users to quickly and efficiently browse and understand the schemata of even the most complex Neo4j graph databases.

I started with the same simple approach that is used in the Neo4j Browser. The Bolt driver returns easily consumable JavaScript objects that can quickly be turned into a graph that can be visualized with the yFiles library:

session
    .run("CALL db.schema()", {})
    .then(function(result) {
      // db.schema() yields a single record with a "nodes" and a "relationships" column
      const record = result.records[0];

      const graphBuilder = new yfiles.binding.GraphBuilder(graphComponent.graph);
      graphBuilder.nodesSource = record.get("nodes");
      graphBuilder.edgesSource = record.get("relationships");

      // helper method to convert the Neo4j "long" ids to simple JavaScript strings
      function getId(identity) {
        return identity.low.toString() + ":" + identity.high.toString();
      }

      graphBuilder.nodeIdBinding = node => getId(node.identity);
      graphBuilder.sourceNodeBinding = edge => getId(edge.start);
      graphBuilder.targetNodeBinding = edge => getId(edge.end);
      // use the first label (if there is one) as the text of the node
      graphBuilder.nodeLabelBinding = node => node.labels && node.labels.length > 0 ? node.labels[0] : null;

      graphBuilder.buildGraph();

      session.close();
    })
    .catch(function(error) {
      console.error(error);
      // release the session in the failure case, too
      session.close();
    });

That was all I had to do to get the basic schema to display in my own application! I just plugged the above code into one of the samples in the getting started tutorials for yFiles for HTML and was immediately able to interactively explore and finally understand my database schema!
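To try the id and label mapping without a database or yFiles at hand, the same conversion can be applied to plain objects. This is a sketch with stand-in data: the objects below only mimic the fields used above (identity, labels, start, end, type, and ids with low and high parts), and toPlainGraph is a hypothetical name.

```javascript
// Same id conversion as in the snippet above, applied to stand-in objects.
function getId(identity) {
  return identity.low.toString() + ":" + identity.high.toString();
}

// Hypothetical helper: maps driver-like entities to plain nodes and edges.
function toPlainGraph(nodes, relationships) {
  return {
    nodes: nodes.map(node => ({
      id: getId(node.identity),
      label: node.labels && node.labels.length > 0 ? node.labels[0] : null
    })),
    edges: relationships.map(rel => ({
      source: getId(rel.start),
      target: getId(rel.end),
      type: rel.type
    }))
  };
}

// Stand-in data mimicking two schema nodes and one relationship.
const mockNodes = [
  { identity: { low: 1, high: 0 }, labels: ["User"] },
  { identity: { low: 2, high: 0 }, labels: [] }
];
const mockRelationships = [
  { start: { low: 1, high: 0 }, end: { low: 2, high: 0 }, type: "POSTED" }
];

const plainGraph = toPlainGraph(mockNodes, mockRelationships);
console.log(plainGraph.nodes[0].id); // 1:0
console.log(plainGraph.edges[0].source + " -> " + plainGraph.edges[0].target); // 1:0 -> 2:0
```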

Creating a Schema Explorer

Of course, I didn’t stop at this point. I was excited to see what one can create when the power of Neo4j and the yFiles libraries is used together in the same application. So, I added an option in the context menu for the user to automatically split node labels into all of their label combinations and update the relationships accordingly.

A Cypher query like the following quickly reveals that for certain label combinations there are a lot fewer relationships, and they suddenly begin to make sense:

MATCH (n:User:StackOverflow)-[r]->(n2)
RETURN collect(distinct(type(r))), labels(n2)

collect(distinct(type(r)))    labels(n2)
[POSTED]                      [Content, Question, StackOverflow]
[POSTED]                      [Content, Answer, StackOverflow]
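Rows like these can be folded into edges between label combinations. The sketch below is an assumption about how that might be done, not the explorer's actual implementation: the row shape {from, type, to} and the function name are hypothetical, while the sample rows repeat the result shown above.

```javascript
// Hypothetical helper: groups relationship types by source and target
// label combination, so each pair becomes one schema edge.
function summarizeRelationships(rows) {
  const summary = new Map();
  for (const row of rows) {
    const key = row.from.join(":") + " => " + row.to.join(":");
    if (!summary.has(key)) {
      summary.set(key, new Set());
    }
    summary.get(key).add(row.type);
  }
  return Array.from(summary, ([key, types]) => ({ key, types: Array.from(types) }));
}

// Sample rows based on the query result above (one row duplicated on purpose).
const rows = [
  { from: ["User", "StackOverflow"], type: "POSTED", to: ["Content", "Question", "StackOverflow"] },
  { from: ["User", "StackOverflow"], type: "POSTED", to: ["Content", "Answer", "StackOverflow"] },
  { from: ["User", "StackOverflow"], type: "POSTED", to: ["Content", "Answer", "StackOverflow"] }
];

const schemaEdges = summarizeRelationships(rows);
for (const edge of schemaEdges) {
  console.log(edge.key + " [" + edge.types.join(", ") + "]");
}
```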

So, Stack Overflow users in the database do not participate in a meetup and do not create GitHub repositories; they will post answers and questions and the schema should reflect that!

So after reading in the schema, splitting the node labels, reinserting the right relationships, and removing the tagging labels from the schema view, I finally applied some custom styling and one of the automatic layout algorithms and got this much-improved schema:

[Image: the improved schema visualization, built with Cypher and yFiles for HTML]

  • To start exploring the database, we added a convenient search dialog with preview functionality from which the user can import one or more nodes into the visualization.
  • My favorite feature is the ability to use the schema to interactively specify transitive relationships over multiple relationship hops: The explorer will then enable the exploration of “virtual” relationships that don’t explicitly exist in the database.
  • And for the pro-users, we also added the option to directly specify Cypher queries. With the right query, you can even visualize virtual nodes and relationships!
    [Image: Neo4j database schema explorer]

    Understanding and Visualizing Complex Neo4j Instances.”

    And if you want to develop a similar application, you can evaluate yFiles for HTML today and use Cypher, the JavaScript Bolt driver, your Neo4j database, and the code samples from this post to get started!

    Happy diagramming!


    Published at DZone with permission of Sebastian Müller. See the original article here.
