Working with Graph Data from Neo4j in QlikView
There are numerous examples of problems which can be handled efficiently by graph databases. A graph is made up of nodes and relationships between nodes (or vertices and edges): http://en.wikipedia.org/wiki/Graph_database
Now we can use graph data in a business intelligence / business discovery solution like QlikView to do some more business related analytics.
Neo4j is a high-performance, NoSQL graph database with all the features of a mature and robust database. It is an open source project supported by Neo Technology, implemented in Java. You can read more about it here: http://neo4j.org
Here are some slides about graph problems and use cases for Neo4j: http://www.slideshare.net/peterneubauer/neo4j-5-cool-graph-examples-4473985
Since the Neo4j JDBC driver is available we can use the QlikView JDBC Connector from TIQ Solutions and Cypher - a declarative graph query language - for expressive and efficient querying of the graph data. Take a look into the Cypher documentation to understand the syntax of this human query language, because it is totally different from SQL: http://docs.neo4j.org/chunked/1.7/cypher-query-lang.html
After connecting the Neo4j graph database with this command:
CUSTOM CONNECT TO “Provider=JDBCConnector_x64.dll;jdbc:neo4j://localhost:7474/?connector.driverClass=org.neo4j.jdbc.Driver;XUserId=MfJbFYD;XPassword=IYRXBVD;”;
I have used the following Cypher queries in the QlikView load script.
1. Read all nodes and some of the node’s properties:
SQL START n=node(*)
RETURN ID(n) as NodeID, n as NodeText, n.__type__? as NodeType, n.name? as NodeName, n.biography? as Biography, n.birthplace? as Birthplace, n.birthday? as Birthday, n.title? as MovieTitle, n.releaseDate? as ReleaseDate;
2. Read all relationships and some of the relationship’s properties:
SQL START r=relationship(*)
RETURN ID(r) as RelID, r as RelText, r.__type__? as RelType,
r.name? as RelName, type(r) as RelLabel;
3. Read all paths, a sequence of nodes and relationships that always start and end in nodes:
SQL START n=node(*)
RETURN ID(r) as RelID, ID(n) as Node1_ID, Type(r) as Label, ID(m) as Node2_ID;
Then build up a QlikView data model in the load script with some tables. Basically the paths would be needed in one table (Graph):
The result is an interactive QlikView analytics application where you can use a wide range of UI elements to disctover relevant information from the graph. In the next screenshot you see all related informations to the Star Trek movies:
I use the Google Chart API for the graph visualization in QlikView.
In a next step this could be replaced by a cool interactive QlikView
In the next step we will start a search for Mr. Spock by using QlikView’s incredible associative search feature:
We will find all roles of Mr. Spock with different actors and can display the related sub-graph:
Furthermore we can do some business analytics now, for instance we can create a chart on the count of actors by movie and the average actors age per movie at the movie release date (note: the data are incomplete however it gives an indication). You can see now the films with the youngest artists on the left side:
I hope this gives you some imaginations on how to use graph data in business situation with an interactive front end. This example is bridging several worlds:
- Graph Data to Relational Data,
- NoSQL to BI and
- Java OSS to proprietary!
I’m curious what kind of graph-business-discovery solutions will come up with this promising combination.
Download the Neo4j QlikView example: QVNeo4j.qvw
Used Neo4j sample data: http://example-data.neo4j.org/files/cineasts_movies_actors.zip