How to Do Graph Analysis on PostgreSQL With Arcade
How to Do Graph Analysis on PostgreSQL With Arcade
Explore Arcade Analytics' beta version RDBMS connector, which allows you to visually inspect relationships and connections in your RDBMS and treat your data as a graph.
Join the DZone community and get the full member experience.Join For Free
Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.
Graph visualization and analysis are critical tools to have in your toolkit. Developers, analysts, business executives, and really anyone that uses data can use graph visualization tools to extract information from data and see how the data interacts. It is one thing to know that something exists; it is another to see how it affects and is affected by the things around it.
In today’s market, there are several graph visualization tools that are able to connect to various graph databases. Most people find that issues arise when they do not have access to a graph database but want to use a graph visualization tool. Most graph visualization tools do not have the ability to integrate with a relational database, the most commonly used database. One solution is migration. Companies can spend massive amounts of money to move all of their data from a relational database to a graph database. However, for many people, this is not a solution. Migrations are expensive and time-consuming and many people do not want to deal with the headache that it can bring.
If you are in this situation, an attractive alternative solution is Arcade Analytics. Arcade Analytics is a graph visualization tool that enables users to have more control over their data. It sits on top of the user’s database and allows the users to query data and show it in a graph. One of the most attractive features of Arcade Analytics is that it allows users to query data from a relational database and visualize the relational database as a graph. Arcade’s RDBMS connector allows users to perform a graph analysis over your RDBMS without any migration and with few simple steps.
To understand how this is possible, let’s explore the RDBMS Connector.
How Does It Work?
You can visually inspect relationships and connections within your RDBMS and treat your data as a graph.
Once a coherent and effective model mapping is performed (don’t worry, that’s a completely automated process), you can query your source dataset and play with it as if it were a graph:
- Each record is transformed.
- Each connection between two records, inferred through a relationship between two tables and computed through a join operation, generates an edge.
The ER model is built starting from the source DB schema: each table, known also as an Entity, and each Relationship in the DB is inferred from the metadata.
This automated mapping strategy adopts a basic approach: the source DB schema is directly translated as follows:
- Each Entity in the source DB is converted into a vertex type.
- Each Relationship between two Entities in the source DB is converted into an edge type.
All the records of each table are handled according to this schemas-mapping: each pair of records on which it’s possible to perform a join operation will correspond to a pair of vertices connected by an edge of a specific Edge Type.
Arcade allows connections to Oracle, SQLServer, MySQL, PostgreSQL, and HyperSQL. To show you how the tool works, I chose a sample relational database. We will connect to a PostgreSQL server and we will perform a graph analysis on the DVD Rental database. To learn more about this specific dataset, click here.
Here is the source database schema (AKA the ER Model).
According to the mapping rules stated above, the following correspondent graph model will be automatically built from the tool.
Really simple, isn’t it?! Still some doubts? Okay, let’s see a couple of mapping examples.
1-N Relationship Between Film and Language
The Film and Language source tables are translated in two correspondent vertex types. The properties contained in the Film and Language vertex types directly come from the columns belonging to the two source tables.
The logical relationship between the two tables generates the HasLanguage edge type.
N-N Relationship Between Actors
In RDBMSs, N-N relationships are expressed through join tables. In the sample schema below, you can see how the N-N relationship between actors and movies is modeled through the join table Film_Actor.
As you can see, also the central join table is translated into a specific vertex type, allowing you to traverse the N-N relationship between actors and movies.
For this reason, traversing N-N relationships is equivalent to traverse two 1-N relationships, like in the relational world:
- 1-N relationship between Film and Film_Actor
- 1-N relationship between Film_Actor and Film
Now, let’s start to play with data!
Retrieving Data and Performing Our First Analysis
Once connected to the source relational database, we will get a new empty widget, like the following.
First, we will retrieve some data. We have a few main ways to do that:
We will start with a specific actor, retrieving his information through a full-text search on the name. Then, we will expand all his connections present in the source database. For the purposes of this example, I will search for Christian Gable.
Let’s load the vertex by clicking the designated button. Below, you can see the new vertex: to inspect its content, we just have to open the Graph Element menu.
Now, we can start our analysis: let’s suppose we want to find all the customers who rented a movie where Christian Gable performed as an actor. We can simply do that by navigating all the relationships of our graph model by using the Traverse menu.
First, we have to retrieve all the vertices of the film_actor join table. Doing this will also allow us to fetch the film vertices.
Now, starting from the movies, we can expand all the 270 ongoing relationships.
We have now retrieved all the vertices connected to each specific movie — in this case, the has_film edge type is mapped with three different relationships:
- inventory → film
- film_category (join table) → film
- film_actor (join table) → film
In order to find all the customers who rented all these movies, we should focus just on the inventory → film relationship, so we can get rid of useless vertices belonging to film_category class and the new film_actor nodes by deleting them (click the Class button on the right-hand side to select the desired nodes and use the Delete button in the sidebar).
Now we can navigate the ingoing has_inventory edges, connecting inventory vertices with those belonging to the rental class.
We arrive at the last connection: from each rental vertex, we can reach a specific customer by expanding the outgoing has_customer relationship.
Here we are. We have found all the customers who rented a movie where they could see a wonderful performance by the great Christian Gable!
Now, what if we want to narrow our analysis to a subset of customers according to a specific movie of Christian Gable?
Very simple: we can select a specific film and play with the selection of the ingoing and outgoing elements till a certain depth (Selection menu), as shown in the following screenshots.
At this point, we can delete all the rest of the graph by inverting the current selection through the Invert operation.
Then, we can delete them all with a click... and here is the result.
This RDBMS Connector is just a beta version. There are some limitations, i.e. if you didn’t define constraints such as foreign keys between the tables on which you usually perform join operations, you would lose this kind of information during the querying process.
Because of this, if foreign keys are missing, you will not have any edges in your final Graph Model, and you will not able to traverse any relationship.
To overcome these limitations, the Arcade team is working on a visual tool mapper, allowing users to edit the basic mapping. This way, you will be able to add connections to your dataset by defining new edge types between vertices.
In this post, we had just a taste of the analyses we can perform over a relational database thanks to Arcade, new features will be shown in next posts.
I hope this post was helpful and interesting.
To play with this data yourself, click here to access Arcade's online demo.
Opinions expressed by DZone contributors are their own.