Over a million developers have joined DZone.

How to Do Graph Analysis on PostgreSQL With Arcade

DZone's Guide to

How to Do Graph Analysis on PostgreSQL With Arcade

Explore Arcade Analytics' beta version RDBMS connector, which allows you to visually inspect relationships and connections in your RDBMS and treat your data as a graph.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

Graph visualization and analysis are critical tools to have in your toolkit. Developers, analysts, business executives, and really anyone that uses data can use graph visualization tools to extract information from data and see how the data interacts. It is one thing to know that something exists; it is another to see how it affects and is affected by the things around it.

In today’s market, there are several graph visualization tools that are able to connect to various graph databases. Most people find that issues arise when they do not have access to a graph database but want to use a graph visualization tool. Most graph visualization tools do not have the ability to integrate with a relational database, the most commonly used database. One solution is migration. Companies can spend massive amounts of money to move all of their data from a relational database to a graph database. However, for many people, this is not a solution. Migrations are expensive and time-consuming and many people do not want to deal with the headache that it can bring.

If you are in this situation, an attractive alternative solution is Arcade Analytics. Arcade Analytics is a graph visualization tool that enables users to have more control over their data. It sits on top of the user’s database and allows the users to query data and show it in a graph. One of the most attractive features of Arcade Analytics is that it allows users to query data from a relational database and visualize the relational database as a graph. Arcade’s RDBMS connector allows users to perform a graph analysis over your RDBMS without any migration and with few simple steps.

To understand how this is possible, let’s explore the RDBMS Connector. 

How Does It Work?

You can visually inspect relationships and connections within your RDBMS and treat your data as a graph.

The key to achieving that is the model mapping between the source data model, the entity-relationship model (ER), and the target data model, the graph model.

Once a coherent and effective model mapping is performed (don’t worry, that’s a completely automated process), you can query your source dataset and play with it as if it were a graph:

  • Each record is transformed.
  • Each connection between two records, inferred through a relationship between two tables and computed through a join operation, generates an edge.

Image title

The ER model is built starting from the source DB schema: each table, known also as an Entity, and each Relationship in the DB is inferred from the metadata.

This automated mapping strategy adopts a basic approach: the source DB schema is directly translated as follows:

  1. Each Entity in the source DB is converted into a vertex type.
  2. Each Relationship between two Entities in the source DB is converted into an edge type.

All the records of each table are handled according to this schemas-mapping: each pair of records on which it’s possible to perform a join operation will correspond to a pair of vertices connected by an edge of a specific Edge Type.

Arcade allows connections to Oracle, SQLServer, MySQL, PostgreSQL, and HyperSQL. To show you how the tool works, I chose a sample relational database. We will connect to a PostgreSQL server and we will perform a graph analysis on the DVD Rental database. To learn more about this specific dataset, click here.

Here is the source database schema (AKA the ER Model).

Image title

According to the mapping rules stated above, the following correspondent graph model will be automatically built from the tool.

Image title

Really simple, isn’t it?! Still some doubts? Okay, let’s see a couple of mapping examples.

1-N Relationship Between Film and Language

The Film and Language source tables are translated in two correspondent vertex types. The properties contained in the Film and Language vertex types directly come from the columns belonging to the two source tables.

The logical relationship between the two tables generates the HasLanguage edge type.

Image title

N-N Relationship Between Actors

In RDBMSs, N-N relationships are expressed through join tables. In the sample schema below, you can see how the N-N relationship between actors and movies is modeled through the join table Film_Actor.

Image title

As you can see, also the central join table is translated into a specific vertex type, allowing you to traverse the N-N relationship between actors and movies.

For this reason, traversing N-N relationships is equivalent to traverse two 1-N relationships, like in the relational world:

  • 1-N relationship between Film and Film_Actor
  • 1-N relationship between Film_Actor and Film

Now, let’s start to play with data!

Retrieving Data and Performing Our First Analysis

Once connected to the source relational database, we will get a new empty widget, like the following.

Image title

First, we will retrieve some data. We have a few main ways to do that:

Text search bar:

Image title

Typing a query against our source relational database:

Image title

We will start with a specific actor, retrieving his information through a full-text search on the name. Then, we will expand all his connections present in the source database. For the purposes of this example, I will search for Christian Gable.

Image title

Let’s load the vertex by clicking the designated button. Below, you can see the new vertex: to inspect its content, we just have to open the Graph Element menu.

Image title

Now, we can start our analysis: let’s suppose we want to find all the customers who rented a movie where Christian Gable performed as an actor. We can simply do that by navigating all the relationships of our graph model by using the Traverse menu.

First, we have to retrieve all the vertices of the film_actor join table. Doing this will also allow us to fetch the film vertices.

Image title

Image title

Now, starting from the movies, we can expand all the 270 ongoing relationships.

Image title

Image title

We have now retrieved all the vertices connected to each specific movie — in this case, the has_film edge type is mapped with three different relationships:

  1. inventory film
  2. film_category (join table) → film
  3. film_actor (join table) → film

In order to find all the customers who rented all these movies, we should focus just on the inventory → film relationship, so we can get rid of useless vertices belonging to film_category class and the new film_actor nodes by deleting them (click the Class button on the right-hand side to select the desired nodes and use the Delete button in the sidebar).

Image title

Image title

Now we can navigate the ingoing has_inventory edges, connecting inventory vertices with those belonging to the rental class.

Image title

We arrive at the last connection: from each rental vertex, we can reach a specific customer by expanding the outgoing has_customer relationship.

Image title

Here we are. We have found all the customers who rented a movie where they could see a wonderful performance by the great Christian Gable!

Now, what if we want to narrow our analysis to a subset of customers according to a specific movie of Christian Gable?

Very simple: we can select a specific film and play with the selection of the ingoing and outgoing elements till a certain depth (Selection menu), as shown in the following screenshots.

Image title

Image title

Image title

At this point, we can delete all the rest of the graph by inverting the current selection through the Invert operation.

Image title

Image title

Then, we can delete them all with a click... and here is the result.

Image title

This RDBMS Connector is just a beta version. There are some limitations, i.e. if you didn’t define constraints such as foreign keys between the tables on which you usually perform join operations, you would lose this kind of information during the querying process.

Because of this, if foreign keys are missing, you will not have any edges in your final Graph Model, and you will not able to traverse any relationship.

To overcome these limitations, the Arcade team is working on a visual tool mapper, allowing users to edit the basic mapping. This way, you will be able to add connections to your dataset by defining new edge types between vertices.

In this post, we had just a taste of the analyses we can perform over a relational database thanks to Arcade, new features will be shown in next posts.

I hope this post was helpful and interesting.

Stay tuned!

To play with this data yourself, click here to access Arcade's online demo.

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

rdbms ,data visualization ,big data ,graph database ,data analytics

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}