Comparing Graql to SQL — Part 1/2

In this article, look at a comparison between SQL and Graql.

Since the 1970s, SQL has been the de facto language for working with databases. As a declarative language, it makes it straightforward to write queries and build powerful applications. However, relational databases struggle when working with interconnected and complex data. With such data, challenges arise in SQL especially in modelling and querying.

Graql is to Grakn what SQL is to relational databases: its standard query language. Both SQL and Graql are declarative query languages that abstract away lower-level operations. Both are:

  • Languages that attempt to be readable and understandable
  • Languages that attempt to enable asking questions at a higher level
  • Languages where the system figures out how to do lower-level operations

In practical terms, this means the languages become accessible to groups of people who would have otherwise not been able to access them. In this article, while we look at specific common concepts, we focus on comparing and exploring the differences between the two languages.

In 1970, Edgar Codd, an Oxford-educated mathematician known as "Ted", published a paper in which he introduced two languages, a relational algebra and a relational calculus, for expressing extremely complex queries. When they came out, they were considered to be a strange kind of mathematical notation. To build these ideas out into a database management system, Ted created a research group called System R, based out of the IBM research facilities in San Jose.

Back then, databases were mainly based on navigational, network, and hierarchical models, where we needed to know the physical data layer before we could write a navigational plan to describe our query. Ted, however, saw the inherent complexity in this and wanted to make it easier to write database queries.

However, as Ted's ideas were based on mathematical notation and symbolism, they were difficult to understand and not very accessible to most people. Two System R members addressed this issue by creating a simple query language, SEQUEL, which was later renamed SQL. As this new language was based exclusively on English words, it became the breakthrough that made it so much easier for people to understand the simplicity of Ted's ideas.

By the late 1970s, relational databases had grown in popularity, and the world came to accept just how superior SQL and the relational model were to their predecessors. The story since then is well known: relational databases became the standard for building software as the world was ushered into the digital revolution.

To understand Graql, it's useful to look at the underlying ideas that created SQL, as the two are conceptually closely related. The essence of both Graql and SQL can be summarised as follows:

  1. A language that can be read and understood intuitively. We say a language fulfils this criterion when it appears simple, is maintainable, and has a degree of similarity to natural text.
  2. A language that enables asking questions at a higher level. Here we refer to a language that allows the user to describe operations at a new and higher semantic level.
  3. A language where the system figures out how to do lower-level operations. As the user describes higher-level operations, the system takes care of operations without the user having to think of them.

In this sense, both SQL and Graql are languages that abstract away lower-level operations. In practical terms, this makes them accessible to groups of people who could not otherwise have used them. Those people can now create value, while those who could already do the lower-level work can now do things much faster. Something similar can be said of Python, a high-level programming language that has enabled millions of programmers to build software without having to worry about the lower-level operations it abstracts away.
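
To make the third point concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. The table, the rows, and the two function names are invented for illustration and do not come from the article. The first function spells out how to scan, filter, and sort; the second only states what result is wanted and leaves the execution plan to the database.

```python
import sqlite3

# Hypothetical data, invented for illustration only.
ROWS = [
    (1, "Chai", 18.0),
    (2, "Chang", 19.0),
    (3, "Aniseed Syrup", 10.0),
]


def names_imperative(rows, min_price):
    """Lower-level style: we spell out how to scan, filter and sort."""
    result = []
    for _product_id, name, price in rows:
        if price >= min_price:
            result.append(name)
    result.sort()
    return result


def names_declarative(rows, min_price):
    """Declarative style: we describe the result, the engine picks the plan."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE products ("
            "product_id INTEGER PRIMARY KEY, product_name TEXT, unit_price REAL)"
        )
        conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT product_name FROM products "
            "WHERE unit_price >= ? ORDER BY product_name",
            (min_price,),
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()
```

Both functions return the same names for the same input. The difference is that only the first one would need rewriting if the way the data is stored or indexed were to change.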

First, let's look at how data modeling compares between SQL and Graql. We use the Entity Relationship Diagram (ER Diagram) as it's the most common modeling tool in use. A basic model is composed of entity types and the relationships that can exist between them. Below is an example ER Diagram. We call this the conceptual model.

ER Diagram Example. Squares are entities, diamonds are relations, and circles are attributes.


If we are implementing this model in a relational database, we first go through a normalization process. We begin at First Normal Form (1NF) and, by looking for things such as functional and transitive dependencies, eventually get to our desired Third Normal Form (3NF).

[Figure: normalization to 2NF and 3NF]
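
As a rough illustration of one normalization step, the sketch below removes a transitive dependency using Python's sqlite3 module. The flat rows and the `normalize` helper are assumptions made for this example, not part of the article's model. In the flat rows, the category name depends on the category id rather than on the product key, so it is moved into its own table.

```python
import sqlite3

# Hypothetical flat rows: (product_id, product_name, category_id, category_name).
# category_name depends on category_id, not on the key product_id,
# which is the kind of transitive dependency that 3NF removes.
FLAT_ROWS = [
    (1, "Chai", 1, "Beverages"),
    (2, "Chang", 1, "Beverages"),
    (3, "Aniseed Syrup", 2, "Condiments"),
]


def normalize(flat_rows):
    """Split the flat rows into a products table and a categories table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE categories (
          category_id INTEGER PRIMARY KEY,
          category_name TEXT NOT NULL
        );
        CREATE TABLE products (
          product_id INTEGER PRIMARY KEY,
          product_name TEXT NOT NULL,
          category_id INTEGER REFERENCES categories (category_id)
        );
        """
    )
    for product_id, product_name, category_id, category_name in flat_rows:
        # Each category is stored once, however many products refer to it.
        conn.execute(
            "INSERT OR IGNORE INTO categories VALUES (?, ?)",
            (category_id, category_name),
        )
        conn.execute(
            "INSERT INTO products VALUES (?, ?, ?)",
            (product_id, product_name, category_id),
        )
    return conn


conn = normalize(FLAT_ROWS)
category_count = conn.execute("SELECT COUNT(*) FROM categories").fetchone()[0]

# Joining the two tables reconstructs the original flat rows without loss.
rejoined = conn.execute(
    """
    SELECT p.product_id, p.product_name, p.category_id, c.category_name
    FROM products AS p
    JOIN categories AS c ON c.category_id = p.category_id
    ORDER BY p.product_id
    """
).fetchall()
```

The category name is now stored once per category, and the join shows that no information was lost by splitting the table.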

After this normalization process, we get to our logical model in 3NF and implement it in a relational database. We have gone from our conceptual model (ER diagram) to the logical model (3NF), without ever needing to go down to the physical level of the database. This was precisely the breakthrough that the relational model brought us - abstracting away the physical level. We call this the physical independence of data.

[Figure: physical independence of data]

Now let's look at how this compares to Graql. We can map any ER Diagram directly to how we implement it in Graql, so no normalization process is needed. Below we can see how a specific part of the earlier ER Diagram is modeled. Graql lets us map the ER Diagram's entities, relations, attributes, and roles directly to the code we write later. This is different to SQL, where we need to impose a tabular structure over our model as a logical layer (as described above).

[Figure: direct mapping of the ER Diagram to Graql]

This means we skip the normalization process required in SQL entirely, and we keep working at the level of the conceptual model. In other words, Graql abstracts away both the logical and physical model. In this sense, where SQL gave us the physical independence of data, Graql gives us the logical independence of data.

[Figure: abstraction over the logical model]
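
One way to picture this direct mapping is the Python sketch below. The `EntityType` and `RelationType` classes and the `to_graql` helper are hypothetical names invented for this illustration; they are not part of Grakn or of any Graql client. The sketch holds ER elements as plain objects and renders each one as a single statement in the `define` syntax used in this article, with no intermediate tabular step. The generated text has not been run against a Grakn server.

```python
from dataclasses import dataclass, field


@dataclass
class EntityType:
    """An entity box from the ER diagram (hypothetical helper class)."""
    name: str
    key: str                                         # identifying attribute
    attributes: list = field(default_factory=list)   # other owned attributes
    roles: list = field(default_factory=list)        # roles the entity can play


@dataclass
class RelationType:
    """A relation diamond from the ER diagram (hypothetical helper class)."""
    name: str
    roles: list                                      # roles the relation relates


def to_graql(element):
    """Render one ER element as one statement, without any table design."""
    if isinstance(element, EntityType):
        head = f"{element.name} sub entity"
        clauses = [f"key {element.key}"]
        clauses += [f"has {attribute}" for attribute in element.attributes]
        clauses += [f"plays {role}" for role in element.roles]
    else:
        head = f"{element.name} sub relation"
        clauses = [f"relates {role}" for role in element.roles]
    return ",\n  ".join([head] + clauses) + ";"


product = EntityType(
    "product",
    key="product-id",
    attributes=["product-name", "quantity-per-unit"],
    roles=["product-assignment"],
)
assignment = RelationType(
    "assignment", roles=["assigned-category", "product-assignment"]
)

schema = "define\n" + "\n".join(to_graql(e) for e in (product, assignment))
print(schema)
```

Each object corresponds to exactly one element of the diagram and to exactly one statement in the output, which is the sense in which the conceptual model is kept as is.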

Now let's look at some real data. Anyone who has studied SQL is probably familiar with the Northwind dataset. It contains sales data for Northwind Traders, a fictitious specialty foods export-import company.

[Figure: the Northwind dataset]

How do we go about defining the products table shown above in Graql and SQL? Below we first see the Graql syntax that defines the product entity, its attributes, and the corresponding relation. After that come the SQL statements that create the equivalent table and its columns.

Graql

define
product sub entity,
  key product-id,
  has product-name,
  has quantity-per-unit,
  plays product-assignment;
product-id sub attribute, datatype double;
product-name sub attribute, datatype string;
quantity-per-unit sub attribute, datatype double;
assignment sub relation,
  relates assigned-category,
  relates product-assignment;



SQL

CREATE TABLE products (
  product_id smallint NOT NULL PRIMARY KEY,
  product_name character varying(40) NOT NULL,
  category_id smallint,
  quantity_per_unit character varying(20),
  FOREIGN KEY (category_id) REFERENCES categories
);
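
The SQL statement above refers to a categories table that the article does not show. The sketch below, using Python's sqlite3 module, assumes a simple two-column shape for that table purely so the statement can be run; the sample rows and the `try_insert` helper are also invented for illustration. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys` is switched on, and other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Assumed shape of the referenced table; it is not shown in the article.
    CREATE TABLE categories (
      category_id smallint NOT NULL PRIMARY KEY,
      category_name character varying(15) NOT NULL
    );
    CREATE TABLE products (
      product_id smallint NOT NULL PRIMARY KEY,
      product_name character varying(40) NOT NULL,
      category_id smallint,
      quantity_per_unit character varying(20),
      FOREIGN KEY (category_id) REFERENCES categories
    );
    """
)
conn.execute("INSERT INTO categories VALUES (1, 'Beverages')")
conn.execute("INSERT INTO products VALUES (1, 'Chai', 1, '10 boxes x 20 bags')")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO products VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Category 99 does not exist, so the foreign key rejects this row.
orphan_accepted = try_insert((2, "Chang", 99, "24 - 12 oz bottles"))
# category_id and quantity_per_unit are nullable, so this row is accepted.
no_category_accepted = try_insert((3, "Aniseed Syrup", None, None))
product_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
```

The foreign key is what ties a product row to a category row here, which is the role the assignment relation plays in the Graql version.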



A few important points:

  • Here we can see that, apart from the foreign key column, the SQL table has three attributes, each with their own datatype, which we can define in Graql as well. One of these attributes is a primary key, which we define in Graql using the key keyword.
  • In the SQL statement, there is also a foreign key, which, depending on our model, we model as a relation in Graql. We do this by connecting the product entity to the assignment relation using the role product-assignment.
  • In Graql, there is no concept of null values. If a concept does not have an attribute, it really does not have it. This is because in a graph context a null attribute is simply omitted from the graph.
  • Finally, an important point is that in the Graql model, attributes are first-class citizens, unlike in SQL.
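
The point about null values can be sketched as follows. The relational half uses Python's sqlite3 module; the graph half is a plain-Python stand-in built from a set of tuples, not Grakn's storage model or API, and the names `ownerships` and `attributes_of` are invented for this example.

```python
import sqlite3

# Relational side: every row has every column, so a missing value is a NULL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "product_id INTEGER PRIMARY KEY, "
    "product_name TEXT NOT NULL, "
    "quantity_per_unit TEXT)"
)
conn.execute("INSERT INTO products VALUES (1, 'Chai', NULL)")
sql_row = conn.execute("SELECT * FROM products").fetchone()

# Graph-style side (illustrative stand-in only): an attribute exists only if
# there is an (owner, attribute type, value) edge for it.
ownerships = {
    ("product-1", "product-id", 1),
    ("product-1", "product-name", "Chai"),
}


def attributes_of(owner):
    """Collect the attributes an owner actually has."""
    return {attr: value for (o, attr, value) in ownerships if o == owner}


chai = attributes_of("product-1")
has_quantity = "quantity-per-unit" in chai
```

In the SQL row the third position is present and holds a null, while in the graph-style structure there is simply no quantity-per-unit edge to find.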

In Part 2, we will look at how to read and write data, and at how we should model at a higher level in Graql by leveraging the Hypergraph and Automated Reasoning.


Published at DZone with permission of Tomas Sabat , DZone MVB. See the original article here.
