
Populating GRAKN.AI With the World


This post shows how to move data from a SQL database to GRAKN.AI as well as how to use the GRAKN.AI visualizer to explore the data.


Highly interconnected data from complex domains is challenging to model, maintain, and query. With GRAKN.AI, which is an open-source distributed knowledge base (think a graph database with an extra punch), building intelligent systems becomes dramatically easier.

If you are thinking of trying GRAKN.AI, the chances are that you already have some data and want to see what our stack can do. This post shows how to move data from a SQL database to GRAKN.AI, and how to use our visualizer to explore the data.

I’m going to use a simple, well-known, example set of data about cities and countries of the world. In the course of this post, I will go through the basics of how to set up a SQL database and make some simple queries (which you can omit if you’re already a SQL user), then I will explain how to migrate the contents of the database into GRAKN.AI. I’ll use our declarative query language, Graql, to make the same queries as shown in SQL. You can find the code for this example on GitHub.

I have used MySQL for this example, although the Grakn Labs team has also tested with Oracle and PostgreSQL. The version of GRAKN.AI I used was 0.12.1 but, as always, I recommend that you use the latest version.

Setting Up MySQL

If you already have a MySQL setup, you may want to skip ahead to the “Hello World” section.

Getting MySQL installed is relatively straightforward. I followed these instructions for macOS (downloading the installation package here). It wasn’t totally clear how to start the MySQL server (I eventually worked out that installation puts an icon in your System Preferences). Having started the server, you need to start the MySQL shell from the terminal:

mysql -u root -p

On first use, change the temporary password it was installed with (I set my new password to root, which is the same as the username, so it was memorable).

ALTER USER 'root'@'localhost' IDENTIFIED BY 'root';

“Hello World”

Once I had MySQL set up, it was time to open the example world database from the command line, as described here.

SOURCE [sql-path]/world.sql;

It is a classic example dataset, published by MySQL, that is easy to understand and commonly used to introduce beginners to SQL queries (although I cannot guarantee that the data it contains is accurate).

Use the following query to retrieve information about the columns of the city table:

DESCRIBE city;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   | MUL |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+

As you can see, there are five columns in the table (confusingly shown as rows in the output): ID, Name, CountryCode, District, and Population. ID is the primary key, which uniquely identifies each record; this will be important later when we migrate to Grakn.

The following statement gives us a peek at the first 10 items of city data:

SELECT * from city LIMIT 10;
+----+----------------+-------------+---------------+------------+
| ID | Name           | CountryCode | District      | Population |
+----+----------------+-------------+---------------+------------+
|  1 | Kabul          | AFG         | Kabol         |    1780000 |
|  2 | Qandahar       | AFG         | Qandahar      |     237500 |
|  3 | Herat          | AFG         | Herat         |     186800 |
|  4 | Mazar-e-Sharif | AFG         | Balkh         |     127800 |
|  5 | Amsterdam      | NLD         | Noord-Holland |     731200 |
|  6 | Rotterdam      | NLD         | Zuid-Holland  |     593321 |
|  7 | Haag           | NLD         | Zuid-Holland  |     440900 |
|  8 | Utrecht        | NLD         | Utrecht       |     234323 |
|  9 | Eindhoven      | NLD         | Noord-Brabant |     201843 |
| 10 | Tilburg        | NLD         | Noord-Brabant |     193238 |
+----+----------------+-------------+---------------+------------+

Let’s get the information about a specific city: Sydney, Australia.

SELECT * FROM City WHERE Name = 'Sydney';
+-----+--------+-------------+-----------------+------------+
| ID  | Name   | CountryCode | District        | Population |
+-----+--------+-------------+-----------------+------------+
| 130 | Sydney | AUS         | New South Wales |    3276207 |
+-----+--------+-------------+-----------------+------------+
1 row in set (0.00 sec)

You can experiment in a similar way with the country and countrylanguage tables.
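For example (a sketch assuming the standard world schema, in which the country table’s primary key column is named Code), you could join the two tables to list Australia’s official languages:

```sql
-- Official languages of Australia, with the share of speakers.
-- Assumes the standard world schema: countrylanguage.CountryCode
-- references country.Code, and IsOfficial is an enum('T','F').
SELECT cl.Language, cl.Percentage
FROM countrylanguage cl
JOIN country c ON cl.CountryCode = c.Code
WHERE c.Name = 'Australia'
  AND cl.IsOfficial = 'T';
```

This kind of join is exactly the “relationship” information we will model explicitly when we migrate to Grakn.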

Migrating to GRAKN.AI

OK, you should have a reasonable handle on the data, so now let’s migrate it into GRAKN.AI. The first thing, if you’ve not done it already, is to follow the quickstart guide to download GRAKN.AI and start the Grakn engine.

Ontology

There are limitations to the SQL format that prevent it from expressing the semantics of the data; by “semantics,” I mean that the meaning of the data cannot easily be encoded alongside the data itself. In contrast, a knowledge graph is self-descriptive, or, simply put, it provides a single place to find the data and understand what it’s all about. To get the full benefit of a knowledge graph, we must write an ontology for the dataset.

insert 
country sub entity    
  has countrycode    
  has name    
  has surfacearea    
  has indepyear    
  has population    
  has lifeexpectancy    
  has gnp    
  has gnpold    
  has localname    
  has governmentform    
  has headofstate    
  plays speaks-language     
  plays contains-city; 
city sub entity    
  has population    
  has name    
  plays in-country; 
language sub entity    
  has name    
  plays language-spoken; 
name sub resource datatype string;
countrycode sub resource datatype string;
surfacearea sub resource datatype double; 
indepyear sub resource datatype long; 
population sub resource datatype long;
lifeexpectancy sub resource datatype double;
gnp sub resource datatype double;
gnpold sub resource datatype double;
localname sub resource datatype string;
governmentform sub resource datatype string;
headofstate sub resource datatype string;
iscapital sub resource datatype boolean;
isofficial sub resource datatype boolean;
percentage sub resource datatype double; 
speaks sub relation    
  relates speaks-language    
  relates language-spoken    
  has percentage    
  has isofficial; 
 
has-city sub relation    
  relates contains-city    
  relates in-country    
  has iscapital; 
speaks-language sub role;
language-spoken sub role;
contains-city sub role;
in-country sub role;

We define three entities (country, city, and language) and two relations (has-city and speaks) between them. In the terminal, load the ontology as follows:

<path-to-GRAKN.AI>/bin/graql.sh -f ./ontology.gql

Here’s a visual representation of the ontology:

Migration Templates

Once we have written and loaded the ontology for the dataset, we need to use the Graql templating language to instruct the SQL migrator on how the SQL data maps to the above ontology. The SQL migrator applies the template we provide to each row of an SQL query’s results in turn, replacing the indicated sections of the template with the corresponding data: the column header is the key, and the content of each row at that column is the value.
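As a concrete (illustrative) example of how the substitution works, take a city row such as (1, 'Kabul', 'AFG', 'Kabol', 1780000): the migrator binds each column header to that row’s value and expands the template accordingly.

```graql
# Template fragment; <Name> and <Population> refer to column headers:
insert $city isa city
  has name <Name>
  has population <Population>;

# For the Kabul row, this expands to:
insert $city isa city
  has name "Kabul"
  has population 1780000;
```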

To migrate the country data, the template code is as follows:

insert $country isa country    
  has name <Name>    
  has countrycode <Code>    
  if(<IndepYear> != null) do { has indepyear <IndepYear> }
  if(<Population> != null) do { has population <Population> }
  if(<LifeExpectancy> != null) do { has lifeexpectancy <LifeExpectancy> }    
  if(<GNP> != null) do { has gnp <GNP> }    
  if(<GNPold> != null) do { has gnpold <GNPold> }    
  if(<LocalName> != null) do { has localname <LocalName>  }
  if(<GovernmentForm> != null) do { has governmentform <GovernmentForm> }    
  if(<HeadOfState> != null) do { has headofstate <HeadOfState> };

The language migration template:

insert $language isa language has name <language>;

Then, to insert a relation between the language and the countries in which it is spoken, there is a match-insert query that matches a language and country, then builds a relationship between them.

match    
  $language isa language has name <Language>;    
  $country isa country has countrycode <CountryCode>;
insert    
 $relation (speaks-language: $country, language-spoken: $language) isa speaks        
  has isofficial if(<IsOfficial> = "F") do { false } else { true }
  has percentage <Percentage>;

For city migration, the template is as follows:

match    
  $country isa country has countrycode <CountryCode>;    
insert 
  $city isa city    
  has name <Name>    
  has population <Population>; 
  (contains-city: $country, in-country: $city) isa has-city;

To determine if it is the capital city:

if(<capital> != null) do {
match 
  $country isa country has countrycode <code>;    
  $city isa city has name <capital>;    
  $rel (contains-city: $country, in-country: $city) isa has-city;
insert    
  $rel has iscapital true;}

Migration Tools

There are two ways in which you can migrate SQL data into GRAKN.AI. You can use Java to perform the migration in a few lines of code, which is described further in our SQL migration example.

Alternatively, there is a shell script that you can call to apply the templates above to the SQL data. In effect, the shell script calls a set of Java functions so you don’t have to. I like this option, as Java is not my natural habitat. The migration documentation shows the script options, which are as follows:

usage: migration.sh sql -template <arg> -driver <arg> -user <arg> -pass <arg> -location <arg> -keyspace <arg> [-help] [-no] [-batch <arg>] [-uri <arg>] [-retry <arg>] [-v] 

 -a,--active <arg>     Number of tasks (batches) running on the server at any one time. Default 25.
 -b,--batch <arg>      Number of rows to execute in one transaction. Default 25.
 -c,--config <arg>     Configuration file.
 -driver <arg>         JDBC driver
 -h,--help             Print usage message.
 -k,--keyspace <arg>   Grakn graph. Required.
 -location <arg>       JDBC url (location of DB)
 -n,--no               Write to standard out.
 -pass <arg>           JDBC password
 -q,--query <arg>      SQL Query
 -r,--retry <arg>      Retry sending tasks if engine is not available
 -t,--template <arg>   Graql template to apply to the data.
 -u,--uri <arg>        Location of Grakn Engine.
 -user <arg>           JDBC username
 -v,--verbose          Print counts of migrated data.

Whether you use the shell script or Java code, what you are doing, in either case, is extracting SQL data using the JDBC API and importing it into a graph.

For this example, running from within the examples/example-sql-migration/shell-migration directory of the GRAKN.AI installation, I called Grakn’s migration.sh script on each of the templates shown above, passing in the appropriate query. For example, for the countries migration:

<path-to-GRAKN.AI>/bin/migration.sh sql -q "SELECT * FROM country;" -location jdbc:mysql://localhost:3306/world -user root -pass root -t ./migration/countries/template.gql -k grakn

However, to make it simpler, you can chain them all together in a batch file, as I’ve done (you can find it on GitHub), and simply call that:

./loader.sh <path-to-GRAKN.AI>
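If you prefer to write your own batch file, a simplified sketch of the chaining (the directory layout and query strings here are illustrative, not the exact contents of the GitHub loader.sh) might look like this:

```shell
#!/bin/sh
# Illustrative loader sketch: run each migration in dependency order
# (countries first, since the later templates match on the country
# entities inserted by the first step).
GRAKN=$1  # path to the GRAKN.AI installation

"$GRAKN"/bin/migration.sh sql -q "SELECT * FROM country;" \
  -location jdbc:mysql://localhost:3306/world -user root -pass root \
  -t ./migration/countries/template.gql -k grakn

"$GRAKN"/bin/migration.sh sql -q "SELECT DISTINCT Language AS language FROM countrylanguage;" \
  -location jdbc:mysql://localhost:3306/world -user root -pass root \
  -t ./migration/languages/template.gql -k grakn

# ...and likewise for the speaks, cities, and capitals templates.
```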

Querying the Graph

We can now make queries on the graph as follows. Let’s reproduce some of the queries we made previously in SQL.

SQL
SELECT * from city LIMIT 10;
Graql
match $x isa city; offset 0; limit 10;

If successful, you should see a list of 10 cities following the query.

Similarly, query for information about the city of Sydney:

SQL
SELECT * FROM city WHERE Name = 'Sydney';
Graql
match $x isa city, has name "Sydney";
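Unlike SQL, Graql also lets us traverse the relations directly. For instance (an illustrative query built on the ontology above), to list the first 10 capital cities with their countries:

```graql
match
  (contains-city: $country, in-country: $city) isa has-city, has iscapital true;
  $country has name $countryname;
  $city has name $cityname;
offset 0; limit 10;
```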

Problems?

The Grakn team was super helpful in sorting things out when they went wrong for me, so if you have any problems, please get in touch for help too. Just make contact via our Community page, leave a comment below, or join us on our Slack channel.

One issue that I hit initially: make sure that you download the JDBC driver from here and place the JAR file (mysql-connector-java-5.1.40-bin.jar) in the /lib directory of the Grakn environment that you downloaded and set up.

Visualizing the Data

The Grakn visualizer is a cool way to look at the resulting graph and explore the data. With the Grakn engine running and the graph loaded, navigate to http://localhost:4567/ in your browser, where you can make queries on the graph. It’s a nice way of seeing how data is connected, which, in my view, fits how I think about cities and countries much better than a table of rows and columns. Let’s explore.

We can show 10 countries and their cities:

match $x isa has-city; offset 0; limit 10;

In the GRAKN.AI visualizer:


Further guidance on using the visualizer, which is rapidly evolving, can be found in the GRAKN.AI documentation.

Why?

You may be wondering why I’ve bothered moving the data from a relational database into a Grakn graph. After all, isn’t it fine as it is? Well, yes and no. Relational databases have benefits, including simplicity and familiarity, but they also have limitations. If you are just doing basic reads and writes on straightforward data, SQL may well be adequate for your needs. But remember that I chose a simple, familiar example specifically to make this article easy to follow. If you have a more complex domain with highly interconnected data, which is very probable in today’s information landscape, you will quickly see significant benefits: describing the relationships within data is the primary strength of a graph database, and in this respect a relational database cannot match the speed or flexibility of a graph.

As a knowledge base, GRAKN.AI has an additional benefit over standard graph databases, since it allows complex data modeling, verification, scaling, querying and analysis. A key step is the definition of an ontology, which facilitates the modeling of complex datasets and guarantees information consistency. Inference rules allow the extraction of implicit information from explicit data, to achieve logical reasoning over the represented knowledge.

Conclusion

At the beginning of this article, I introduced GRAKN.AI as a graph database with extra punch. I have hardly scratched the surface of what it can do, but I hope I have at least shown that it is easy to set up a Grakn graph with familiar data, and how to query and visualize it.

This article was originally posted on the GRAKN.AI blog.
