Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Get Started With GRAKN.AI

DZone's Guide to

Get Started With GRAKN.AI

GRAKN.AI is a deductive database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications.

· Big Data Zone
Free Resource

Need to build an application around your data? Learn more about dataflow programming for rapid development and greater creativity. 

In this introductory article, I will show how to get up and running with GRAKN.AI. Before we start, though, let’s quickly answer one question...

What Is GRAKN.AI?

GRAKN.AI is a database for AI. It is a deductive database in the form of a knowledge graph that uses machine reasoning to simplify data processing challenges for AI applications. GRAKN.AI is composed of two parts: Grakn (the storage) and Graql (the language).

  • Grakn is a database in the form of a knowledge graph that uses an intuitive ontology to model extremely complex datasets. It stores data in a way that allows machines to understand the meaning of information in the complete context of their relationships. Consequently, Grakn allows computers to process complex information more intelligently with less human intervention.
  • Graql is a declarative, knowledge-oriented graph query language that uses machine reasoning for retrieving explicitly stored and implicitly derived knowledge from Grakn.

You can find out more on our website, but, for now, let’s assume you want to try it out. This article explains how to get set up, as you can discover a lot about Grakn just from trying it out.

Download GRAKN.AI

GRAKN.AI is supported on Mac OS X or Linux. We do expect to add Windows support in due course. The first thing you need to do is download and install it.

For more information on how to download older versions of GRAKN.AI, compile from source code, or import the Grakn Java API library as a development dependency, please visit our Downloads page.

Prerequisites 

GRAKN.AI requires Java 8 (Standard Edition) with the $JAVA_HOME set accordingly. If you don’t already have this installed, you can find it here. If you intend to build Grakn from its source code or develop on top of it, you will also need Maven 3.

Unzip the download into your preferred location and run the following in the terminal to start Grakn:

cd [your Grakn install directory]
./bin/grakn.sh start

This will start:

  • An instance of Cassandra, which serves as the supported backend for Grakn.
  • Grakn Engine, which is an HTTP server providing batch loading, monitoring and the browser dashboard.
  • Apache Kafka.
  • Apache Zookeeper.

Grakn Engine is configured by default to use port 4567, but this can be changed, as can settings for Kafka and Zookeeper, in the grakn-engine.properties file found within the /conf directory of the installation.

Useful Commands

To start Grakn, run grakn.sh start

To stop Grakn, run grakn.sh stop

To remove all graphs from Grakn, run grakn.sh clean

Test the Graql Shell

To test that the installation is working correctly, we will load a simple ontology and some data from a file and test it in the Graql shell and Grakn Visualiser. The example we are using is part of our genealogy-graph project. You can find out much more from our documentation in which we use the dataset extensively.

The file we will use, basic-genealogy.gql, will be included in the /examples folder of the Grakn installation zip from the release 0.11.0 onwards, and otherwise, you are welcome to clone it from the Grakn repo on GitHub. In the code below, we assume that the file is located in the /examples folder.

From the terminal, type in the following to load the example graph. This starts the Graql shell in non-interactive mode, loading the specified file and exiting after the load is complete.

./bin/graql.sh -f ./examples/basic-genealogy.gql

Then type the following to start the Graql shell in its interactive (REPL) mode, type:

./bin/graql.sh

The Graql shell starts and you see a >>> prompt. We will type in a query to check that everything is working.

match $x isa person, has identifier $n;

You should see a printout of a number of lines of text, each of which includes a name, such as “William Sanford Titus” or “Elizabeth Niesz.”

If you see the above output then congratulations! You have set up Grakn. If you are having trouble with the setup, please check our FAQ page, and if you have any questions, please ask them on our discussion forum, on Stack Overflow, or on our Slack channel.

Test the Visualizer

The Grakn visualizer provides a graphical tool to inspect and query your graph data. You can open the visualizer by navigating to localhost:4567 in your web browser. The visualizer allows you to make queries or simply browse the knowledge ontology within the graph. The screenshot below shows a basic query (match $x isa person;) typed into the form at the top of the main pane, and visualized by pressing Submit:

The help tab on the main pane shows a set of key combinations that you can use to further drill into the data. You can zoom the display in and out, and move the nodes around for better visibility. Please see our Grakn visualizer documentation for further details.

The Ontology

You can find out much more about the Grakn ontology in our documentation about the Grakn knowledge model, which states that:

The ontology is a formal specification of all the relevant concepts and their meaningful associations in a given application domain. It allows objects and relationships to be categorised into distinct types, and for generic properties about those types to be expressed.

You can think of the ontology as a schema that describes items of data and defines how they relate to one another. You need to have a basic understanding to make useful queries on the data, so let’s review the chunks of it that are important for this demonstration:

insert
# Entities
person sub entity
  plays-role parent
  plays-role child
  plays-role spouse1
  plays-role spouse2
  has-resource identifier
  has-resource firstname
  has-resource surname
  has-resource middlename
  has-resource picture
  has-resource age
  has-resource birth-date
  has-resource death-date
  has-resource gender;

# Resources
identifier sub resource datatype string;
firstname sub resource datatype string;
surname sub resource datatype string;
middlename sub resource datatype string;
picture sub resource datatype string;
age sub resource datatype long;
birth-date sub resource datatype string;
death-date sub resource datatype string;
gender sub resource datatype string;

# Roles and Relations
marriage sub relation
  has-role spouse1
  has-role spouse2
  has-resource picture;
spouse1 sub role;
spouse2 sub role;
parentship sub relation
  has-role parent
  has-role child;
parent sub role;
child sub role;

There are a number of things we can say about ontology shown above:

  • There is one entity, person, which represents a person in the family whose genealogy data we are studying.
  • The person entity has a number of resources to describe aspects of them, such as their name, age, dates of birth and death, gender, and a URL to a picture of them (if one exists). Those resources are all expressed as strings, except for the age, which is of datatype long.
  • There are two relations that a person can participate in: marriage and parentship
  • The person can play different roles in those relations, as a spouse (spouse1 or spouse2 — we aren’t assigning them by gender to be husband or wife) and as a parent or child (again, we are not assigning a gender such as a mother or father at this point).
  • The marriage relation has a resource, which is a URL to a wedding picture (if one exists).

The Data

We will not reproduce all the data here, but here is a snippet of some of the data that you added to the graph when you loaded the basic-genealogy.gql file. Each statement is adding either a person, a parentship, or a marriageto the graph:

$57472 isa person has firstname "Mary" has identifier "Mary Guthrie" has surname "Guthrie" has gender "female";
$86144 has surname "Dudley" isa person has identifier "Susan Josephine Dudley" has gender "female" has firstname "Susan" has middlename "Josephine";
...
$8304 (parent: $57472, child: $41324624) isa parentship;
$24816 (parent: $81976, child: $41096) isa parentship;
...
$122884216 (spouse2: $57472, spouse1: $41406488) isa marriage;
$40972456 (spouse2: $40964120, spouse1: $8248) isa marriage;
...

Don’t worry about the numbers such as $57472. These are variables in Graql and happen to have randomly assigned numbers to make them unique.

Querying the Graph

Let’s make some queries. You can do this in the Graql shell or in the visualiser. First, to find all the people who are married, and list their names:

match (spouse1: $x, spouse2: $y) isa marriage; 
$x has identifier $xi; $y has identifier $yi;

List parent-child relations with the names of each person:

match (parent: $p, child: $c) isa parentship; 
$p has identifier $pi; $c has identifier $ci;

Find all the people who are named Elizabeth:

match $x isa person, has identifier $y; $y value contains "Elizabeth";

Querying the graph is more fully described in the Graql documentation.

Using Inference

We will move on to discuss the use of GRAKN.AI to infer new information about a dataset. In the ontology, we have dealt only with a person, not a man or woman, and the parentship relations were simply between parentand child roles. We did not directly add information about the nature of the parent and child in each relation — they could be father and son, father and daughter, mother and son or mother and daughter.

However, the person entity does have a gender resource, and we can use Grakn to infer more information about each relationship by using that property. The ontology accommodates the more specific roles of mother, father, daughter and son:

person 
  plays-role son
  plays-role daughter
  plays-role mother
  plays-role father

parentship sub relation
  has-role mother
  has-role father
  has-role son
  has-role daughter;
mother sub parent;
father sub parent;
son sub child;
daughter sub child;

Note: You don’t need to reload the basic-genealogy.gql file into Grakn pick up these extra roles. We simply didn’t show this part in our earlier discussion of the ontology, to keep things as simple as possible. Also included in basic-genealogy.gql is a set of Graql rules to instruct Grakn’s reasoner on how to label each parentship relation:

$genderizeParentships1 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "male";
$c has gender "male";
}
rhs
{(father: $p, son: $c) isa parentship;};

$genderizeParentships2 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "male";
$c has gender "female";
}
rhs
{(father: $p, daughter: $c) isa parentship;};

$genderizeParentships3 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "female";
$c has gender "male";
}
rhs
{(mother: $p, son: $c) isa parentship;};

$genderizeParentships4 isa inference-rule
lhs
{(parent: $p, child: $c) isa parentship;
$p has gender "female";
$c has gender "female";
}
rhs
{(mother: $p, daughter: $c) isa parentship;};

If you’re unfamiliar with the syntax of rules, don’t worry too much about it just now. It is sufficient to know that for each parentship relation, Graql checks whether the pattern in the first block (left-hand side or lhs) can be verified and, if it can, infers the statement in the second block (right-hand side or rhs) to be true, so inserts a relation between gendered parents and children.

Let’s test it out!

First, try making a match query to find parentship relations between fathers and sons in the Graql shell:

match (father: $p, son: $c) isa parentship; 
$p has identifier $n1; $c has identifier $n2;

Did you get any results? Probably not, because reasoning is not enabled by default at present, although as Grakn develops, we expect that to change. If you didn’t see any results, you need to exit the Graql shell and restart it, passing -n and -m flags to switch on reasoning (see our documentation for more information about flags supported by the Graql shell).

./bin/graql.sh -n -m

Try the query again:

match (father: $p, son: $c) isa parentship; 
$p has identifier $n1; $c has identifier $n2;

There may be a pause, and then you should see a stream of results as Grakn infers the parentships between male parent and child entities. It is, in effect, building new information about the family which was not explicit in the dataset.

You may want to take a look at the results of this query in the Grakn visualizer and, as for the shell, you will need to activate inference before you see any results. Browse to the visualiser at localhost:4567 and open the Config settings on the left-hand side of the screen. When the page opens you will see the Activate Inference checkbox. Check it and try submitting the query above or a variation of it for mothers and sons, fathers and daughters, etc. Or, you can even go one step further and find out fathers who have the same name as their sons:

match (father: $p, son: $c) isa parentship; 
$p has firstname $n; $c has firstname $n;

If you want to find out more about the Graql reasoner, we have a detailed example that uses a slightly different version of the genealogy dataset. An additional discussion on the same topic can be found in our “Family Matters” blog post.

Data Migration

In this example, we loaded data from basic-genealogy.gql directly into a graph. However, data isn’t often conveniently stored in GQL files and, indeed, this data was originally in CSV format. Our CSV migration example explains in detail the steps we took to migrate the CSV data into Grakn.

Migrating data in formats such as CSV, SQL, OWL, and JSON into Grakn is a key use case. More information about each of these can be found in our migration documentation.

Where Next?

This post was a very high-level overview of some of the key use cases for Grakn, and has hardly touched the surface or gone into detail. The rest of our developer documentation and examples are more in-depth and should answer any questions that you may have, but if you need extra information, please get in touch.

A good place to start is to explore our additional example code and the documentation for:

Check out the Exaptive data application Studio. Technology agnostic. No glue code. Use what you know and rely on the community for what you don't. Try the community version.

Topics:
grakn ,artificial intelligence ,ai database ,big data ,graphs ,tutorial

Published at DZone with permission of Jo Stichbury, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}