
### In this post, we take a look at how to leverage GRAKN.AI's reasoning power to glean new knowledge about the world. Cool, huh?

In a previous blog tutorial, we demonstrated how to import some example SQL data into GRAKN.AI. In this article, we will work with the same data, which is about countries and cities of the world. Here, we use it to illustrate how to use inference to find information that is stored implicitly within the dataset.

This article will be useful if you are getting started with GRAKN.AI and want a simple example of how to write inference rules using Graql. If you haven’t already set up GRAKN.AI, please see the previous tutorial, or check out our setup guide.

## Introduction to Inference

Consider the following statements:

```
(If) grass is not an animal.
(If) vegetarians only eat things which are not animals.
(If) sheep only eat grass.
```

It is possible to infer the following:

```
(Then) sheep are vegetarians.
```

The initial statements can be seen as a set of premises. If all the premises are met, we can infer a new fact (that sheep are vegetarians). If we hypothesize that sheep are vegetarians, then the whole example can be expressed with a particular two-block structure: IF some premises are met, THEN a given hypothesis is true.

This is how reasoning in Graql works. The reasoner checks whether a first block of Graql statements holds in the data and, if it does, infers the facts described by a second block of statements. The first block (the IF part or, if you prefer, the antecedent) is called the left-hand side (LHS). The second block (also known as the consequent) is, not surprisingly, the right-hand side (RHS). In Graql, both sides of a rule are enclosed in curly braces and preceded by the keywords `lhs` and `rhs`, respectively.
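
The two-block structure can be sketched outside Graql too. The following minimal Python example encodes the sheep premises as plain tuples and fires a rule only when every LHS fact is present, repeating until nothing new is inferred (a simple forward-chaining loop). It illustrates the IF/THEN idea only and is not how GRAKN.AI's reasoner is implemented; the `Rule` class, the `infer` function, and the tuple encoding are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical IF/THEN rule: if every lhs fact is known, every rhs fact is inferred."""

    lhs: frozenset  # the premises (antecedent)
    rhs: frozenset  # the conclusions (consequent)


def infer(facts, rules):
    """Forward-chain: keep firing rules until no new facts are added."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Fire only if all premises are known and the rule adds something new.
            if rule.lhs <= known and not rule.rhs <= known:
                known |= rule.rhs
                changed = True
    return known


# The sheep example as (subject, relation, object) facts.
premises = {
    ("grass", "is-not", "animal"),
    ("vegetarians", "only-eat", "non-animals"),
    ("sheep", "only-eat", "grass"),
}
sheep_rule = Rule(
    lhs=frozenset(premises),
    rhs=frozenset({("sheep", "are", "vegetarians")}),
)

print(("sheep", "are", "vegetarians") in infer(premises, [sheep_rule]))  # True
```

Removing any one premise (for example, the fact that grass is not an animal) stops the rule from firing, which is the "all premises must be met" condition in practice.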

## Setting Up the Example

Our example can be found on GitHub in the sample-projects repo. We aren’t going to walk through how to migrate SQL data here, since that topic was covered previously, although the scripts to migrate directly from SQL into GRAKN.AI are available in the repo (just consult the readme file). To follow along, first load the ontology:

```
bin/graql.sh -f ontology.gql
```

Then load the data (this may take a few minutes):

```
bin/graql.sh -f data.gql
```

## What’s in the Data?

My esteemed colleague Miko already discussed inference in an earlier blog article. The Graql syntax has moved on a little since he wrote it, but his article includes a very nice explanation of how inference works, using Italian cities, provinces, and regions to illustrate. It was such a neat example that I went looking for a similar one in a practical dataset. Let me explain…

The SQL data I migrated into GRAKN.AI consisted of a number of tables. In this example, I’m looking at the table of data about countries and a separate table about cities. The countries table contains a number of columns with data about individual countries (such as their name, population, life expectancy, and surface area). For simplicity, I migrated just four fields for each country: its name, international country code, world region (for example, Eastern Africa or Southeast Asia), and the continent it is situated in.

The city table contains a number of columns, but in this example I’ve imported only the name of the city and the local district it resides in. The local district seems loosely based on the division of a country into states (for example, Texas or Iowa in the U.S.) or provinces/territories.

The city table also contains a country code identifying the country in which the city is located. I’ve used that code to build a relation between cities and countries in the GRAKN.AI knowledge model. The `has-city` relation is shown in the ontology, which is pretty simple, but I’ll explain it further below:

```
insert
# A country stores its continent and world region; local districts are inferred.
country sub entity
has name
has countrycode
has continent
has world-region
has inf-local-district
plays contains-city;

# A city stores its local district; continent and world region are inferred.
city sub entity
has name
has local-district
has inf-continent
has inf-world-region
plays in-country;

# The relation linking a country (contains-city) to a city (in-country).
has-city sub relation
relates contains-city
relates in-country;

contains-city sub role;
in-country sub role;

# All attributes are string resources; the inf-* ones are derived by rules.
name sub resource datatype string;
countrycode sub resource datatype string;
continent sub resource datatype string;
world-region sub resource datatype string;
local-district sub resource datatype string;
inf-local-district sub resource datatype string;
inf-continent sub resource datatype string;
inf-world-region sub resource datatype string;
```
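
The migration scripts in the repo handle the linking of cities to countries for you. Purely as an illustration of the idea, the Python sketch below pairs rows from the two tables on their country code, producing one `has-city` pair per city. The rows, the `GBR`/`AUS` codes, and the `build_has_city` helper are hypothetical and are not taken from the repo's migration code.

```python
# Hypothetical rows standing in for the SQL country and city tables.
countries = [
    {"code": "GBR", "name": "United Kingdom"},
    {"code": "AUS", "name": "Australia"},
]
cities = [
    {"name": "Cardiff", "countrycode": "GBR"},
    {"name": "Melbourne", "countrycode": "AUS"},
    {"name": "Nowhere", "countrycode": "XXX"},  # no matching country
]


def build_has_city(countries, cities):
    """Pair each city with its country by matching on the shared country code."""
    by_code = {country["code"]: country for country in countries}  # index countries once
    pairs = []
    for city in cities:
        country = by_code.get(city["countrycode"])
        if country is None:
            continue  # skip cities whose code matches no country
        # (contains-city player, in-country player), as in the has-city relation
        pairs.append((country["name"], city["name"]))
    return pairs


print(build_has_city(countries, cities))
# [('United Kingdom', 'Cardiff'), ('Australia', 'Melbourne')]
```

Indexing the countries by code first keeps the join to a single pass over the city rows.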

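The resources prefixed with `inf-` are not migrated from SQL; instead, the reasoner derives them from two rules:

IF a city is in a country, THEN it must be located in the same continent and world region as the country.

IF a country contains a city, THEN it must contain the local district in which the city is located.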

A diagram of the world example ontology.

This is why, in the ontology, a `city` entity has a resource called `inf-continent`. The reasoner uses a set of “rules” that are a Graql version of what I’ve written above to work out `inf-continent` by inspecting the related `country` and the `continent` it resides in. Likewise, the `city` entity has a resource called `inf-world-region` and the `country` entity has a resource called `inf-local-district`.
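
To see what the two rules produce, here is a minimal Python sketch that computes the `inf-` values eagerly from a handful of explicit facts. The data and the `apply_rules` helper are hypothetical, and computing everything up front is a simplification: in GRAKN.AI the rules are written in Graql and the reasoner applies them for you.

```python
# Hypothetical explicit facts, mirroring what the migration stores.
country_info = {
    "United Kingdom": {"continent": "Europe", "world-region": "British Islands"},
    "Australia": {"continent": "Oceania", "world-region": "Australia and New Zealand"},
}
city_district = {
    "Cardiff": "Wales",
    "Melbourne": "Victoria",
    "Sydney": "New South Wales",
}
# has-city pairs: (country playing contains-city, city playing in-country)
has_city = [
    ("United Kingdom", "Cardiff"),
    ("Australia", "Melbourne"),
    ("Australia", "Sydney"),
]


def apply_rules(country_info, city_district, has_city):
    """Derive the inf-* attributes described by the two IF/THEN rules."""
    inf_city = {}            # city -> its inferred continent and world region
    inf_local_district = {}  # country -> set of inferred local districts
    for country, city in has_city:
        # Rule 1: a city lies in the same continent and world region as its country.
        info = country_info[country]
        inf_city[city] = {
            "inf-continent": info["continent"],
            "inf-world-region": info["world-region"],
        }
        # Rule 2: a country contains the local district of each of its cities.
        inf_local_district.setdefault(country, set()).add(city_district[city])
    return inf_city, inf_local_district


inf_city, inf_local_district = apply_rules(country_info, city_district, has_city)
print(inf_city["Cardiff"])  # {'inf-continent': 'Europe', 'inf-world-region': 'British Islands'}
print(sorted(inf_local_district["Australia"]))  # ['New South Wales', 'Victoria']
```

The second rule is one-to-many: a country can accumulate several inferred local districts, one for each district its cities lie in.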

As humans, we understand the concept that a city is in a country, and a country is in a continent, and thus a city is also in the same continent as the country. We have to write rules for a computer to make the same intuitive leaps.

At the bottom of the ontology.gql file, you’ll find the Graql rules for reasoning over the dataset. For example, this rule infers the continent in which a city is located:

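```
$city-in-continent isa inference-rule
lhs
{
# IF a country contains a city, and that country has a continent...
(contains-city: $country1, in-country: $city1) isa has-city;
$country1 has continent $continent1;
}

rhs
{
# ...THEN the city gets the same value as its inferred continent.
$city1 has inf-continent $continent1;
};
```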
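
The LHS of this rule is effectively a join: both statements must agree on the same `$country1`, and the values bound to `$city1` and `$continent1` are then reused in the RHS. The Python sketch below imitates that variable binding with a tiny pattern matcher over (subject, predicate, object) triples. It is a simplified illustration under that triple encoding, not GRAKN.AI's reasoner, and the `match`, `solve`, and `fire` helpers are hypothetical.

```python
# Facts as (subject, predicate, object) triples; names starting with "$" are variables.
facts = {
    ("United Kingdom", "contains-city", "Cardiff"),
    ("United Kingdom", "continent", "Europe"),
    ("Australia", "contains-city", "Melbourne"),
    ("Australia", "continent", "Oceania"),
}

# The city-in-continent rule, transcribed into triple patterns.
lhs = [
    ("$country1", "contains-city", "$city1"),
    ("$country1", "continent", "$continent1"),
]
rhs = [("$city1", "inf-continent", "$continent1")]


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("$"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def solve(patterns, facts, binding=None):
    """Yield every binding that satisfies all patterns (a join on shared variables)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from solve(rest, facts, extended)


def fire(lhs, rhs, facts):
    """Instantiate the rhs once for every solution of the lhs."""
    return {
        tuple(binding.get(term, term) for term in triple)
        for binding in solve(lhs, facts)
        for triple in rhs
    }


print(sorted(fire(lhs, rhs, facts)))
# [('Cardiff', 'inf-continent', 'Europe'), ('Melbourne', 'inf-continent', 'Oceania')]
```

Because `$country1` is shared, Cardiff is never paired with Oceania: the second pattern only matches the continent of the country already bound by the first.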

You can find out more about writing rules from the GRAKN.AI documentation.

Note that while we are able to make inferences using city, country, and continent information (and similarly for the district and region fields), it isn’t possible or sensible to apply the same model to all the information in the world dataset. For example, we cannot say that because the population of a country is one million, and a city is within that country, the population of the city is also one million, since a country usually contains much more than a single city (Vatican City being a possible exception). We could, however, write a rule saying that if the population of a country is `x`, the population of any city within that country is no greater than `x`. Common sense, from a human brain, is needed before reasoning can take place!
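
As a sketch of the population rule suggested above, the Python example below derives an upper bound on each city's population from its country's population, then uses the bound to flag records that break it. The names, figures, and helper functions are hypothetical. The bound is written as "no greater than" so that a city-state such as Vatican City, whose single city makes up the whole country, does not count as a violation.

```python
# Hypothetical figures: one country and two of its cities.
country_population = {"Country X": 1_000_000}
has_city = [("Country X", "City A"), ("Country X", "City B")]
city_population = {"City A": 250_000, "City B": 1_200_000}  # City B looks like a data error


def population_bounds(country_population, has_city):
    """Rule sketch: a city's population is no greater than its country's."""
    return {city: country_population[country] for country, city in has_city}


def violations(city_population, bounds):
    """Return cities whose recorded population exceeds the inferred bound."""
    return sorted(
        city
        for city, pop in city_population.items()
        if city in bounds and pop > bounds[city]
    )


bounds = population_bounds(country_population, has_city)
print(bounds["City A"])  # 1000000
print(violations(city_population, bounds))  # ['City B']
```

Unlike the continent rule, this one does not infer a value; it infers a constraint, which is mainly useful for spotting inconsistent data.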

## Making Some Queries

Let’s make some queries to get some inferred knowledge from the world database.

Firstly, let’s find out the local district, world region (inferred) and continent (inferred) for a couple of cities, Cardiff and Melbourne:

```
match $x isa city, has name "Cardiff", has local-district $d, has inf-continent $ic, has inf-world-region $ir;

$d val "Wales" isa local-district; $x id "3260448" isa city; $ir val "British Islands" isa world-region; $ic val "Europe" isa continent;
```

So, Cardiff is in the district of Wales (that’s in the data), but the reasoner also tells us that it is in the British Islands (region), which is in Europe (continent).

```
match $x isa city, has name "Melbourne", has local-district $d, has inf-continent $ic, has inf-world-region $ir;

$d val "Victoria" isa local-district; $x id "3842208" isa city; $ir val "Australia and New Zealand" isa world-region; $ic val "Oceania" isa continent;
```

Melbourne is in Victoria, which is in Australia and New Zealand (region), which is in Oceania (continent).

Now let's specify a country and find, through reasoning, all the local districts it contains.

```
match $x isa country, has name "Australia", has inf-local-district $d;
$x id "147552" isa country; $d val "New South Wales" isa local-district;
$x id "147552" isa country; $d val "Tasmania" isa local-district;
$x id "147552" isa country; $d val "South Australia" isa local-district;
$x id "147552" isa country; $d val "Queensland" isa local-district;
$x id "147552" isa country; $d val "Capital Region" isa local-district;
$x id "147552" isa country; $d val "Victoria" isa local-district;
$x id "147552" isa country; $d val "West Australia" isa local-district;
```

Now let’s get a bit more creative, and ask about cities containing the word “Victoria”, and find out where in the world they are.

```
match $x isa city, has name contains "Victoria", has local-district $d, has inf-continent $ic, has inf-world-region $ir;

$d val "Hongkong" isa local-district; $x id "4878472" isa city; $ir val "Eastern Asia" isa world-region; $ic val "Asia" isa continent;
$d val "Mahé" isa local-district; $x id "28074144" isa city; $ir val "Eastern Africa" isa world-region; $ic val "Africa" isa continent;
$d val "Tamaulipas" isa local-district; $x id "11374728" isa city; $ir val "Central America" isa world-region; $ic val "North America" isa continent;
$d val "Las Tunas" isa local-district; $x id "21840032" isa city; $ir val "Caribbean" isa world-region; $ic val "North America" isa continent;
```

Now, let’s get five cities in Oceania:

```
match $x isa city, has inf-continent "Oceania", has name $n; limit 5;
$x id "1519784" isa city; $n val "Tafuna" isa name;
$x id "1949832" isa city; $n val "Canberra" isa name;
$x id "3842208" isa city; $n val "Melbourne" isa name;
$x id "3920096" isa city; $n val "Townsville" isa name;
$x id "3838176" isa city; $n val "Adelaide" isa name;
```

## Summary

The queries above show how information can be inferred from data even though it is not stored explicitly. It is a deliberately trivial example, but the intention is to show the basics of reasoning and how to construct rules in Graql.

Real-world models are, of course, filled with hierarchies and hyper-relationships, which make querying a dataset challenging, not least because traditional query languages can only retrieve explicitly stored data, not implicitly derived information.

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.