Comparing Grakn to Semantic Web Technologies — Part 1/3

This article explores how Grakn compares to Semantic Web Standards, focusing specifically on RDF, XML, RDFS, OWL, SPARQL and SHACL. The two sets of technologies share some key similarities, primarily because both are rooted in symbolic AI, knowledge representation and automated reasoning. These similarities include:

  1. Both allow developers to represent and query complex and heterogeneous data sets
  2. Both give the ability to add semantics to complex sets of data
  3. Both enable the user to perform automated deductive reasoning over large bodies of data

However, there are core differences between these technologies, as they were designed for different types of applications. Specifically, the Semantic Web is built for the Web, with incomplete data coming from many sources, where anyone can contribute to the definition and mapping between information sources. Grakn, in contrast, wasn't built to share data over the web, but instead to work as a transactional database for closed world organizations. Because of this, comparing the two technologies sometimes feels like comparing apples to oranges.

These differences can be summarised:

  1. Compared to the Semantic Web, Grakn reduces the complexity while maintaining a high degree of expressivity. With Grakn, we avoid having to learn different Semantic Web Standards, each with high levels of complexity. This reduces the barrier to entry.
  2. Grakn provides a higher-level abstraction for working with complex data than Semantic Web Standards. With RDF we model the world in triples, which is a lower level data model than Grakn's entity-relation concept level schema. Modelling and querying for higher order relationships and complex data is native in Grakn.
  3. Semantic Web Standards are built for the Web, whereas Grakn works for closed world systems. The former were designed to work with linked data on an open web with incomplete data, while Grakn works like a traditional database management system in a closed world environment.
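
The closed world versus open world distinction can be illustrated with a small Python sketch. This is neither Grakn nor RDF code: the fact base, the `Answer` enum and both `ask_*` functions are hypothetical names invented for illustration. The point is only that the same missing fact is answered differently under each assumption.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical fact base: the only statement that has been recorded.
FACTS = {("PeterParker", "knows", "AuntMay")}


def ask_closed_world(fact):
    """Closed world: anything that is not recorded is taken to be false."""
    return Answer.TRUE if fact in FACTS else Answer.FALSE


def ask_open_world(fact):
    """Open world: anything that is not recorded is merely unknown."""
    return Answer.TRUE if fact in FACTS else Answer.UNKNOWN


missing = ("PeterParker", "knows", "GreenGoblin")
print(ask_closed_world(missing).value)  # false
print(ask_open_world(missing).value)    # unknown
```

Under the closed world reading the data set is treated as complete, which is the usual behaviour of a database; under the open world reading, another source on the web might still assert the missing statement.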

This documentation shows there are strong overlaps in how both technologies offer tools for knowledge representation and automated reasoning and covers the most important concepts at a high level without going into too much detail. The goal is to help users from an RDF/OWL background to familiarise themselves with Grakn.

The Semantic Web Stack

The Semantic Web started at the end of the 1990s to extend the web's existing architecture with a layer of formal semantics. It consists of a number of Standards that together make the Semantic Web Stack. The technologies covered in this article include: XML, RDF, RDFS, OWL, SPARQL, and SHACL.

  • RDF is a standard for exchanging data over the web, with XML commonly used as its serialization.
  • RDFS provides a schema and some basic ontological constructs.
  • OWL enhances this further with constructs from Description Logic.
  • SPARQL is the language to query and insert RDF data.
  • SHACL provides a set of verification constraints to logically validate data.

In addition to the standards, there are different libraries and implementations a user has to choose from to actually use the standards in practice. For example, several libraries exist that allow the user to use RDF or SHACL (for Java alone you can choose between these two: TopBraid and Apache Jena), but they all vary slightly from the standards and have individual nuances.

However, despite the large amount of educational material available, the Semantic Web has not achieved mass adoption outside academia. Because of the large number of technologies a user needs to learn, coupled with their inherent complexity, users spend a long time educating themselves before they can get started. This high barrier to entry makes it hard for most developers to adopt the Semantic Web.

Instead, Grakn provides the user with just one technology that can replace many of the standards in the Semantic Web (in this article we cover RDF, RDFS, OWL, SPARQL and SHACL). This means, for example, that a user building an application doesn't need to educate themselves on what type of reasoner, which verification system or what query language to use. With Graql, all of this happens within a single technology that the user only needs to learn once.

Grakn works at a higher level and is easier to learn, reducing the barrier to entry, enabling millions of developers to have access to semantic technologies that previously were inaccessible. In Grakn, ease of use is a first principle.

In short, Grakn is a distributed logical database in the form of a knowledge graph that implements a concept-level schema. This knowledge representation system is then interpreted by an automated reasoning engine that performs automated deductive reasoning during query runtime. The querying, schema and reasoning all happen through Grakn's query language - Graql.

The formal foundations of Grakn's concept level schema are provided by the hypergraph, which plays the same role as the relational model for relational databases, directed graphs in the Semantic Web and property graphs for graph databases. Hypergraphs generalise the common notion of what an edge is. In RDF and property graphs, an edge is just a pair of vertices. In a hypergraph, by contrast, a hyperedge is a set of vertices, which can be further structured. The benefits over a directed graph model include:

  1. The natural mechanism of grouping relevant pieces of information in a relational style, which is to a large extent lost in directed graphs.
  2. Uniform handling of all n-ary relationships, as opposed to directed graphs where relations with more than two role players require a radical change in the modelling approach (n-ary).
  3. A natural way of expressing higher-order information (relations between relations, nesting of information), which in directed graphs requires dedicated modelling techniques (i.e. reification).

RDF

RDF Triples

RDF is a system to model data and distribute it over the web. Its data model is a labelled, directed multigraph with vertices and labelled edges. These consist of URIs (things), literals (data values) and blank nodes (dummy nodes).

RDF stores triples in subject-predicate-object form, atomic statements declared one by one. Below is an example of a person "Peter Parker", who knows another person "Aunt May" in XML notation.

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/"
             xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
      <foaf:Person>
        <foaf:name>Peter Parker</foaf:name>
        <foaf:knows>
          <foaf:Person>
            <foaf:name>Aunt May</foaf:name>
          </foaf:Person>
        </foaf:knows>
      </foaf:Person>
    </rdf:RDF>

As we can see, RDF gives up in compactness what it gains in flexibility: it can be expressive while being extremely granular. Every single relationship between data points is explicitly declared, and a relationship either exists or it doesn't. This makes it easier to merge data from different sources than with traditional relational databases. The above triples make up a graph of two Person nodes, each with a name, connected by a knows edge.
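
To make the granularity concrete, the same statements can be sketched in Python as a set of subject-predicate-object tuples. This is an illustration rather than an RDF library: `ex:peter` and `ex:may` are hypothetical identifiers (the XML example leaves both people unnamed), and the split into two sources is assumed purely to show merging.

```python
# Each statement is one (subject, predicate, object) tuple.
# "ex:peter" and "ex:may" are made-up identifiers for this sketch.
source_a = {
    ("ex:peter", "rdf:type", "foaf:Person"),
    ("ex:peter", "foaf:name", "Peter Parker"),
    ("ex:peter", "foaf:knows", "ex:may"),
}
source_b = {
    ("ex:may", "rdf:type", "foaf:Person"),
    ("ex:may", "foaf:name", "Aunt May"),
    ("ex:peter", "foaf:knows", "ex:may"),  # also stated in source_a
}

# Merging two sources is a set union; the shared statement is kept once.
merged = source_a | source_b


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(len(merged))                                # 5
print(objects(merged, "ex:peter", "foaf:knows"))  # {'ex:may'}
```

Because every statement stands alone, no table layout has to be reconciled before two data sets can be combined.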

Grakn doesn't work with triples. Instead, it exposes a concept level entity-relationship model. So, instead of modelling in subject-predicate-object form, Grakn represents our data at a higher level, with entities, relations, roles and attributes. For the example above, we would say there are two person entities, which have name attributes and are related through a knows relation.

    $p isa person, has name "Peter Parker";
    $p2 isa person, has name "Aunt May";
    ($p, $p2) isa knows;

Grakn model with two entities ("Peter Parker" and "Aunt May") and one relation ("knows").

Hyper-Relations

As mentioned, Grakn's data model is based on hypergraphs. While in RDF an edge is just a pair of vertices, a hyperedge is a set of vertices. The formal foundation of Grakn's data model rests on three premises:

  1. A hypergraph consists of a non-empty set of vertices and a set of hyperedges
  2. A hyperedge is a finite set of vertices (distinguishable by specific roles they play in that hyperedge)
  3. A hyperedge is also a vertex itself and can be connected by other hyperedges

Note: although Grakn leverages hyperedges, Grakn doesn't actually expose edges or hyperedges. Instead, it works with relations, or hyper-relations. Below is a figure that depicts how this works. The example shows two hyper-relations:

  • marriage, describing a binary marriage relation between Bob and Alice playing the roles of husband and wife, respectively
  • divorce-filing describing a ternary divorce-filing relation involving three role-players in the roles of certified marriage, petitioner and respondent

A hyper-relation can simply be seen as a collection of role and role-player pairs of arbitrary cardinality. As hyper-relations cannot be represented natively in a labelled directed graph, the above example in RDF can end up looking like this:

As is shown, each hyper-relation in Grakn can be mapped to the corresponding directed graph in the RDF model. For instance, in this model, entity and relation types are also explicitly encoded as RDF resources in the RDF style. As such, hyper-relations can be implemented over an RDF triple store. Therefore, in terms of modelling, hyper-relations offer a very natural and straightforward data representation formalism, enabling modeling at a conceptual level using entity-relationship diagrams.

The difference, however, is that in Grakn hyper-relations become first-class modelling constructs. This is important because in a real-life scenario, when the complete conceptual model is not fully foreseen at the outset, the actual modelling outcome may create a lot of unnecessary complexity. Furthermore, modelling hyper-relations natively, as compared to binary directed edges, leads to improvements to query planning and query optimisation, as the data grouped together in the same structure "containers" is also often retrieved in similar groupings by users and applications. And by acknowledging the structure of these in advance of querying, the retrieval process can be more optimally planned and executed.
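
The idea of a hyper-relation as a set of role and role-player pairs, and its mapping onto binary edges, can be sketched in plain Python. This is not how Grakn is implemented: the `Relation` class, `relate` and `to_triples` are hypothetical names, the `_:r1` style node labels are invented, and which of Alice and Bob is the petitioner is an assumption, since the text does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A hyper-relation: a typed set of (role, role-player) pairs."""
    rel_type: str
    role_players: frozenset


def relate(rel_type, roles):
    return Relation(rel_type, frozenset(roles.items()))


marriage = relate("marriage", {"husband": "Bob", "wife": "Alice"})

# A relation can itself play a role in another relation.
divorce_filing = relate("divorce-filing", {
    "certified-marriage": marriage,
    "petitioner": "Alice",   # assumption: not specified in the text
    "respondent": "Bob",     # assumption: not specified in the text
})


def to_triples(rel, triples=None, ids=None):
    """Flatten a hyper-relation into binary triples by giving each
    relation its own node, as a directed graph model has to do."""
    triples = [] if triples is None else triples
    ids = {} if ids is None else ids
    if rel in ids:
        return ids[rel], triples
    node = f"_:r{len(ids) + 1}"
    ids[rel] = node
    triples.append((node, "isa", rel.rel_type))
    for role, player in sorted(rel.role_players, key=lambda rp: rp[0]):
        if isinstance(player, Relation):
            player, _ = to_triples(player, triples, ids)
        triples.append((node, role, player))
    return node, triples


node, triples = to_triples(divorce_filing)
for triple in triples:
    print(triple)
```

Two relation objects turn into seven triples spread over two intermediate nodes, which gives a feel for the extra modelling work that a directed graph requires.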

Namespaces

Due to the public nature of the Web, RDF uses namespaces to interpret and identify different ontologies, which are usually given at the beginning of a document to make it more readable. Every resource in RDF has a unique identifier. The rdf:RDF tag identifies the document as an RDF document:


    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

As Grakn doesn’t operate on the web, there isn’t a need for URIs. A related concept is that of keyspaces, which are logically separate databases within a Grakn instance. Unlike an RDF namespace these cannot talk to each other.

Serialization

There are many ways to express RDF in textual form. One common way is to represent triples in XML format (as recommended by W3C). Written out with full URIs, one triple per statement, the data looks like this:


    <http://example.org/#spiderman>
      <http://www.perceive.net/schemas/relationship/enemyOf>
        <http://example.org/#green-goblin> .
    <http://example.org/#green-goblin>
      <http://www.perceive.net/schemas/relationship/enemyOf>
        <http://example.org/#spiderman> .

However, as XML can become difficult to read, Turtle can also be used as a more compact serialisation (other popular serialisation formats include JSON-LD and N-Triples). With Turtle, we use qnames instead of full URIs. The example below represents two Persons from the foaf namespace: "Green Goblin" and "Spiderman". They are connected through a relationship called enemyOf from the rel namespace.

    @base <http://example.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rel: <http://www.perceive.net/schemas/relationship/> .
    <#green-goblin>
        rel:enemyOf <#spiderman> ;
        a foaf:Person ;
        foaf:name "Green Goblin" .
    <#spiderman>
        rel:enemyOf <#green-goblin> ;
        a foaf:Person ;
        foaf:name "Spiderman" .
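
How the prefixed names in the Turtle example relate to the full URIs of the previous listing can be sketched in a few lines of Python. This is a simplified illustration and not a Turtle parser: the `expand` function is a hypothetical helper that only handles the two term shapes used above, and it relies on the standard library's `urljoin` to resolve relative references against the base.

```python
from urllib.parse import urljoin

# Prefix table and base URI copied from the Turtle example.
BASE = "http://example.org/"
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://www.perceive.net/schemas/relationship/",
}


def expand(term):
    """Expand a prefixed name or a relative <...> reference to a full URI."""
    if term.startswith("<") and term.endswith(">"):
        # Relative reference such as <#spiderman>: resolve against the base.
        return urljoin(BASE, term[1:-1])
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local


print(expand("<#spiderman>"))  # http://example.org/#spiderman
print(expand("rel:enemyOf"))   # http://www.perceive.net/schemas/relationship/enemyOf
print(expand("foaf:Person"))   # http://xmlns.com/foaf/0.1/Person
```

The expanded values match the URIs spelled out in full in the earlier triple listing, which is why the two serialisations describe the same graph.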



In Grakn, we avoid the need to choose between multiple serialisations and use Graql. The example can be represented like this:

    $g isa person, has name "Green Goblin";
    $s isa person, has name "Spiderman";
    (enemy: $g, enemy: $s) isa enemyship;

Here, we have two person entities, with attribute name "Green Goblin" and "Spiderman". They are connected through a relation of type enemyship, where both play the role of enemy.

Higher Order Relationships

Given the subject/predicate/object form of a triple, modelling in RDF can become limited when representing higher order relationships. For example, let's take this triple:

    lit:HarryPotter bio:author lit:JKRowling .

This states that JK Rowling wrote Harry Potter. However, we may want to qualify this statement by saying that JK Rowling wrote Harry Potter in 2000. In RDF, to do this we would go through a process called reification, creating one triple per statement, where the subject of the triple would be the same node:

    bio:n1 bio:author lit:JKRowling .
    bio:n1 bio:title "Harry Potter" .
    bio:n1 bio:publicationDate 2000 .

In Grakn, given its concept level schema, the need for reification doesn't exist and we can represent higher order relationships natively. JK Rowling wrote Harry Potter would be expressed like this:

    $a isa person, has name "JK Rowling";
    $b isa book, has name "Harry Potter";
    (author: $a, publication: $b) isa authorship;

Then, if we want to qualify this and say JK Rowling wrote Harry Potter in 2000, we would simply add an attribute to the relation:

    $a isa person, has name "JK Rowling";
    $b isa book, has name "Harry Potter";
    (author: $a, publication: $b) isa authorship, has date 2000;
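
The contrast between the reified triples and a relation that carries its own attribute can be sketched in Python. The data below is copied from the two examples; the dictionary layout of `authorship` and the helpers `describe` and `flatten` are assumptions made for this sketch, not part of RDF or Graql.

```python
# Reified RDF form: the node bio:n1 stands for the authorship statement,
# and every detail hangs off it as a separate triple.
reified = [
    ("bio:n1", "bio:author", "lit:JKRowling"),
    ("bio:n1", "bio:title", "Harry Potter"),
    ("bio:n1", "bio:publicationDate", 2000),
]

# Relation form: one record holds the role players and the attribute.
# This dictionary layout is an assumption made for illustration only.
authorship = {
    "type": "authorship",
    "roles": {"author": "JK Rowling", "publication": "Harry Potter"},
    "attributes": {"date": 2000},
}


def describe(node, triples):
    """Collect everything stated about one node into a dictionary."""
    return {p: o for s, p, o in triples if s == node}


def flatten(relation, node):
    """Turn a relation record into triples that share one subject node."""
    triples = [(node, "isa", relation["type"])]
    triples += [(node, role, player) for role, player in relation["roles"].items()]
    triples += [(node, attr, value) for attr, value in relation["attributes"].items()]
    return triples


print(describe("bio:n1", reified))
print(flatten(authorship, "_:n1"))
```

In the relation form the intermediate node is implicit in the relation itself, so it only needs to be spelled out when the data is flattened to triples.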


Blank Nodes

Sometimes in RDF we don't want to give a URI or a literal. In such cases we are dealing with blank nodes, which are anonymous resources without a Web identity. An example is the statement that Harry Potter was inspired by a man who lives in England:

    lit:HarryPotter bio:name "Harry Potter" .
    lit:HarryPotter lit:hasInspiration [a :Man;
                bio:livesIn geo:England] .


As Grakn doesn't live on the web, the idea of a blank node doesn't directly translate to Grakn. While in RDF we use a blank node to indicate the existence of a thing for which we don't have a URI, there are multiple ways this could be done in Grakn. If, as in the example above, we're using a blank node to indicate that we don't know anything else about that man, other than that he lives in England, we represent this as follows:

    $b isa book, has name "Harry Potter";
    $m isa man;
    ($b, $m) isa inspiration;
    $l isa location, has name "England";
    ($m, $l) isa lives-in;

What we can see is that the variable $m is assigned to the entity type man, for which no further information is given, other than that he is connected to the entity type location with name "England", through a lives-in relation type.
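
The role of an anonymous node can also be sketched in Python: the node has only a locally generated label, yet statements can still be attached to it and queried through it. The `blank_node` helper and the `_:b1` label format are assumptions made for this sketch; the predicate names are taken from the RDF example above.

```python
from itertools import count

_counter = count(1)


def blank_node():
    """Return a fresh local label; it carries no meaning outside this graph."""
    return f"_:b{next(_counter)}"


man = blank_node()
triples = [
    ("lit:HarryPotter", "lit:hasInspiration", man),
    (man, "rdf:type", ":Man"),
    (man, "bio:livesIn", "geo:England"),
]

# Follow the anonymous node without ever knowing who the man is.
inspired_by = [
    o for s, p, o in triples
    if s == "lit:HarryPotter" and p == "lit:hasInspiration"
]
where = [o for s, p, o in triples if s in inspired_by and p == "bio:livesIn"]
print(where)  # ['geo:England']
```

The variable `$m` in the Graql example plays a comparable part: it stands for an instance about which nothing is stated beyond its type and its relations.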

In Part 2, we look at how Grakn compares to SPARQL and RDFS.

Published at DZone with permission of Tomas Sabat, DZone MVB. See the original article here.
