Comparing Grakn to Semantic Web Technologies — Part 2/3
In part 2, take a look at SPARQL and RDFS and explore inserting and querying data with SPARQL, look at the RDF schema, and more.
This is part two of Comparing Semantic Web Technologies to Grakn. In the first part, we looked at how RDF compares to Grakn. In this part, we look specifically at SPARQL and RDFS.
What Is SPARQL?
SPARQL is a W3C-standardised language for querying information from databases that can be mapped to RDF. Like SQL, SPARQL allows you to insert and query data. Unlike SQL queries, however, SPARQL queries aren't constrained to a single database and can be federated across multiple HTTP endpoints.
Graql, Grakn's query language, is the equivalent of SPARQL. Like SPARQL, Graql allows you to insert and query data. However, since Graql is not built as an open Web language, it doesn't natively support querying across multiple endpoints (this can be done with one of Grakn's client drivers). In this respect, Graql is closer to SQL and the query languages of other traditional database management systems.
Inserting Data With SPARQL
The following snippet shows how two RDF triples are inserted into the default graph store with SPARQL:
In Graql, we begin with the insert statement to declare that data is to be inserted. The variable $b is assigned to the entity type book, which has a title with value "A new book" and a
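To make the triple model concrete, here is a minimal sketch in plain Python (not SPARQL or Graql) that treats a graph store as a set of (subject, predicate, object) triples. The ex: identifiers and the second triple are hypothetical placeholders chosen for illustration, not the article's snippet. The sketch also shows that an RDF graph has set semantics, so inserting the same triple twice changes nothing.

```python
# Minimal sketch: a graph store modelled as a set of (subject, predicate, object) triples.
# The "ex:" names and the creator triple are hypothetical placeholders.
Triple = tuple[str, str, str]


def insert_data(store: set[Triple], triples: list[Triple]) -> None:
    """Add ground triples (no variables) to the store, in the spirit of SPARQL INSERT DATA."""
    # An RDF graph is a set, so re-inserting an existing triple is a no-op.
    store.update(triples)


def describe(store: set[Triple], subject: str) -> dict[str, list[str]]:
    """Group one subject's triples into predicate -> values (an 'entity with attributes' view)."""
    view: dict[str, list[str]] = {}
    for s, p, o in sorted(store):
        if s == subject:
            view.setdefault(p, []).append(o)
    return view


store: set[Triple] = set()
new_book = [
    ("ex:book1", "ex:title", "A new book"),
    ("ex:book1", "ex:creator", "ex:someAuthor"),  # hypothetical second triple
]
insert_data(store, new_book)
insert_data(store, new_book)  # inserting the same triples again changes nothing
print(len(store))  # 2
print(describe(store, "ex:book1"))
```

The describe helper roughly mirrors how Graql presents $b as one book entity owning attributes, whereas RDF stores each fact as a separate triple.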
Querying With SPARQL
In SPARQL, we first declare the namespaces we want to use, binding each one to a short PREFIX. The actual query starts with SELECT, followed by the variables we want returned. Then, in the WHERE clause, we state the graph pattern, and SPARQL finds the data that matches it. In this query, we look for all the persons that "Adam Smith" knows using the namespaces
In Graql, we begin with the match statement to declare that we want to retrieve data. We match for an entity of type person that has a family-name "Smith" and a given-name "Adam". Then, we connect it through a knows relation type to $p2. As we want to know who "Adam Smith" knows, we want $p2 to be returned, which is declared in the
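Both queries boil down to matching a graph pattern against stored data. The following Python sketch assumes a toy in-memory store with hypothetical ex: identifiers. It evaluates a conjunction of triple patterns (terms starting with "?" are variables) by joining variable bindings left to right, then projects one variable, similar in spirit to SELECT ?p2 WHERE { ... }. It is an illustration of pattern matching only, not how a real SPARQL engine or Grakn plans queries.

```python
# Minimal sketch of basic graph pattern matching over a set of triples.
# All identifiers and data are hypothetical.
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written ?name, as in SPARQL."""
    return term.startswith("?")


def extend(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return the binding extended so that pattern matches triple, or None on a conflict."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:  # variable already bound to a different value
                return None
        elif p != t:  # constant term does not match
            return None
    return new


def select(store: set[Triple], where: list[Triple], var: str) -> list[str]:
    """Join the triple patterns left to right and project a single variable."""
    bindings: list[Binding] = [{}]
    for pattern in where:
        next_bindings: list[Binding] = []
        for b in bindings:
            for triple in store:
                b2 = extend(pattern, triple, b)
                if b2 is not None:
                    next_bindings.append(b2)
        bindings = next_bindings
    return sorted({b[var] for b in bindings})


store = {
    ("ex:adam", "ex:familyName", "Smith"),
    ("ex:adam", "ex:givenName", "Adam"),
    ("ex:adam", "ex:knows", "ex:bob"),
    ("ex:adam", "ex:knows", "ex:carol"),
    ("ex:dan", "ex:knows", "ex:erin"),
}
# Who does "Adam Smith" know?
where = [
    ("?p1", "ex:familyName", "Smith"),
    ("?p1", "ex:givenName", "Adam"),
    ("?p1", "ex:knows", "?p2"),
]
print(select(store, where, "?p2"))  # ['ex:bob', 'ex:carol']
```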
Let's look at a different query: give me the directors and movies that James Dean played in, where a woman also played a role, and where that woman also played in a movie directed by John Ford. Below is the SPARQL code and the visual representation of this traversal-type query.
In Grakn, we can ask the same question like this:
Here, we assign the entity type man with attribute name and value "James Dean" to the variable $p. We then say that $w is of entity type woman. These two are connected with movie in a three-way relation called casting. The woman also plays a role in another casting relation, where the movie entity is connected to "John Ford", who relates to this movie through a
In the example above, the hyper-relation casting in Grakn represents the two playedIn properties in SPARQL. However, in SPARQL we can only have two binary edges, one connecting the woman to the movie and one connecting "James Dean" to the movie; no single edge ties all three together. This shows how fundamentally modelling in Grakn differs from RDF, given Grakn's ability to model hypergraphs: Grakn can natively represent any number of role players in a single relation without having to reify the model.
Schematically, this is how the query above is represented visually (note the ternary relation casting).
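To illustrate what "without having to reify" means, this Python sketch models a casting as a single n-ary relation (a map from role to player) and then shows an RDF-style encoding, where each relation becomes a fresh intermediate (blank) node plus one triple per role player. The role names actor, actress and movie, the node naming scheme and the player ids are hypothetical.

```python
# Minimal sketch: n-ary relations vs their reified triple encoding.
# Role names and player ids are hypothetical.
from itertools import count

Relation = tuple[str, dict[str, str]]  # (relation type, role -> player)

castings: list[Relation] = [
    ("casting", {"actor": "james-dean", "actress": "woman-1", "movie": "giant"}),
    ("casting", {"actress": "woman-1", "movie": "movie-2"}),
]


def reify(relations: list[Relation]) -> list[tuple[str, str, str]]:
    """Encode each n-ary relation as triples around a fresh intermediate node."""
    fresh = count(1)
    triples: list[tuple[str, str, str]] = []
    for rel_type, players in relations:
        node = f"_:rel{next(fresh)}"
        triples.append((node, "rdf:type", rel_type))
        for role, player in players.items():
            triples.append((node, role, player))
    return triples


def cast_together(relations: list[Relation], actor: str) -> list[tuple[str, str]]:
    """(movie, actress) pairs where the actor and the actress share ONE casting relation."""
    return [
        (players["movie"], players["actress"])
        for rel_type, players in relations
        if rel_type == "casting" and players.get("actor") == actor and "actress" in players
    ]


print(len(reify(castings)))                   # 7
print(cast_together(castings, "james-dean"))  # [('giant', 'woman-1')]
```

A casting with three role players costs four triples once reified, and any query about who appeared together has to go through the intermediate node, whereas the n-ary form keeps the whole fact in one place.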
In SPARQL, we can also specify in our query that certain data isn't there, using the keyword NOT EXISTS. The enclosing pattern then only matches if the subgraph pattern inside NOT EXISTS has no match. In the example below, we look for actors who played in the movie Giant but have not yet passed away:
Under the closed world assumption, Grakn supports negation. This is done using the keyword not, followed by the pattern to be negated. The example above is represented like this:
Here, we're looking for an entity of type movie with name "Giant", which is connected to an actor through a relation of type played-in. In the not sub-query, we specify that $a must not have an attribute of type death-date with any value. We then get the actor.
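Both negated queries describe the same filter: keep a match only if the negated sub-pattern finds nothing in the stored data. Under a closed world reading, a missing fact is treated as false rather than unknown. A minimal Python sketch of that filter, assuming hypothetical actor ids and dates:

```python
# Minimal sketch of negation as "no match in the stored data" (closed world).
# Actor ids, the second movie and the date are hypothetical.
PLAYED_IN = {
    ("actor-1", "Giant"),
    ("actor-2", "Giant"),
    ("actor-3", "Some Other Movie"),
}
DEATH_DATE = {"actor-2": "1990-01-01"}  # death-date attribute, keyed by actor


def has_death_date(actor: str) -> bool:
    # Closed world: no stored death-date means "has no death-date", not "unknown".
    return actor in DEATH_DATE


def actors_without_death_date(movie: str) -> list[str]:
    """Match the positive pattern first, then drop matches for which the negated pattern holds."""
    candidates = {actor for actor, m in PLAYED_IN if m == movie}
    return sorted(a for a in candidates if not has_death_date(a))


print(actors_without_death_date("Giant"))  # ['actor-1']
```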
As RDF is just a data exchange model, on its own it is "schemaless". That's why RDF Schema (RDFS) was introduced: it extends RDF with basic ontological semantics, which allow, for example, simple type hierarchies over RDF data. In Grakn, Graql itself is used as the schema language.
RDFS extends the RDF vocabulary and allows us to describe taxonomies of classes and properties. An RDFS class declares an RDFS resource as a class for other resources, and the class of all such classes is rdfs:Class. Using the RDF/XML syntax, creating a class animal with a sub-class horse would look like this:
To do the same in Grakn, we would write this:
RDFS also allows for sub-typing of
Which in Grakn would look like this:
As the examples show, RDFS mainly describes constructs for types of objects (Classes) that inherit from one another (subClasses), and for properties that describe objects (Properties) that can likewise inherit from one another (subProperty). This sub-typing behaviour can be obtained with Graql's sub keyword, which can be used to create type hierarchies of entities, relations, and attributes in Grakn.
However, despite their apparent similarities, a one-to-one mapping from an RDFS class to a Grakn entity, or from an RDFS property to a Grakn relation, should not always be made. This is because RDF is built on a lower-level data model that works in triples, while Grakn lets us model at a higher level.
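The RDFS sub-typing discussed above can be sketched as a transitive walk over rdfs:subClassOf edges: an instance typed with a subclass is also an instance of every superclass (RDFS entailment rule rdfs9, with rdfs11 providing transitivity), and rdfs:subPropertyOf propagates the same way. In this Python sketch, animal and horse come from the text while pony is a hypothetical extra level. As far as we understand Graql, querying for instances of a type with isa similarly returns instances of its subtypes.

```python
# Minimal sketch of RDFS subclass entailment. "animal" and "horse" come from the
# text; "pony" is a hypothetical extra level added to show transitivity.
SUB_CLASS_OF: dict[str, set[str]] = {
    "horse": {"animal"},
    "pony": {"horse"},
}


def superclasses(cls: str) -> set[str]:
    """All direct and indirect superclasses of cls (depth-first, safe against cycles)."""
    seen: set[str] = set()
    stack = [cls]
    while stack:
        for parent in SUB_CLASS_OF.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def all_types(asserted: set[str]) -> set[str]:
    """Asserted types plus every type they inherit from."""
    result = set(asserted)
    for t in asserted:
        result |= superclasses(t)
    return result


print(sorted(all_types({"pony"})))  # ['animal', 'horse', 'pony']
```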
One important modelling difference between Grakn and the Semantic Web concerns multiple inheritance. In RDFS, a class can have as many superclasses as are named or logically inferred. Let's take this example:
This models an employer as being of both class company and class government. Although this may look correct, multiple inheritance as a modelling concept is often not used in the right way. Multiple inheritance should group things, not subclass "types" where each type is a definition of something else. In other words, we don't want to use it to represent instances of data. This is a common mistake.
Instead of multiple inheritance, Grakn supports single type inheritance, where we should assign roles instead of multiple classes. A role defines the behaviour and aspect of a thing in the context of a relation, and we can assign multiple roles to a thing (note that roles are inherited when one type subclasses another).
For example, a government can employ a person, and a company can employ a person. One might then suggest creating a class that inherits from both government and company, since each can employ a person, and end up with an employer class that subclasses both (as shown in the example above).
However, this is an abuse of inheritance. In this case, we should create a role employer, which relates to an employment relation and contextualises how a government is involved in that relation (by playing the role of employer).
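The role-based alternative can be sketched in Python as a tiny schema check: each type has at most one supertype (single inheritance) and declares the roles it can play, roles are inherited from supertypes, and a relation is valid only if each player's type can play its role. The names government, company, person, employer and employment come from the text; the ministry subtype and the employee role are hypothetical. This illustrates the idea only and is not Grakn's schema engine.

```python
# Minimal sketch: single inheritance plus roles, instead of multiple inheritance.
# "ministry" and the "employee" role are hypothetical.
from typing import Optional

# type -> (supertype or None, roles the type itself declares it can play)
SCHEMA: dict[str, tuple[Optional[str], set[str]]] = {
    "government": (None, {"employer"}),
    "company": (None, {"employer"}),
    "person": (None, {"employee"}),
    "ministry": ("government", set()),  # inherits the employer role from government
}
# relation type -> roles it relates
RELATIONS: dict[str, set[str]] = {"employment": {"employer", "employee"}}


def plays(type_name: str, role: str) -> bool:
    """True if the type, or any of its supertypes, can play the role."""
    current: Optional[str] = type_name
    while current is not None:
        parent, roles = SCHEMA[current]
        if role in roles:
            return True
        current = parent
    return False


def valid_relation(rel_type: str, players: dict[str, str]) -> bool:
    """players maps each role to the type of the thing playing it."""
    allowed = RELATIONS[rel_type]
    return all(role in allowed and plays(t, role) for role, t in players.items())


print(valid_relation("employment", {"employer": "company", "employee": "person"}))   # True
print(valid_relation("employment", {"employer": "ministry", "employee": "person"}))  # True
print(valid_relation("employment", {"employer": "person", "employee": "person"}))    # False
```

No employer class subclassing both government and company is needed: the employer behaviour lives in the role, and any type that plays it can take part in an employment relation.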
rdfs:domain and rdfs:range
Two commonly used instances of rdf:Property are rdfs:domain and rdfs:range. They are used to state, respectively, that the subjects or the values of a property are instances of one or more classes. Below is an example of rdfs:domain:
rdfs:domain assigns the class Person to the subject of the
This is an example of rdfs:range:
rdfs:range assigns the class Male to the object of the
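Under RDFS semantics, rdfs:domain and rdfs:range act as inference rules rather than constraints: a triple that uses the property adds rdf:type facts instead of being rejected (entailment rules rdfs2 and rdfs3 in the RDF 1.1 Semantics specification). A minimal Python sketch, in which ex:hasBrother is a hypothetical property, since the article's property names are not shown:

```python
# Minimal sketch of RDFS domain/range entailment (rules rdfs2 and rdfs3).
# "ex:hasBrother" is a hypothetical property; Person and Male are the classes named in the text.
Triple = tuple[str, str, str]

DOMAIN: dict[str, set[str]] = {"ex:hasBrother": {"ex:Person"}}
RANGE: dict[str, set[str]] = {"ex:hasBrother": {"ex:Male"}}


def domain_range_entailments(triples: set[Triple]) -> set[Triple]:
    inferred: set[Triple] = set()
    for s, p, o in triples:
        for cls in DOMAIN.get(p, set()):  # rdfs2: the subject is an instance of the domain class
            inferred.add((s, "rdf:type", cls))
        for cls in RANGE.get(p, set()):  # rdfs3: the object is an instance of the range class
            inferred.add((o, "rdf:type", cls))
    return inferred


print(sorted(domain_range_entailments({("ex:alice", "ex:hasBrother", "ex:bob")})))
```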
In Grakn, there is no direct implementation of rdfs:domain and rdfs:range. The basic inferences drawn from them are either already represented natively in the Grakn data model through the use of roles, or can be expressed with rules that capture the logic we want to infer.
Bear in mind, however, that rules in Grakn are more expressive, allowing us to represent exactly the kind of inferences we want to make. In short, translating rdfs:domain and rdfs:range to Grakn should be done on a case-by-case basis.
In the example above, rdfs:domain can be translated to Grakn by saying that when an entity has an attribute of type published-date, it plays the role of published-book in a publishing relation. This is represented in a Grakn rule:
The rdfs:range example can be expressed with the following Grakn rule, which adds the attribute type gender with value "male" only if a person plays the role of brother in a siblingship relation, regardless of the number N of other siblings.
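The effect of the two rules described above can be sketched in Python as follows. This is a simplification that applies the rules eagerly over hypothetical data; it does not reproduce how Grakn evaluates rules, and the sibling role and entity ids are assumptions.

```python
# Minimal sketch of the two rules' effect, applied eagerly over hypothetical data.
# Not a model of Grakn's reasoner; the "sibling" role and all ids are assumptions.
Relation = tuple[str, dict[str, list[str]]]  # (relation type, role -> players)

entities: dict[str, dict[str, str]] = {
    "book-1": {"published-date": "2019-05-01"},
    "book-2": {},
    "person-1": {},
    "person-2": {},
}
relations: list[Relation] = [
    ("siblingship", {"brother": ["person-1"], "sibling": ["person-2"]}),
]


def apply_rules(entities, relations):
    new_relations: list[Relation] = []
    new_attributes: dict[str, dict[str, str]] = {}
    # Rule 1: when x has a published-date, then x plays published-book in a publishing relation.
    for eid, attrs in entities.items():
        if "published-date" in attrs:
            new_relations.append(("publishing", {"published-book": [eid]}))
    # Rule 2: when p plays brother in any siblingship (however many other siblings
    # there are), then p has gender "male".
    for rel_type, players in relations:
        if rel_type == "siblingship":
            for p in players.get("brother", []):
                new_attributes.setdefault(p, {})["gender"] = "male"
    return new_relations, new_attributes


inferred_relations, inferred_attributes = apply_rules(entities, relations)
print(inferred_relations)   # [('publishing', {'published-book': ['book-1']})]
print(inferred_attributes)  # {'person-1': {'gender': 'male'}}
```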
Let's also look at another example. In a maritime setting, if we have a vessel of class DepartingVessel, which has the property nextDeparture specified, we could state:
With the following rdfs:domain, any vessel for which nextDeparture is specified will be inferred to be a member of the DepartingVessel class. In this example, this means QEII is assigned the DepartingVessel class.
To do the same in Grakn, we can write a rule that finds all entities with an attribute next-departure and assigns them to a departure relation, playing the role of departing-vessel:
Then, if this data is ingested:
Grakn infers that the vessel QEII plays the role of departing-vessel in a departure relation, which in this case is the equivalent of the DepartingVessel class.
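The two readings of the maritime example can be compared side by side. In this Python sketch, QEII and the next-departure attribute come from the text, while the date and vessel-2 are hypothetical. The RDFS reading produces a class membership, the rule reading produces a role in a relation, and both single out the same vessels.

```python
# Minimal sketch: the same input read the RDFS way and the rule way.
# The date and "vessel-2" are hypothetical.
vessels: dict[str, dict[str, str]] = {
    "QEII": {"next-departure": "2020-06-01"},
    "vessel-2": {},
}


def rdfs_reading(data):
    """rdfs:domain style: anything with a next departure is typed DepartingVessel."""
    return {(v, "rdf:type", "DepartingVessel") for v, attrs in data.items() if "next-departure" in attrs}


def rule_reading(data):
    """Rule style: anything with next-departure plays departing-vessel in a departure relation."""
    return [("departure", {"departing-vessel": v}) for v, attrs in data.items() if "next-departure" in attrs]


# Both readings single out the same vessels.
departing_by_type = {s for s, _, _ in rdfs_reading(vessels)}
departing_by_role = {r["departing-vessel"] for _, r in rule_reading(vessels)}
print(departing_by_type == departing_by_role == {"QEII"})  # True
```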
rdfs:domain and rdfs:range are useful in the context of the web, where federated data is often incomplete. As Grakn doesn't live on the web, the need for these concepts is reduced. Further, most of this inferred data is already natively represented in Grakn's conceptual model, largely thanks to its higher-level model and its use of rules. Therefore, directly mapping rdfs:domain and rdfs:range to a concept in Grakn is usually naive and leads to redundancies. Instead, translating these concepts into Grakn should be done on a case-by-case basis using rules and roles.
Published at DZone with permission of Tomas Sabat, DZone MVB. See the original article here.