Who's Calling? Model and Load a Schema Into a Knowledge Graph
In this tutorial, our aim is to write a schema, one that describes the reality of our dataset, and load it into our knowledge graph, phone_calls.
First off, let’s look at the dataset we are going to be working with. Simply put, we’re going to have people who call each other. Those who make calls have a contract with the company “Telecom.”
People, calls, contracts, and companies. That’s what we are dealing with. But what do we want to get out of this data?
The questions below will give us a better perspective on what else needs to be included in the dataset.
- Since September 14th, which customers called person X?
- Who are the people who have received a call from a London customer aged over 50 who has previously called someone aged under 20?
- Who are the common contacts of customers X and Y?
- Who are the customers who 1) have all called each other and 2) have all called person X at least once?
- How does the average call duration among customers aged under 20 compare with those aged over 40?
This is all we need for determining how our schema should be defined. Let’s break it down.
A company has a name and can be the provider of a contract to a person, who then becomes a customer.
A person has a first and last name, an age, a city they live in, and a phone number. A person who doesn’t have a registered contract (not a customer) has only a phone number.
A call, made from a person (caller) to another person (callee), has a duration as well as the date and time it was made.
Now that we have a good understanding of our dataset, we can go ahead and write the schema for it.
But first, let’s visualize the reality of our dataset.
By looking at this visualized schema, we can identify the Grakn concepts.
From the Grakn Academy: Everything that describes your domain in a Grakn knowledge graph is a concept. This includes the elements of the schema (namely types and roles, which we call schema concepts) and the actual data (which we simply call things; you can think of them as instances of types if you are the programmer kind of person).
call is of type relationship that has two role players:
- person who plays the role of a caller, and
- (another) person who plays the role of a callee.
contract is also of type relationship that has two role players:
- company who plays the role of a provider, and
- person who plays the role of a customer.
company and person are of type entity.
name, first-name, last-name, phone-number, city, age, is-customer, started-at and duration are of type attribute.
That’s all well and good, but how do we get our knowledge graph to reflect this model?
Time to Talk Graql
You can define the elements of a Grakn schema in any order you wish. I personally prefer to start from the relationships, as I see them as the source of interactions, which is where knowledge is derived from.
Any relationship relates to at least one role that is played by at least 2 concepts.
In our case, a call relates to caller played by a person and to callee played by another person.
Likewise for a contract. It relates to provider played by a company and to customer played by a person.
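The two relationships described above could be written roughly as follows. This is a sketch in the Grakn 1.x Graql syntax, using the same type and role names as the complete schema at the end of this article; the comments are added here for illustration.

```graql
define

  # A contract links a company (provider) to a person (customer).
  contract sub relationship,
    relates provider,
    relates customer;

  # A call links one person (caller) to another person (callee).
  call sub relationship,
    relates caller,
    relates callee;

  # Each entity type declares the roles its instances can play.
  company sub entity,
    plays provider;

  person sub entity,
    plays customer,
    plays caller,
    plays callee;
```

Note that person plays both caller and callee, which is what lets a single call connect two different people.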
To define the attributes, we use the has keyword.
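As an illustration, the has clauses from the complete schema look like this when shown on their own. This is a fragment rather than a loadable file: it assumes the roles of call and the attribute types themselves are defined elsewhere in the same schema.gql.

```graql
define

  # A call records when it started and how long it lasted.
  call sub relationship,
    has started-at,
    has duration;

  # A company is identified by its name.
  company sub entity,
    has name;

  # A person owns the attributes that describe a customer.
  person sub entity,
    has first-name,
    has last-name,
    has phone-number,
    has city,
    has age,
    has is-customer;
```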
Lastly, we need to define the type of each attribute.
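A sketch of those attribute definitions, taken from the complete schema at the end of this article (Grakn 1.x syntax, where each attribute type names its datatype):

```graql
define

  name sub attribute datatype string;
  started-at sub attribute datatype date;
  duration sub attribute datatype long;
  first-name sub attribute datatype string;
  last-name sub attribute datatype string;
  phone-number sub attribute datatype string;
  city sub attribute datatype string;
  age sub attribute datatype long;
  is-customer sub attribute datatype boolean;
```

Storing phone-number as a string rather than a long keeps leading zeros and the "+" prefix intact.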
Note that we don’t need to define any id attribute. Grakn takes care of that for us.
Save the schema as a schema.gql file. In a few minutes, we’ll have it loaded into a brand new Grakn keyspace.
Load and Test the Schema
So here it is, the schema for our phone_calls knowledge graph.

    define

      contract sub relationship,
        relates provider,
        relates customer;

      call sub relationship,
        relates caller,
        relates callee,
        has started-at,
        has duration;

      company sub entity,
        plays provider,
        has name;

      person sub entity,
        plays customer,
        plays caller,
        plays callee,
        has first-name,
        has last-name,
        has phone-number,
        has city,
        has age,
        has is-customer;

      name sub attribute datatype string;
      started-at sub attribute datatype date;
      duration sub attribute datatype long;
      first-name sub attribute datatype string;
      last-name sub attribute datatype string;
      phone-number sub attribute datatype string;
      city sub attribute datatype string;
      age sub attribute datatype long;
      is-customer sub attribute datatype boolean;
Published at DZone with permission of Soroush Saffari, DZone MVB. See the original article here.