SHACL: It's About Time

Incorporating events into your resources as abstract base classes helps you handle management of those resources and better model the space that you’re dealing with.

Recently, I wrote a couple of articles on time series, and this got me thinking about the challenges of modeling time and the role that time plays in modeling in general. Traditionally, time has been handled with timestamps that are largely system dependent. However, once you start dealing with data across different systems, you quickly discover that this approach is inadequate. Timestamps are very seldom synchronized, temporal formats vary widely, and processes usually have both a beginning and an ending time, which should be treated as a single “thing.”

UnSHACLing Events

Modeling things semantically, this “thing” is typically best described as an event. An event is, properly speaking, an interval: the period from the start of an activity to its natural conclusion, coupled with a starting time. Beginning modelers will frequently treat these as separate properties, usually a startDate (or birthDate) and an endDate on an object or person:

person:FelixMortenson
    a class:Person;
    event:startDate "1953-03-16"^^xsd:date;
    event:endDate "2017-01-02"^^xsd:date;
    event:type event:Confirmed
.

Note that all examples here are given in Turtle format.

An event can be modeled using SHACL:

@prefix class: <http://semanticalllc.com/ns/class#>.
@prefix event: <http://semanticalllc.com/ns/event#>.
@prefix person: <http://semanticalllc.com/ns/person#>.
@prefix org: <http://semanticalllc.com/ns/org#>.
@prefix job: <http://semanticalllc.com/ns/job#>.
@prefix shape: <http://semanticalllc.com/ns/shape#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
class:Event a owl:Class.
event:startDate 
    a owl:DatatypeProperty.
event:endDate 
    a owl:DatatypeProperty.
event:type
    a owl:ObjectProperty.
event:during
    a owl:ObjectProperty.
shape:EventShape
    a sh:NodeShape;
    sh:targetClass class:Event;
    sh:name "Event";
    sh:description "This class identifies both named and anonymous events.";
    sh:property [
        a sh:PropertyShape;
        sh:path event:startDate;
        sh:name "Start Date";
        # Either datatype is accepted; two sh:datatype values would both have to hold.
        sh:or ( [ sh:datatype xsd:date ] [ sh:datatype xsd:dateTime ] );
        sh:minCount "0"^^xsd:integer;
        sh:maxCount "1"^^xsd:integer;
        sh:order 0;
        sh:group shape:EventGroup;
        sh:description """This indicates the date that an event begins, 
            and is treated as inclusive.""";
    ];
    sh:property [
        a sh:PropertyShape;
        sh:path event:endDate;
        sh:name "End Date";
        sh:or ( [ sh:datatype xsd:date ] [ sh:datatype xsd:dateTime ] );
        sh:minCount "0"^^xsd:integer;
        sh:maxCount "1"^^xsd:integer;
        sh:order 1;
        sh:group shape:EventGroup;
        sh:defaultValue "9999-12-31"^^xsd:date;
        sh:description """This indicates the date that an event terminates, 
           and is treated as inclusive. This can be the same date as for 
           startDate. It is optional.""";
    ];
    sh:property [
        a sh:PropertyShape;
        sh:path event:type;
        sh:name "Event Type";
        sh:class class:EventType;
        sh:minCount "0"^^xsd:integer;
        sh:order 2;
        sh:group shape:EventGroup;
        sh:description """This gives zero or more classifications 
           of the type according to the eventType.""";
    ];
    sh:property [
        a sh:PropertyShape;
        sh:path event:during;
        sh:name "During";
        sh:class class:Event;
        sh:minCount "0"^^xsd:integer;
        sh:order 3;
        sh:group shape:EventGroup;
        sh:description """This indicates that an event occurs within the 
           context of another event (or its derivative).""";
    ];
    .

shape:EventGroup
    a sh:PropertyGroup;
    sh:order 4;
    rdfs:label "Event";
    .

As an aside, I am learning to love SHACL. I had evolved something similar for my own projects, primarily because I’ve usually found that OWL is simply too heavyweight for applications where your primary goal is to work with business data. SHACL also contains enough structure to provide UI hints. SHACL is to XSD as OWL is to DTDs: DTDs are more expressive when dealing with narrative content, but XSD (and, by analogy, SHACL) is more appropriate for doing analytics.

What this says in English is that the class Event has a shape, a description of the characteristics and properties of that class, and that every instance of that class (every event) is described by those properties. There are three primary properties: a startDate and an endDate, each optional and limited to a single value as the shape is written (minCount 0, maxCount 1), and zero or more event types. A fourth property, during, links an event to an enclosing event. The two dates can be given as either a date or a dateTime; both are supported. There are even a few hints here about display: startDate should generally appear first, then endDate, then the event type. Finally, these should also appear together as a group.
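To make the constraints concrete, here is a minimal Python sketch of the same checks outside of any SHACL engine. The names Event, violations, effective_end, and OPEN_END are hypothetical and exist only for illustration. Treating the sh:defaultValue as a fallback end date is an application-level assumption, not something a SHACL validator does on its own.

```python
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import List, Union

DateLike = Union[date, datetime]
OPEN_END = date(9999, 12, 31)  # mirrors the sh:defaultValue on event:endDate


@dataclass
class Event:
    """Hypothetical in-memory stand-in for a class:Event node."""
    start_dates: List[DateLike] = field(default_factory=list)
    end_dates: List[DateLike] = field(default_factory=list)
    types: List[str] = field(default_factory=list)


def violations(event: Event) -> List[str]:
    """Return the cardinality and datatype problems found on one event."""
    problems = []
    for name, values in (("startDate", event.start_dates),
                         ("endDate", event.end_dates)):
        if len(values) > 1:  # sh:maxCount 1
            problems.append(f"{name}: more than one value")
        for value in values:
            # xsd:date or xsd:dateTime (datetime is a subclass of date)
            if not isinstance(value, (date, datetime)):
                problems.append(f"{name}: {value!r} is not a date or dateTime")
    return problems


def effective_end(event: Event) -> DateLike:
    """Assumption: fall back to the open-ended default when no end is recorded."""
    return event.end_dates[0] if event.end_dates else OPEN_END
```

An event with no end date is then treated as still running, which keeps interval comparisons simple at the cost of a sentinel value.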

There are two potential ways of binding such dates to entities. The first (as given in the example above) is to make an entity class a subclass of an event:

class:Entity
    a owl:Class;
    rdfs:subClassOf class:Event.
class:Person
    a owl:Class;
    rdfs:subClassOf class:Entity.

This approach actually works best for existential data — when did an entity come into existence (for purposes of the model) and when did it go out of existence. This does assume, however, that you have the ability to do inferencing within your triple store. The other approach is to create an entity:exists property:

entity:exists a owl:ObjectProperty.
shape:ExistsProperty
        a sh:PropertyShape;
        sh:path entity:exists;
        sh:name "Exists";
        sh:class class:Event;
        sh:targetClass class:Entity; # Could be a list of descendant classes.
        # No sh:datatype here: the value is an Event node, not a date literal.
        sh:minCount "1"^^xsd:integer;
        sh:maxCount "1"^^xsd:integer;
        sh:order 0;
        sh:description "This is an event that marks the beginning and end of existence of an entity.";
.

Again, if inferencing exists, then the entity:exists property applies to all subclasses of class:Entity, such as class:Person, class:Org, etc.
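As a rough illustration of what that inferencing amounts to, the sketch below walks rdfs:subClassOf links in plain Python. The dictionary and the function names superclasses and shape_applies are hypothetical, and the sketch assumes each class has at most one direct parent.

```python
from typing import Dict, Set

# Direct rdfs:subClassOf links taken from the model (child -> parent).
SUBCLASS_OF: Dict[str, str] = {
    "class:Entity": "class:Event",
    "class:Person": "class:Entity",
    "class:Org": "class:Entity",
}


def superclasses(cls: str) -> Set[str]:
    """Return cls plus every class reachable by following rdfs:subClassOf."""
    found = {cls}
    # The second condition guards against accidental cycles in the mapping.
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in found:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def shape_applies(instance_class: str, target_class: str) -> bool:
    """A shape targeting target_class also covers instances of its subclasses."""
    return target_class in superclasses(instance_class)


print(shape_applies("class:Person", "class:Entity"))
```

A shape that targets class:Entity therefore reaches a class:Person instance, while a shape that targets class:Person does not reach a plain class:Event.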

Doing the Job Right

The benefit of going with the inheritance of an event class is that most entities have clearly definable existential boundaries. For instance, consider a history of jobs:

# Model
class:Person rdfs:subClassOf class:Entity.
class:Org rdfs:subClassOf class:Entity.
class:Job rdfs:subClassOf class:Entity.
entity:name a owl:DatatypeProperty.
person:name a owl:DatatypeProperty;
    rdfs:subPropertyOf entity:name.
org:name a owl:DatatypeProperty;
    rdfs:subPropertyOf entity:name.
job:title a owl:DatatypeProperty;
    rdfs:subPropertyOf entity:name.
person:birthDate a owl:DatatypeProperty;
    rdfs:subPropertyOf event:startDate.
org:incorporated a owl:DatatypeProperty;
    rdfs:subPropertyOf event:startDate.
job:employed a owl:DatatypeProperty;
    rdfs:subPropertyOf event:startDate.
person:deathDate a owl:DatatypeProperty;
    rdfs:subPropertyOf event:endDate.
org:disincorporated a owl:DatatypeProperty;
    rdfs:subPropertyOf event:endDate.
job:released a owl:DatatypeProperty;
    rdfs:subPropertyOf event:endDate.
# Data
person:JaneDoe
    a class:Person;
    person:name "Jane Doe";
    person:birthDate "1983-05-21"^^xsd:date;
.

org:MyBigCorp
    a class:Org;
    org:name "My Big Corp";
    org:incorporated "1982-05-03"^^xsd:date;
    org:disincorporated "2010-04-05"^^xsd:date;
.
org:AnotherCorp
    a class:Org;
    org:name "Another Corp";
    org:incorporated "1977-08-31"^^xsd:date;
.

job:JDJuniorDev
    a class:Job;
    job:title "Developer";
    job:employee person:JaneDoe;
    job:employer org:MyBigCorp;
    job:employed "2004-08-17"^^xsd:date;
    job:released "2007-05-12"^^xsd:date;
.
job:JDSeniorDev
    a class:Job;
    job:title "Senior Developer";
    job:employee person:JaneDoe;
    job:employer org:MyBigCorp;
    job:employed "2007-06-15"^^xsd:date;
    job:released "2010-03-08"^^xsd:date;
.
job:JDProjectManager
    a class:Job;
    job:title "Project Manager";
    job:employee person:JaneDoe;
    job:employer org:AnotherCorp;
    job:employed "2010-05-28"^^xsd:date;
.

This illustrates subclassing and the use of inherited properties. person:birthDate, org:incorporated, and job:employed all represent existential start dates for the corresponding resources. Now, use SPARQL to ask the question, “How old was Jane Doe when she started working for Another Corp, and what job did she hold?”

select ?age ?jobTitle where {
    ?person person:name "Jane Doe".
    ?org org:name "Another Corp".
    ?person person:birthDate ?birthDate.
    ?job job:employer ?org.
    ?job job:employee ?person.
    ?job job:employed ?jobStart.
    ?job job:title ?jobTitle.
    # Difference in calendar years; support for year() on xsd:date values varies by engine.
    bind (year(?jobStart) - year(?birthDate) as ?age)
}

?age    ?jobTitle
27      Project Manager

However, you can also simplify this query dramatically by using the base entity and event properties (this relies on the triple store applying rdfs:subPropertyOf inferencing):

select ?age ?jobTitle where {
    ?person  entity:name     "Jane Doe".
    ?org     entity:name     "Another Corp".
    ?job     entity:name     ?jobTitle.
    ?person  event:startDate ?birthDate.
    ?job     event:startDate ?jobStart.
    ?job     job:employer    ?org.
    ?job     job:employee    ?person.
    # Difference in calendar years; support for year() on xsd:date values varies by engine.
    bind (year(?jobStart) - year(?birthDate) as ?age)
}
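The simplified query works only because each specific property is also visible under its base property. The Python sketch below imitates that step by hand: it adds one generic triple per rdfs:subPropertyOf link, then computes the age from event:startDate alone. The helper names with_inferred, value, and age_on are hypothetical, and the in-memory triple list is a stand-in for a real triple store.

```python
from datetime import date
from typing import Dict, List, Tuple

Triple = Tuple[str, str, object]

# rdfs:subPropertyOf links from the model (specific -> general).
SUBPROPERTY_OF: Dict[str, str] = {
    "person:name": "entity:name",
    "org:name": "entity:name",
    "job:title": "entity:name",
    "person:birthDate": "event:startDate",
    "job:employed": "event:startDate",
}

DATA: List[Triple] = [
    ("person:JaneDoe", "person:name", "Jane Doe"),
    ("person:JaneDoe", "person:birthDate", date(1983, 5, 21)),
    ("org:AnotherCorp", "org:name", "Another Corp"),
    ("job:JDProjectManager", "job:title", "Project Manager"),
    ("job:JDProjectManager", "job:employee", "person:JaneDoe"),
    ("job:JDProjectManager", "job:employer", "org:AnotherCorp"),
    ("job:JDProjectManager", "job:employed", date(2010, 5, 28)),
]


def with_inferred(triples: List[Triple]) -> List[Triple]:
    """Add one generic triple for each triple whose predicate has a parent."""
    inferred = [(s, SUBPROPERTY_OF[p], o)
                for s, p, o in triples if p in SUBPROPERTY_OF]
    return triples + inferred


def value(triples: List[Triple], subject: str, predicate: str) -> object:
    """Return the first object found for the given subject and predicate."""
    return next(o for s, p, o in triples if s == subject and p == predicate)


def age_on(birth: date, when: date) -> int:
    """Whole years between two dates, counting only completed birthdays."""
    years = when.year - birth.year
    if (when.month, when.day) < (birth.month, birth.day):
        years -= 1
    return years


graph = with_inferred(DATA)
birth = value(graph, "person:JaneDoe", "event:startDate")
start = value(graph, "job:JDProjectManager", "event:startDate")
print(age_on(birth, start), value(graph, "job:JDProjectManager", "entity:name"))
```

Note that age_on counts completed birthdays, which is stricter than subtracting calendar years; for Jane Doe's dates both approaches give 27.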

Working With Hidden Events

One thing that becomes evident after doing enough of these is that, in general, when you have multiple events associated with a given resource, there is almost invariably a hidden entity floating around (or you can decompose a zero-or-one-to-many relationship into subordinate entities).

For instance, consider addresses. An address is an event. It contains metadata that describe a resource (a place) along with a period of time that the address itself is around. However, there’s also another (hidden) event, called a habitation, which indicates when a person lived at this address. This usually doesn’t get modeled, but by decoupling addresses from habitations, you can simplify the data model (and queries) dramatically.

#model
address:built rdfs:subPropertyOf event:startDate.
address:tornDown rdfs:subPropertyOf event:endDate.
habitation:movedIn rdfs:subPropertyOf event:startDate.
habitation:movedOut rdfs:subPropertyOf event:endDate.
#data
person:JaneDoe
    a class:Person;
    person:name "Jane Doe";
    person:birthDate "1983-05-21"^^xsd:date;
.
address:1313MockingbirdLane
    a class:Address;
    address:built "1906-01-05"^^xsd:date;
    address:tornDown "2014-03-17"^^xsd:date;
    event:type event:Reported;
    address:street "1313 Mockingbird Lane";
    address:city "Arkham";
    address:state state:MA;
    address:postalCode postalCode:01234;
.
address:442DusalishDrLane
    a class:Address;
    address:built "1955-07-21"^^xsd:date;
    event:type event:Confirmed;
    address:street "442 Dusalish Dr";
    address:city "Seattle";
    address:state state:WA;
    address:postalCode postalCode:98765;
.
habitation:JDMockingbird
    a class:Habitation;
    habitation:address address:1313MockingbirdLane;
    habitation:occupant person:JaneDoe;
    habitation:movedIn  "2004-09-03"^^xsd:date;
    habitation:movedOut  "2010-06-05"^^xsd:date;
    event:type event:Inferred;
.
habitation:JDDusalish  
    a class:Habitation;
    habitation:address address:442DusalishDrLane;
    habitation:occupant person:JaneDoe;
    habitation:movedIn  "2010-07-18"^^xsd:date;
    event:type event:Confirmed;
.

Why is this important? For starters, people change addresses all the time, but addresses tend to overlap with a large enough population. By recognizing the distinction between a habitation and an address, you can identify when multiple people live at the same place without having to compare properties. It also makes it easier to implement forwarding addresses or do background searches, and it can reduce the amount of mistyping and redundancy. Finally, it provides a means to better implement fraud detection.
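As a small illustration of the first benefit, the sketch below finds people whose habitations overlap at the same address. It is plain Python over hand-copied data, not a query against a triple store. The Habitation class and the helper names are hypothetical, and person:JohnRoe is an invented occupant added only so there is an overlap to find.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

OPEN_END = date(9999, 12, 31)  # same open-ended default used for event:endDate


@dataclass(frozen=True)
class Habitation:
    occupant: str
    address: str
    moved_in: date
    moved_out: Optional[date] = None  # None means no move-out date is recorded

    @property
    def end(self) -> date:
        return self.moved_out or OPEN_END


def overlaps(a: Habitation, b: Habitation) -> bool:
    """True when two habitations share an address and at least one day."""
    return (a.address == b.address
            and a.moved_in <= b.end
            and b.moved_in <= a.end)


def cohabitants(habitations: List[Habitation]) -> List[Tuple[str, str, str]]:
    """List (occupant, occupant, address) for each overlapping pair."""
    pairs = []
    for i, a in enumerate(habitations):
        for b in habitations[i + 1:]:
            if a.occupant != b.occupant and overlaps(a, b):
                pairs.append((a.occupant, b.occupant, a.address))
    return pairs


habitations = [
    Habitation("person:JaneDoe", "address:1313MockingbirdLane",
               date(2004, 9, 3), date(2010, 6, 5)),
    Habitation("person:JaneDoe", "address:442DusalishDrLane", date(2010, 7, 18)),
    # person:JohnRoe is hypothetical; he does not appear in the article's data.
    Habitation("person:JohnRoe", "address:1313MockingbirdLane",
               date(2009, 1, 10), date(2012, 2, 1)),
]
print(cohabitants(habitations))
```

Because the comparison is on the address identifier and the two intervals, no street, city, or postal code strings need to be matched.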

Of course, in reality, you usually do not get all of this information. This is where the event:type property comes in. This actually provides an indication about how “firm” the provenance type is for the event:

event:Approximate
    a class:EventType;
    skos:prefLabel "Approximate";
    skos:inScheme class:EventType;
    skos:definition "The event is approximate, and may not be accurate.";
.
event:Inferred
    a class:EventType;
    skos:prefLabel "Inferred";
    skos:inScheme class:EventType;
    skos:definition """The event has been determined from circumstantial information.""";
.
event:Reported
    a class:EventType;
    skos:prefLabel "Reported";
    skos:inScheme class:EventType;
    skos:definition """The event has been determined from a report by an unrecognized authority.""";
.
event:Confirmed
    a class:EventType;
    skos:prefLabel "Confirmed";
    skos:inScheme class:EventType;
    skos:definition """The event has been validated by a recognized authority.""";
.

This can be used to fix events for search purposes, while at the same time giving a way of determining how trustworthy the data is. The default in most cases will be event:Reported, as this is usually information coming directly from an individual or organization. Inferred data usually comes from other circumstantial data: a person moves to a new place to take a job, so if no data is given concerning when they moved in, the move-in date is assumed from (and can be calculated from) the job data. This is still a guess, but it's a guess with some input data.
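The job-based inference described above might be sketched as follows. This is one possible rule under stated assumptions, not a prescribed algorithm: the DatedEvent class and infer_move_in function are hypothetical, and using the job start date unchanged as the move-in date is a simplification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatedEvent:
    start: Optional[date]
    event_type: str  # "Approximate", "Inferred", "Reported" or "Confirmed"


def infer_move_in(habitation: DatedEvent, job: DatedEvent) -> DatedEvent:
    """Fill a missing move-in date from a job start and mark it as inferred."""
    if habitation.start is not None:
        return habitation  # a recorded date always wins over a guess
    if job.start is None:
        return habitation  # nothing to infer from
    return DatedEvent(start=job.start, event_type="Inferred")


unknown_move = DatedEvent(start=None, event_type="Reported")
new_job = DatedEvent(start=date(2010, 5, 28), event_type="Confirmed")
print(infer_move_in(unknown_move, new_job))
```

Tagging the result as Inferred keeps the guess searchable while signaling that it should give way to any reported or confirmed date that arrives later.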

Creating Event Hierarchies

Finally, events can be connected to other events using the event:during property. For instance, consider a presentation within a track within a conference.

#model
class:Presentation rdfs:subClassOf class:Entity.
class:Track rdfs:subClassOf class:Entity.
class:Conference rdfs:subClassOf class:Entity.
class:Room rdfs:subClassOf class:Entity.

presentation:title rdfs:subPropertyOf entity:name.
track:name rdfs:subPropertyOf entity:name.
conference:name rdfs:subPropertyOf entity:name.
presentation:startTime rdfs:subPropertyOf event:startDate.
presentation:endTime rdfs:subPropertyOf event:endDate.
track:startDate rdfs:subPropertyOf event:startDate.
track:endDate rdfs:subPropertyOf event:endDate.
conference:startDate rdfs:subPropertyOf event:startDate.
conference:endDate rdfs:subPropertyOf event:endDate.
#data
presentation:ItsAboutTime
     a class:Presentation;
     presentation:title "It's About Time";
     presentation:speaker person:JaneDoe;
     presentation:room  room:BlueRoom;
     presentation:startTime "2017-03-01T12:00:00Z"^^xsd:dateTime;
     presentation:endTime "2017-03-01T12:00:00Z"^^xsd:dateTime;
     event:during track:DataModeling;
.
track:DataModeling
     a class:Track;
     track:name "Data Modeling";
     track:startDate "2017-03-01"^^xsd:date;
     track:endDate "2017-03-03"^^xsd:date;
     event:during conference:OpenSemantics;
.
conference:OpenSemantics
     a class:Conference;
     conference:name "Open Semantics";
     conference:startDate "2017-03-01"^^xsd:date;
     conference:endDate "2017-03-03"^^xsd:date;
.
room:BlueRoom a class:Entity.
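One practical use of event:during is checking that each event actually falls inside the event that contains it. The Python sketch below follows the during links upward and compares intervals. The function names are hypothetical, the data is copied by hand from the listing above, and reading a bare date as a whole UTC day is an assumption made so that dates and dateTimes can be compared.

```python
from datetime import date, datetime, time, timezone
from typing import Dict, List, Tuple, Union

DateLike = Union[date, datetime]


def as_start(value: DateLike) -> datetime:
    """Treat a bare date as the first instant of that day (UTC assumed)."""
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, time.min, tzinfo=timezone.utc)


def as_end(value: DateLike) -> datetime:
    """Treat a bare date as the last instant of that day; end dates are inclusive."""
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, time.max, tzinfo=timezone.utc)


noon = datetime(2017, 3, 1, 12, 0, tzinfo=timezone.utc)
EVENTS: Dict[str, Tuple[DateLike, DateLike]] = {
    "presentation:ItsAboutTime": (noon, noon),
    "track:DataModeling": (date(2017, 3, 1), date(2017, 3, 3)),
    "conference:OpenSemantics": (date(2017, 3, 1), date(2017, 3, 3)),
}
DURING: Dict[str, str] = {
    "presentation:ItsAboutTime": "track:DataModeling",
    "track:DataModeling": "conference:OpenSemantics",
}


def ancestors(event: str) -> List[str]:
    """Follow event:during links upward from an event."""
    chain = []
    while event in DURING:
        event = DURING[event]
        chain.append(event)
    return chain


def fits_within(child: str, parent: str) -> bool:
    """True when the child's interval lies inside the parent's interval."""
    child_start, child_end = EVENTS[child]
    parent_start, parent_end = EVENTS[parent]
    return (as_start(parent_start) <= as_start(child_start)
            and as_end(child_end) <= as_end(parent_end))


for parent in ancestors("presentation:ItsAboutTime"):
    print(parent, fits_within("presentation:ItsAboutTime", parent))
```

The same walk could flag scheduling errors, such as a presentation whose times fall outside its track.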

A final point: it is possible for an entity to have an associated event with no known dates (or where the dates are simply not important to track), as with room:BlueRoom above. A scheduling application, on the other hand, would almost certainly identify room availability blocks indicating when a room was accessible, in which case the event connection would clearly be there.

This doesn’t cover all aspects of time and events. For instance, no mention was made of bitemporal events, where you have both an instantiation (or more properly, a transaction) event and an observation event. This is actually critical for banking applications and decouples the sometimes complex intertwinings that most bitemporal systems tend to run into.

Summary

By incorporating events into your resources as abstract base classes, you can both better handle management of those resources and better model the space that you’re dealing with. This also reduces the complexity of queries by making efficient use of subclassing and subproperties and allows for staggered resources in time.
