{{announcement.body}}
{{announcement.title}}

Querying The DBpedia Open Knowledge Graph With Standard SQL

DZone 's Guide to

Querying The DBpedia Open Knowledge Graph With Standard SQL

DBpedia is a crowd-sourced community project that extracts structured content from mainly Wikipedia pages — learn how to use it!

· Database Zone ·
Free Resource

Introduction to DBpedia

DBpedia is a crowd-sourced community project that extracts structured content from mainly partially unstructured and semi-structured parts of Wikipedia pages and other structured sources such a Wikimedia. The project maintains links to external sources and reciprocally many sources point to the latest DBpedia ontology/open knowledge graph (OKG).

You may also like: The Best Way to Write a SQL Query

The extracted/consolidated information from a holistic viewpoint represents an ontology/open knowledge graph (OKG), which in turn, also represents a knowledge-base containing at the same time class/property metadata and class/property instance data. DBpedia covers a multitude of domains as compared to typical limited domain-specific knowledge-bases and also, it includes a multitude of different international chapters/language communities.

On the other hand, timbr DBpedia represents a synergy between DBpedia + SQL. Permits the querying of the DBpedia ontology/Open Knowledge Graph (OKG) via standard SQL including also path expressions, avoiding the definition of complex joins and complex SPARQL queries, via a SPARQL End-point. 

Anybody that feels comfortable with SQL can reuse its knowledge for querying/accessing the DBpedia OKG. Also, timbr supports creating ontologies in SQL and also supports loading existing ones like in this case, the DBpedia Ontology/OKG, which is written in OWL.

Timbr DBpedia Semantic Web Stack

timbr DBpedia supports RDF/RDFS/OWL ontologies via its virtual SQL compiler which fulfills the Semantic Web stack:

  • Modeling, managing, querying ontologies via standard SQL
  • Enabling graph traversals in standard SQL via path expressions as typically included in graph databases
  • Permitting semantic inference at run-time via ontology rules
  • Avoiding complex joins and complex SPARQL queries
  • Supporting RDF/RDFS/OWL ontologies mapping them to backend SQL DBMSs.
  • Extending SQL with inheritance (IS-A relationships allowing the definition of large subsumption hierarchies) and inference capabilities.

Figure 1. timbr DBpedia Semantic Web Stack

Timbr DBpedia Architecture

SQL clients and BI tools access via a Virtual SQL engine/query service (timbr supports JDBC/ODBC), the timbr DBpedia ontology that maps ontology artifacts to backend SQL DBMS’s. OKG Classes/Concepts/Properties are mapped to relational tables/columns via its “three-in-one”, timbr, etimbr, and dtimbr schemas (see section 5).

Properties may include direct properties, inherited properties (within the subsumption hierarchy) and references that are special attributes representing relationships to other classes/concepts (typically represented as URI’s/foreign keys in the virtual SQL engine).

The DBpedia OKG no longer is required to be accessed by complex SPARQL queries/SPARQL end-points. The DBpedia OKG can be queried with standard SQL. timbr DBpedia also includes an Ontology Viewer/Explorer so end-users can intuitively review the scope of a given sub-part of the OKG. 

In this technical report, we focus on accessing the DBpedia OKG via SQL clients (in this case MySQL and Amazon Redshift). At the end of this report, we briefly mention the timbr DBpedia Ontology/OKG Explorer and the timbr DBpedia Data Exploration Beta component.

Figure 2. Timbr DBpedia Architecture

Timbr DBpedia SYSTEM TABLES

We briefly mention timbr-DBpedia system tables as the set of tables supporting the definition and querying of the DBpedia via standard SQL. It can be considered as the underlying "system catalog" table. Among them we can cite:

Timbr DBpedia System Tables:

  • Timbr.SYS_CONCEPTS: contains unique timbr DBpedia ontology class/concepts.
  • Timbr.SYS_INHERITANCE: contains base/derived class/concepts.
  • Timbr.SYS_ONTOLOGY: contains the set of classes/concepts with the respective properties, primary keys, inherited primary keys, parent class/concepts, inheritance level and so on.
  • Timbr.SYS_RELATIONSHIPS: includes a total of 8 columns, among them class/concept, the related class/concept, a foreign key name associated, etc.

Three-In-One: Timbr, Etimbr, Dtimbr Ontology Schemas

In timbr DBpedia there are three different viewpoints/perspectives of the ontology representation:

  • Timbr schema: Implicit representation of the ontology. Includes explicit ontology/relational mapping.
  • Etimbr schema: Exhaustive representation. Includes the timbr schema ontology artifacts plus derived class/concept properties. In other words, parent class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy (for instance the “thing” class/concept will include all the DBpedia ontology/knowledge graph properties from where all class/concepts inherit entity_id and entity_type properties).
  • Dtimbr schema: Dereferenced representation that includes the etimbr schema plus graph/hop traversals, avoiding unnecessary joins.

Timbr DBpedia Schema Queries

This particular DBpedia schema representation treats classes/concepts/properties as tables/columns where every class/concept have as a minimum, the following attributes (inherited from the parent ancestor ontology artifact “thing”):

  • Entity_id: unique URI/primary key.
  • Entity_type: self-explanatory, it is the type of ontology artifact.

Please note that timbr DBpedia permits multiple inheritances, therefore, entity_type may contain several entries. Also please note that the DBpedia ontology/Knowledge Graph may contain many class/concept property null values.

SQL
 




x


 
1
SELECT entity_label as `Book Title`, publisher as `Publisher URI`, publicationdate as `Publication Date`, numberofpages `Num. Pages`, isbn `ISBN`
2
FROM timbr.book
3
WHERE entity_label IS NOT NULL AND publisher IS NOT NULL AND publicationdate IS NOT NULL AND numberofpages IS NOT NULL AND isbn IS NOT NULL 
4
AND entity_type='book'
5
ORDER BY publicationdate DESC



One can note that the first query row refers to invalid publication date. The rest seems to be valid date entries.

Etimbr DBpedia Schema Queries

Etimbr DBpedia Schema queries include timbr properties plus derived class/concept properties. Parent/ancestors' class/concepts include derived properties from derived classes/concepts within the subsumption hierarchy. Please note that the parent class/concept of all classes/concepts is "thing", therefore the "thing" class/concept will contain all the set of properties in the e-timbr schema.

Query Example:

The following query executes a UNION of “literarygenre” in “thing” class/concept and “genre” in “genre” concept:

SQL
 




xxxxxxxxxx
1
13


 
1
--  Retrieve distinct book ”genres”  First Part Retrieves 9 distinct rows
2
SELECT  distinct(thing.entity_label) as `Literary Genre`  
3
FROM etimbr.book `b`
4
JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
5
ON `b`.genre = thing.entity_id
6
WHERE `Literary Genre` like '%fiction%'
7
UNION -- Retrieves a total of 11 distinct rows
8
--  Retrieve distinct book ”literarygenres”  Second Part Retrieves 10 distinct rows
9
SELECT  distinct(thing.entity_label) as `Literary Genre`  
10
FROM etimbr.book `b`
11
JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
12
ON `b`.literarygenre = thing.entity_id
13
WHERE `Literary Genre` like '%fiction%'



Query Example:

All Ontology/OKG classes/concepts have as an ancestor “thing” concept, therefore, we can get the whole set of derived attributes via etimbr.thing schema. In this case, we obtain simple statistics returning the following:

  • The number of classes/concept instances in timbr DBpedia ontology/OKG.
  • The number of distinct class/concept types.
  • The number of distinct entity_labels (property only accessible via etimbr schema (derived property)).
SQL
 




xxxxxxxxxx
1


 
1
SELECT count(entity_id) as `Num Concepts`, count(distinct entity_type) as `Num Entity Types` , count(distinct entity_label) as `Num Distinct Entity Labels`
2
FROM etimbr.thing



query history

Query Example:

Now we obtain simple statistics by querying etimbr.thing (includes all instance types given that all classes/concepts have as ancestor concept thing):

  • Grouping by class/concept type.
  • The number of class/concept instances (person, place, company).
SQL
 




xxxxxxxxxx
1


 
1
SELECT entity_type as `Concept`, count(resource_id) as `Number of Concept Instances`
2
FROM etimbr.`thing`
3
WHERE entity_type in ('person','place','company')
4
GROUP BY entity_type
5
ORDER BY `Number of Concept Instances` DESC



query history

This query presents an interesting viewpoint that needs to be further explained. If we define the following query:

"SELECT count(1) from timbr.person"

We retrieve 4339681 “person” concept instances because the timbr inference rules include instances that are mapped to derived classes/concepts of a person as well, as compared to the query depicted above specifically filtered person instances mapped directly to the “person” concept (2829415).

Dtimbr DBpedia Schema Queries

Dtimbr DBpedia schema: includes the etimbr schema plus graph traversals. Relationships to other class/concepts are modeled as properties, therefore avoiding complex joins and enabling path query expressions (typical in graph databases).

Query:

This query retrieves the book URI/ID, book title, author name, and thing literary genre where book title is about science fiction, or literary genre is about science fiction or the author is in a selected list of science fiction authors:

SQL
 




xxxxxxxxxx
1


 
1
SELECT `b`.entity_id as `Book URI`, `b`.entity_label as `Book Title`, `b`.`author.entity_label` as `author_name`, thing.entity_label as `Literary Genre`, `b`.entity_type as `entity_type`  
2
FROM dtimbr.book `b`
3
LEFT JOIN (SELECT entity_id, entity_label FROM etimbr.thing) as thing
4
ON `b`.literarygenre = thing.entity_id
5
WHERE entity_type LIKE '%book%' AND
6
lower(`author_name`) IN ('isaac asimov','arthur c. clarke','orson scott card', 'robert a. heinlein') OR
7
lower(`Book Title`) LIKE '%science fiction%'  OR 
8
lower(`Literary Genre`) LIKE '%science fiction%' 
9
ORDER BY `Literary Genre` ASC 



Query:

Dtimbr query retrieves the number of person’s types grouped by continent and country (a total of three hops) and entity_type:

SQL
 




xxxxxxxxxx
1


 
1
SELECT `birthplace.country.continent.entity_label` as `Birth Continent`, `birthplace.country.entity_label` as `Birth Country`, entity_type as `Person Type`, count(entity_type) as `Number of Persons Type` 
2
FROM dtimbr.person
3
WHERE `Birth Country` in ('Australia','Spain','United States of America','Peru') 
4
-- OR `birthplace.country.continent.entity_label` in ('Europe')  
5
GROUP BY `Birth Continent`, `Birth Country`, entity_type
6
HAVING `Number of Persons Type` > 10
7
ORDER BY `Birth Country` ASC, `Number of Persons Type` DESC



query history

Timbr DBpedia Ontology Explorer

Timbr DBpedia models the ontology/OKG with concepts, properties and references mapped to: 

  • Concepts: ontology concepts are mapped to relational tables.
  • Properties: concept properties are mapped to columns in relational tables and may be of type typical SQL data types such as varchar, integer, timestamp, etc. depending on the underlying back-end DBMS (i.e. Amazon Redshift, Apache Spark, etc.). Properties include inherited properties (from parent/ancestor concepts), direct properties (defined for a given concept).
  • References: represent relationships between ontology concepts and are mapped to SQL foreign keys in relational tables. The SQL data type of reference is varchar representing an ontology URI.

Timbr DBpedia supports complex subsumption hierarchies via IS-A relationships (up-to 8 levels allowing multiple inheritances), properties and references to other concepts. The Ontology Explorer contains the following parts:

  • Ontology Tree: situated on the left-hand side displays the subsumption hierarchy of a given concept.
  • Ontology Graph Viewer/Explorer: on the center of the screen, displays the entire ontology sub-part/view.
  • Ontology Concept Details: displays a summary of information for the selected concept, including the direct ancestor, the hierarchical level, number of total properties, direct and inherited references, inherited and direct properties.
  • Ontology Main Graph Control: allows different options for controlling the output of the ontology explorer such as including references/relationships, properties and so on.

Figure 3. timbr DBpedia Ontology Explorer

In the ontology tree, we can see the “book” concept hierarchy, which is in level 4. In the Ontology graph Viewer/Explorer (center of the screen) we have included property/references displaying author, writer, genre, and literarygenre. On the right-hand side, in the ontology concept details, we can see that the concept book has a total of 126 properties, 6 direct references, 69 inherited references, 15 direct properties, and 111 inherited properties.

Timbr DBpedia Data Exploration Beta

We can access the new Beta Ontology/Data exploration menu item by selecting Exploration/Data Exploration beta, which firstly displays a dialog box that demands us to enter the required back-end DBMS. In this case, we can select either DBpedia Amazon Redshift or DBpedia Spark. After selecting DBpedia Spark we have to choose the desired Ontology/OKG class/concept. 

A set of concept properties is required to fill-in and the required filters and results to be retrieved. For the time being, this Beta does not allow to define a filter that allows path expressions. Nevertheless, we can do so via the Concept details Window section, step-by-step incrementally. According to timbr.ai, this feature will be incorporated later in the Beta program.


choose your ontology

Figure 4. timbr DBpedia Data Exploration Beta Query Composer

In Figure 5 we can identify the Data Exploration Beta parts:

  • Data Exploration Concept Viewer: Situated on the left-hand side of the screen, displays concepts involved in the OKG.
  • Data Exploration Ontology/OKG Viewer: Situated in the middle of the screen, we can identify several books with some relationships exploded such as book author and author details such as birthplace and death place.
  • Data Exploration Concept Details Viewer: On the right-hand side of the screen, displays concept/properties details, giving the possibility of graph traversal with the related/references to other concepts.

dpedia-spark

On the captured screen below of the Data Exploration Beta timbr DBpedia component, we have “exploded” different “book”, ”author” (person) concepts, which have been added incrementally by clicking on the “path traverse” button. A total of 11 related concepts are involved, including book, person, place, country, continent, language, agent, ethnicgroup, populatedplace, award and so on.


dpedia-spark

Figure 6. timbr DBpedia Data Exploration Beta Component: Book. Person(Author), Place, Country, Continent, Language Concept Viewer

Conclusion

With timbr-DBpedia, we can analyze/traverse the DBpedia Ontology/OKG with standard SQL that includes the set-up of path expressions avoiding cumbersome joins and avoiding complex SPARQL queries/SPARQL end-points. Anybody that has experience working with standard SQL will feel comfortable issuing timbr DBpedia queries. 

Timbr DBpedia is scalable as much as the used back-end. In this technical report, we have been working with timbr DBpedia Amazon Redshift for the OKG, for system tables we have been querying a MySQL backend and for the timbr DBpedia Data Exploration Beta we have been working with DBpedia Spark.


Further Reading

14 Differences Between Standard SQL and Transact-SQL

How to Apply Non-Standard SQL Formatting Using SQL Prompt

The Unreasonable Effectiveness of SQL

Topics:
dbpedia ,sql ,knowledge graph ,semantic web ,tutorial ,database

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}