A graph database and its ecosystem of technologies can yield elegant,
efficient solutions to problems in knowledge representation and
reasoning. To get a taste of this argument, we must first understand
what a graph is. A graph is a data structure. There are numerous types
of graph data structures, but for the purpose of this post, we will
focus on a type that has come to be known as a property graph.
A property graph denotes vertices (nodes, dots) and edges (arcs,
lines). Edges in a property graph are directed and labeled/typed (e.g.
“marko knows peter”). Both vertices and edges (known generally
as elements) can have any number of key/value pairs associated with
them. These key/value pairs are called properties. From this
foundational structure, a suite of questions can be answered and
problems solved.
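To make the definition concrete, here is a minimal sketch of a property graph in plain Java. The class names (`PropertyGraphSketch`, `Element`, `Vertex`, `Edge`), the helper methods, and the ages and `since` value are all assumptions made for illustration; this is not the API of any particular graph database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal property graph; not the API of any real graph database.
class PropertyGraphSketch {

    // An element (vertex or edge) carries any number of key/value properties.
    static class Element {
        final Map<String, Object> properties = new HashMap<>();
    }

    static class Vertex extends Element {
        final List<Edge> outEdges = new ArrayList<>();
        final List<Edge> inEdges = new ArrayList<>();
    }

    // An edge is directed (out vertex -> in vertex) and labeled.
    static class Edge extends Element {
        final Vertex outVertex;
        final Vertex inVertex;
        final String label;

        Edge(Vertex outVertex, Vertex inVertex, String label) {
            this.outVertex = outVertex;
            this.inVertex = inVertex;
            this.label = label;
        }
    }

    static Vertex addVertex(String name, int age) {
        Vertex v = new Vertex();
        v.properties.put("name", name);
        v.properties.put("age", age);
        return v;
    }

    static Edge addEdge(Vertex out, Vertex in, String label) {
        Edge e = new Edge(out, in, label);
        out.outEdges.add(e);
        in.inEdges.add(e);
        return e;
    }

    public static void main(String[] args) {
        Vertex marko = addVertex("marko", 29); // ages are made-up values
        Vertex peter = addVertex("peter", 35);
        Edge knows = addEdge(marko, peter, "knows");
        knows.properties.put("since", 2010);   // edges can hold properties too
        System.out.println(marko.properties.get("name") + " " + knows.label + " "
                + knows.inVertex.properties.get("name")); // marko knows peter
    }
}
```

Keeping both `outEdges` and `inEdges` on each vertex is one possible design choice; it lets a traversal follow an edge in either direction without scanning the whole graph.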
Object Modeling
The property graph data structure is nearly identical in form to the
object graphs of object oriented programming. Take a collection of
objects, remove their methods, and you are left with a property graph.
An object’s fields are either primitive, in which case they serve as
properties, or complex, in which case they serve as references to
other objects. For example, in Java:

    class Person {
        String name;
        int age;
        Collection<Person> knows;
    }

The name and age properties are vertex properties of the particular
Person instance, and the knows property refers to knows-labeled edges
to other people.
Emil Eifrem of Neo Technology espouses the view that property graphs
are “whiteboard friendly” as they are aligned with the semantics of
modern object oriented languages and the diagramming techniques used by
developers. A testament to this idea is the jo4neo project by Taylor
Cowan. With jo4neo, Java annotations are elegantly used to allow for
the backing of a Java object graph by the Neo4j graph database. Beyond
the technological benefits, the human mind tends to think in terms of
objects and their relations. Thus, graphs may be considered “human
brain friendly” as well.
Given an object graph, questions can be answered about the domain. In
the graph traversal DSL known as Gremlin, we can ask questions of the
object graph:

    // who does marko know?
    marko.outE('knows').inV

    // what are the names of the people that marko knows?
    marko.outE('knows').inV.name

    // what are the names and ages of the people that marko knows?
    marko.outE('knows').inV.emit{[it.name, it.age]}

    // who does marko know that are 30+ years old?
    marko.outE('knows').inV{it.age >= 30}

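For readers who think in objects rather than traversals, the same one-hop questions can be sketched over a plain Java object graph. This is only an illustration: the `KnowsTraversalSketch` class, its helper methods, and the names and ages are assumptions, and the streams below stand in for the Gremlin steps rather than reproduce them.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical object-graph version of the "who does marko know?" questions.
class KnowsTraversalSketch {

    // A person with name and age properties and a collection of 'knows' references.
    static class Person {
        final String name;
        final int age;
        final Collection<Person> knows = new ArrayList<>();

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Names of the people that `start` knows (one hop along 'knows').
    static List<String> namesKnownBy(Person start) {
        return start.knows.stream()
                .map(p -> p.name)
                .collect(Collectors.toList());
    }

    // People that `start` knows who are at least `minAge` years old.
    static List<Person> knownAtLeast(Person start, int minAge) {
        return start.knows.stream()
                .filter(p -> p.age >= minAge)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Person marko = new Person("marko", 29); // made-up people and ages
        Person peter = new Person("peter", 35);
        Person josh = new Person("josh", 27);
        marko.knows.add(peter);
        marko.knows.add(josh);

        System.out.println(namesKnownBy(marko));                 // [peter, josh]
        System.out.println(knownAtLeast(marko, 30).get(0).name); // peter
    }
}
```

The Gremlin expressions are shorter because the traversal language treats "follow this labeled edge" as a primitive, whereas the object version has one field per edge label.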
Concept Modeling
From the instances that compose a model, there may exist abstract
concepts. For example, while there may be book instances, there may also
be categories into which those books fall, e.g. science fiction,
technical, romance, etc. The graph is a flexible structure in that it
allows one to express that something is related to something else in
some way. These somethings may be real or ethereal. As such, ontological
concepts can be represented along with their instances and queried
appropriately to answer questions.

    // what are the parent categories of history?
    x = []; history.inE('subcategory').outV.aggregate(x).loop(3){!it.equals(literature)}; x

    // how many descendant categories does fiction have?
    c = 0; fiction.outE('subcategory').inV.foreach{c++}.loop(3){true}; c

    // is romance at the same depth as history?
    c = 0; romance.inE('subcategory').outV.loop(2){c++; !it.equals(literature)}.outE('subcategory').inV.loop(2){c--; !it.equals(history)}; c == 0

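The three category questions can also be sketched in plain Java. The taxonomy below is invented for illustration (the post does not list the actual categories beyond a few names), and the sketch assumes each category has at most one parent, which a general graph would not require. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical category tree; each category is assumed to have at most one parent.
class CategoryTreeSketch {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, List<String>> childrenOf = new HashMap<>();

    // Records a 'subcategory' edge from parent to child.
    void addSubcategory(String parent, String child) {
        parentOf.put(child, parent);
        childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    // Walks upward from a category to the root, nearest parent first.
    List<String> ancestors(String category) {
        List<String> result = new ArrayList<>();
        for (String p = parentOf.get(category); p != null; p = parentOf.get(p)) {
            result.add(p);
        }
        return result;
    }

    // Counts every category reachable by following 'subcategory' edges downward.
    int countDescendants(String category) {
        int count = 0;
        for (String child : childrenOf.getOrDefault(category, Collections.<String>emptyList())) {
            count += 1 + countDescendants(child);
        }
        return count;
    }

    // Two categories are at the same depth if they have equally many ancestors.
    boolean sameDepth(String a, String b) {
        return ancestors(a).size() == ancestors(b).size();
    }

    // Made-up taxonomy for illustration only.
    static CategoryTreeSketch example() {
        CategoryTreeSketch t = new CategoryTreeSketch();
        t.addSubcategory("literature", "fiction");
        t.addSubcategory("literature", "nonfiction");
        t.addSubcategory("fiction", "romance");
        t.addSubcategory("fiction", "science fiction");
        t.addSubcategory("nonfiction", "history");
        return t;
    }

    public static void main(String[] args) {
        CategoryTreeSketch t = example();
        System.out.println(t.ancestors("history"));            // [nonfiction, literature]
        System.out.println(t.countDescendants("fiction"));     // 2
        System.out.println(t.sameDepth("romance", "history")); // true
    }
}
```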
Automated Reasoning
From the explicit objects, their relationships, and their abstract
categories, reasoning processes can be enacted. A tension that exists in
graph modeling is what to make explicit (structure) and what to infer
through traversal (process). As in much of computing, the trade-off is
between space and time. If there exists an edge from a person to
their coauthors, then it's a single hop to get from that person to his or
her coauthors. If, on the other hand, coauthors must be inferred
through shared writings, then a multi-hop traversal is computed to
determine coauthors. Reasoning is the process of making what is implicit
explicit. A couple of simple reasoning examples are presented below
using Gremlin.

    // two people who wrote the same book/article/etc. are coauthors
    g.V{x = it}.outE('wrote').inV.inE('wrote').outV.except([x])[0].foreach{g.addEdge(null, x, it, 'hascoauthor')}

    // people who write literature are authors
    author = g.addVertex(); author.type='role'; author.name='author'
    // the outer closure names its parameter so the inner closure's `it` does not shadow the person
    g.V.foreach{person -> person.outE('wrote').inV[0].foreach{g.addEdge(null, person, author, 'hasrole')} >> -1}

In the examples above, a full graph analysis is computed to determine
all coauthors and author roles. However, nothing prevents the
evaluation of local inference algorithms.

    // marko's coauthors are those people who wrote the same books/articles/etc. as him
    marko.outE('wrote').inV.inE('wrote').outV.except([marko])[0].foreach{g.addEdge(null, marko, it, 'hascoauthor')}

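The structure-versus-process trade-off can be sketched outside of Gremlin as well. In the hedged Java example below, `coauthorsOf` infers one person's coauthors on demand (process, local), while `materialize` computes the inference for everyone and stores the result, which is roughly what adding explicit coauthor edges amounts to (structure, global). The class, the method names, and the people and papers are all assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: 'wrote' edges are modeled as a map from author to works.
class CoauthorInferenceSketch {

    // Process: infer one person's coauthors on demand by checking
    // who else wrote each of that person's works.
    static Set<String> coauthorsOf(String person, Map<String, Set<String>> wrote) {
        Set<String> result = new TreeSet<>();
        for (String work : wrote.getOrDefault(person, Collections.<String>emptySet())) {
            for (Map.Entry<String, Set<String>> other : wrote.entrySet()) {
                if (!other.getKey().equals(person) && other.getValue().contains(work)) {
                    result.add(other.getKey());
                }
            }
        }
        return result;
    }

    // Structure: run the inference once for everyone and keep the result,
    // trading extra space for single-lookup access later.
    static Map<String, Set<String>> materialize(Map<String, Set<String>> wrote) {
        Map<String, Set<String>> hasCoauthor = new TreeMap<>();
        for (String person : wrote.keySet()) {
            Set<String> found = coauthorsOf(person, wrote);
            if (!found.isEmpty()) {
                hasCoauthor.put(person, found);
            }
        }
        return hasCoauthor;
    }

    // Made-up 'wrote' edges for illustration.
    static Map<String, Set<String>> exampleWrote() {
        Map<String, Set<String>> wrote = new TreeMap<>();
        wrote.put("marko", new TreeSet<>(Arrays.asList("paper1", "paper2")));
        wrote.put("peter", new TreeSet<>(Arrays.asList("paper1")));
        wrote.put("josh", new TreeSet<>(Arrays.asList("paper2")));
        wrote.put("stephen", new TreeSet<>(Arrays.asList("paper3")));
        return wrote;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> wrote = exampleWrote();
        System.out.println(coauthorsOf("marko", wrote)); // [josh, peter]
        System.out.println(materialize(wrote));
        // {josh=[marko], marko=[josh, peter], peter=[marko]}
    }
}
```

Once the result of `materialize` is stored, finding a coauthor is a single lookup, at the cost of keeping the derived edges consistent whenever a new work is added.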
Conclusion
Graphs are useful for modeling objects, their relationships to each
other, and the conceptual structures within which they lie. From this
explicit information, graph query and inference algorithms can be
evaluated to answer questions on the graph and to increase the density
of the explicit knowledge contained within the graph (i.e. increase the
number of vertices and edges). This particular graph usage pattern has
been exploited to a great extent in the world of RDF (knowledge
representation) and RDFS/OWL (reasoning). The world of RDF/RDFS/OWL is
primarily constrained to description logics (see an argument to the
contrary here).
Description logics are but one piece of the larger field of knowledge
representation and reasoning. There are numerous logics that can be
taken advantage of. In the emerging space of graph databases, the
necessary building blocks exist to support the exploitation of other
logics. Moreover, these logics, in some instances, may be used
concurrently within the same graphical structure. To this point, the
reading list below provides a collection of books that explicate
different logics and ideas regarding heterogeneous reasoning. Graph
databases provide a green field by which these ideas can be realized.
Further Reading
Brachman, R., Levesque, H., “Knowledge Representation and Reasoning,” Morgan Kaufmann, 2004.
Wang, P., “Rigid Flexibility: The Logic of Intelligence,” Springer, 2006.
Mueller, E.T., “Commonsense Reasoning,” Morgan Kaufmann, 2006.
Minsky, M., “The Society of Mind,” Simon & Schuster, 1988.
Source: http://markorodriguez.com/2011/02/23/knowledge-representation-and-reasoning-with-graph-databases/