Object-Relational Algebra: Definitions and Assumptions
Join the DZone community and get the full member experience.
Join For FreeIntroduction
This post begins a series to try to establish a mathematical basis for object-relational database systems. The idea here is to extend relational math in order to cover object-relational database operations rather than to model current databases mathematically. I therefore want to start with my assumptions and definitions.
For simplicity's sake I will follow with Codd's assumptions, namely that
each row is unique, that the ordering is not significant, and that
within each row, the columns are significant. This establishes a
definition of a relation as a set of tuples.
Everything here is then an extension of Codd's math, rather than a
replacement for it. In the programming world, boolean operations may
mean something slightly different, or bags may be used instead of sets,
but the goal here is not to model databases but to model data, and
Codd's model is more elegant in this regard. It is the real world that
gets messy.
To Codd's assumptions I will add my own, namely that the value of any
attribute is opaque to relational operations but may act as the domain
for a function in this way. This becomes important when I get to the
special Σ function and the functional dependencies on its output. The
special function in my system Σ(R) (join series) with a θ condition
gives you a way to represent self-joins of arbitrary depth for
relational math, for example, and combined with functional dependencies
of its output gives you a way to express queries of transitive binary
closure, but that is not its most interesting use at least to me.
A second important point here is that I will diverge from some previous
efforts in that I will not throw in "magic operators" designed to solve
corner cases. The goal here is to build a more expressive system that
can solve more problems. Therefore an operator that determines
transitive binary closure is ruled out, and if we can't solve a problem
without a magic operator, then this will be left for others.
Additionally this effort is intended to be minimalist and allow many
different behaviors to be encapsulated through different approaches.
For this reason I will build directly on the work of Codd and more or
less ignore other approaches that various researchers have taken.
Finally this is somewhat to be distinguished from the way SQL does
things. SQL itself is a messy (from a math standpoint) application of
both relational algebra and predicate calculus, using bags instead of
sets.
One important problem is choosing a name for this extension. The term
object in computer science tends to be a bit abstract and not really a
good description of what is going on here. A better description might
be functional-relational algebra. I call it object-relational algebra
only because it fits in with object-relational databases.
In the posts that follow, relation and function will be given their
ordinary mathematical meanings. However the relationship between the
two are not well defined to my knowledge.
What is an Object Anyway?
In computer science terms, an object is a data structure (here
represented as a tuple) which has imperative behavior attached to it.
Object-oriented programming is thus essentially an extension to
structural programming where behavior follows structure. Classes
define both structure and behavior, and objects instantiate and apply
these rules. Applying a model such as this to an algebra of data is
somewhat problematic, because algebra is about derivation of values, not
about behavior.
What we are doing here is similar, and yet different. I haven't fleshed
out a theory of inheritance vs composition (PostgreSQL supports
multiple inheritance which can be used to implement either inheritance
or composition, strictly speaking), and collection tables can be used to
implement composition in various databases including PostgreSQL. A
theory of composition and inheritance is left to others. From the
standpoint of my model these are equivalent.
Instead of behavior, I use the same basic approach of tying structure to
function in order order to expand the reach of the relational model.
In computer science terms, a class is then roughly similar to a
relation, and an object roughly similar to a tuple. However, instead of
behavior, the typical focus is on derived information. Thus functions
become important in a way which they have not typically been used in
relational math.
Functions of a Relation
So here, f(R) is a function of relvar R if, and only if, for every possible tuple in R, f(R) returns one distinct result (that result could however be a tuple or even a set). Functional dependencies stored in the database can then be modeled as functions, but so can functional dependencies which are not stored.
The following then is not a function: f(R) = x^0.5 because it resolves
to two distinct values for every value of attribute x in relation R (one
is positive and the other is negative). However the following is a
function: f(R) = (abs(x^0.5), -1 * abs(x^0.5)) because it resolves to a
tuple with both possible answers from the previous, at least if x is
always positive or imaginary numbers are allowed. It could return a set
instead and that would be acceptable, however in this case the
structure of the result is also set by the function, and the above
function is distinct from g(R) = { abs(x^0.5), -1 * abs(x^0.5) } because
the structure of the result is different.
In standard relational algebra, a tuple is finitely expressive, namely
one can only express a finite set of values off a single tuple.
However, for any given tuple, an infinite number of functions are
possible, and thus when combined with functions, a tuple becomes
infinitely expressive. Not only can all functional dependencies of the
tuple's superkey be expressed as a function of the tuple, but any
transformation of the tuple's values, or the values of functional
dependencies, can be expressed as such as well.
A functional approach also allows us to dispense with the rename
operation in relational algebra, since renamed relations can be seen as
relation-returning functions.
Kinds of Functions
In my model, functions can be divided up into the following categories:
- Relational Functions can be expressed solely in relational algebra.
- Non-relational functions possess no non-trivial relational algebra reduction. x^2 is non-relational.
- Complex Functions, relationally speaking have non-trivial relational reductions, but non-relational components too.
Assuming sufficient join dependencies in a database, every functional dependency can be expressed through relational functions. Moreover trivial relational dependencies can always be expressed by relational functions, and indeed by mere projection operations. We can then define a trivial relational function as one which reduces solely to project operations off information stored in the tuple.
Result
The resulting system essentially creates something like an
object-oriented database but one which is fundamentally relational, and
in which objects behave differently than they do with mere
object-oriented persistence. While each tuple is infinitely
expressive, this is possible only because of a distinction between
primary (Codd's Algebra) stored data and secondary (calculated)
answers. This extension however allows any kind of mathematics (or
other logic) to be tied into relational algebra. This allows
relational algebra to be used along with many other kinds of disciplines
to build extremely sophisticated data models.
Forthcoming:
1: Exploring properties of relational, non-relational and complex functions
2: The special function Σ(R) representing the output of a simple series join.
Published at DZone with permission of Chris Travers, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
From CPU to Memory: Techniques for Tracking Resource Consumption Over Time
-
How AI Will Change Agile Project Management
-
5 Common Data Structures and Algorithms Used in Machine Learning
-
New ORM Framework for Kotlin
Comments