Associative Data Modeling Demystified: Part VI-i

As we near the end of this series, we introduce the R3DM/S3DM framework and demonstrate a fully functional prototype built on the OrientDB multi-model DBMS with the Wolfram programming language.

It is remarkable how we've turned an electronic device that only processes 1s and 0s into an inseparable intelligent companion and trustworthy assistant. The key behind this imaginative use of computers is the captivating abstract thinking process of the human brain. The R3DM/S3DM conceptual and logical framework is an attempt to model databases with the very same intimate mechanism that creates models. In this endeavor, there could not be a better theoretical basis for R3DM/S3DM than Aristotle’s theory of semiosis. Semiotics is the study of meaning-making, and it binds semantics with symbolic representation and transformation, which is the bread and butter of computer programs and digital storage. R3DM/S3DM is not only conceived along this theory; it is also implemented with those semiotic principles in mind.

R3DM, short for Representation (Resource, Realization), or S3DM, short for Sign (Signified, Signifier), is a computational semiotic framework with a mathematical morphism that formalizes the architectural design of associative hypergraph databases.

Following this definition, we will unfold R3DM/S3DM and explain its main characteristics starting with the classic three-layered database architecture.

Architecture Overview

One of the main purposes of Zachman’s conceptual, logical, and physical database design is to provide data independence at the application-user level. The three layers are in descending levels of abstraction where the conceptual model is the most abstract and the physical data model the least abstract or most concrete.

The conceptual model usually refers to the domain of discourse and describes the semantics of the application without any reference to the database technology. In contrast, the logical data model implements the conceptual model in terms of abstract data types (i.e. List, Set, Map, Graph). The following two lists show, item by item, the correspondence between these two layers for five popular data models.

Conceptual schema:

  1. Predicates
  2. Entities and relationships
  3. Topics, associations, and occurrences
  4. Subjects, objects, and predicates
  5. Classes and properties

Logical structure:

  1. N-ary relation, tuples, and attributes
  2. Table, rows, and columns
  3. Hyperedges and hypernodes
  4. Nodes, edges, and properties
  5. Objects

Both conceptual and logical layers should act independently of the underlying database engine, i.e. physical data model. The following is an indicative list of what is normally included in this layer.

Physical data storage organization:

  1. Data orientation (rows/columns, correlational)
  2. Data structure
  3. Object storage
  4. Block storage
  5. Data cluster
  6. Database index
  7. Serialization

Conceptual Perspective

Regarding the conceptual data model, R3DM/S3DM uses terms that are well-known among database experts, i.e. entities and attributes. In the Code.1 segment, we can view instances of the supplier, part, and catalog entities and attributes that describe them. Remember that in R3DM/S3DM, entities and attributes play the role of abstract concepts that we associate to create models of our data; they are not containers or instances of data.

  • An entity is something that has a discrete, independent existence, for example the Eiffel Tower (building), Apple, Inc. (company), or a Porsche 993 GT2 with a specific VIN (car).
  • An attribute is a piece of information that describes an entity, for example (referring to the above entities) 300 meters (height), U.S. $215.639 billion (revenue), or WP0ZZZ99ZTS392124 (VIN).
  • An association represents:
    1. An N-ary relation of an entity with its attributes (see Code.1).
      • For example, Part{ID, Description, Color, Weight} = Part998 {998, Fire Hydrant Cap, Red, 7.2}.
    2. An N-ary relationship between one or more entities sharing one or more common attributes that is defined by the roles they play in the association (see Code.1).
      • For example, Film {StarringActor1, StarringActor2, Director} = FilmID {ActorID1, ActorID2, DirectorID}.
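
The two kinds of association above can be sketched in a few lines of Python. This is only an illustration of the idea, not the prototype's code: the helper name associate is hypothetical, and the film identifiers are the placeholders used in the example above.

```python
def associate(heading, values):
    """Pair each attribute or role name in the heading with its value."""
    if len(heading) != len(values):
        raise ValueError("heading and values must have the same length")
    return dict(zip(heading, values))


# 1. N-ary relation of an entity with its attributes.
part998 = associate(
    ("ID", "Description", "Color", "Weight"),
    (998, "Fire Hydrant Cap", "Red", 7.2),
)

# 2. N-ary relationship: entities bound together by the roles they play.
film = associate(
    ("StarringActor1", "StarringActor2", "Director"),
    ("ActorID1", "ActorID2", "DirectorID"),
)

print(part998["Color"])  # Red
print(film["Director"])  # DirectorID
```

In both cases the heading carries the meaning (attribute names or roles) and the values only become interpretable once they are paired with it.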

Code.1 has result sets from SQL queries on the supplier, part, and catalog tables. The same result sets are drawn as a hypergraph in Figure 1. In the Code.2 segment, they are assimilated with AIR units in associations.

Logical Perspective

Shifting our focus to the logical building blocks, R3DM/S3DM can be viewed as a hypergraph, as seen in Figure 1, composed of three data structures: hyperatoms (hypernodes), hyperbonds (hyperedges), and hyperlinks (edges).

  • A hyperbond graphically represents a complex data structure (for example, tuple, JSON object). The role of a hyperbond is to connect a set of hyperatoms in order to form associations.
  • A hyperatom graphically represents an atomic data item (for example, record value, a key-value pair of a JSON object).
  • A hyperlink, graphically speaking, is a graph edge that bidirectionally connects a hyperatom to a hyperbond.

Figure 1: A hypergraph of suppliers, parts, and catalogs for Part No. 998 with its four catalog entries and its four suppliers. Hyperedges are in green and hyperatoms are in red.
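
As a rough sketch of these three building blocks, the Python class below keeps hyperlinks as a two-way index between hyperbonds and hyperatoms. The class name Hypergraph, its methods, and the two catalog entries with their supplier ids are made up for illustration; only the Part 998 values come from the example above.

```python
class Hypergraph:
    """Hyperbonds connect sets of hyperatoms through bidirectional hyperlinks."""

    def __init__(self):
        self._atoms_of = {}  # hyperbond -> set of hyperatoms
        self._bonds_of = {}  # hyperatom -> set of hyperbonds

    def link(self, bond, atom):
        # A hyperlink is stored in both directions so it can be
        # traversed from either end.
        self._atoms_of.setdefault(bond, set()).add(atom)
        self._bonds_of.setdefault(atom, set()).add(bond)

    def atoms_of(self, bond):
        return set(self._atoms_of.get(bond, ()))

    def bonds_of(self, atom):
        return set(self._bonds_of.get(atom, ()))


hg = Hypergraph()

# The part tuple: one hyperbond, four hyperatoms.
for atom in [("pid", 998), ("description", "Fire Hydrant Cap"),
             ("color", "Red"), ("weight", 7.2)]:
    hg.link("Part998", atom)

# Two made-up catalog entries that share the ("pid", 998) hyperatom.
for bond, sid in [("Catalog-1", 1), ("Catalog-2", 2)]:
    hg.link(bond, ("pid", 998))
    hg.link(bond, ("sid", sid))

print(sorted(hg.bonds_of(("pid", 998))))
# ['Catalog-1', 'Catalog-2', 'Part998']
```

Because the shared hyperatom is stored once and linked to every hyperbond that uses it, moving from a part to its catalog entries is a lookup rather than a join.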

Instances Perspective

Entities and attributes in R3DM/S3DM are types, and each one uniquely represents a single set of instances, also known as items (see also Items Type System). Entities and attributes can be thought of as references to collections, as seen in Figure 2.

Figure 2: The meta level and domain level. Domain abstractions and specializations are abstract types (like a person, credit card, or item) at the instance level. The instance level includes domain-particular instances, like Tom who purchased Item ZZZZ with Credit Card No: XXXX.

  • A collection (set of instances) is a generic container for items with no duplicates. A collection can have one or more representative concepts (entities or attributes). We have two types of collections: data and nexus.
  • A datum item (datum) can be thought of as an instance of a particular attribute type that points to a single atomic piece of data (atomic value). A datum collection contains datum items (data). In our hypergraph perspective, datum items are represented by hyperatoms.
  • A nexus item (nexus) can be thought of as an instance of a particular entity type whose role is to associate, or bind together, datum items. A nexus collection is a type of collection that holds nexus items (nexuses). The graphic equivalent of a nexus item is a hyperbond.

It's common to consider a type, i.e. class, as a container of its instances. But that is not the case in R3DM/S3DM, where abstract concepts (types) have an independent existence and refer to collections, i.e. the containers of instances.

This separation between containers of items (instances) and abstract concepts (types) is extremely important, as it decouples the data modeling layer from the data collections that are ingested into the database.
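
A minimal sketch of this decoupling, assuming hypothetical class names (Collection, Concept) and hypothetical attribute names (Parts.pid, Catalog.pid): the concepts hold only a reference to a collection, and more than one concept may refer to the same collection.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Generic container of items with no duplicates."""
    kind: str  # "datum" or "nexus"
    items: set = field(default_factory=set)

    def add(self, item):
        self.items.add(item)


@dataclass
class Concept:
    """An abstract type (entity or attribute) that refers to a collection."""
    name: str
    kind: str  # "entity" or "attribute"
    collection: Collection


part_ids = Collection(kind="datum")
for value in (998, 998, 999):  # the duplicate 998 is kept only once
    part_ids.add(value)

# Two attributes act as representative concepts of the same collection.
parts_pid = Concept("Parts.pid", "attribute", part_ids)
catalog_pid = Concept("Catalog.pid", "attribute", part_ids)

print(len(part_ids.items))                          # 2
print(parts_pid.collection is catalog_pid.collection)  # True
```

Renaming or adding a concept touches only the modeling layer; the collection and the items it contains stay as they are.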

Semantic Perspective

Data (like names, codes, quantities, time, location, and categories) is meaningless without context. Data inherits more meaning when processed within a context. This is exactly the purpose of semantic data modeling, a data modeling technique to define the meaning of data within the context of its interrelationships with other data. Semantic models can be either fact-oriented (RDF triples) or object-oriented (entities and relationships).

The disadvantage of the second is that you have to manage dissimilar 2D structures (tables) that depend on a fixed database schema and are not directly connected or related. The drawbacks of the first include the labeled edges, the modeling of n-ary relations, the inseparable mixture of plain and typed literal triples that represent values with RDF links that represent resources, and the Semantic Web Identity Crisis, to name a few.

R3DM/S3DM assimilates both fact- and object-oriented views by defining an atomic information reference unit based on semiotics. Naturally, with this solution, we escape from many of the above problems. This is one of the most innovative aspects of this framework.

Object-Oriented View

The most commercially successful semantic model is the entity-relationship data model. In the first post of this series, we discussed the conceptual data model that Chen uses to represent the tuples of the relational data model. Figure 2 and Figure 3 show that Chen uses either entity sets, attributes, and value sets or entity set(s), relationship set(s), attributes, and value sets to form an association.

The key point here is that entity sets and attributes, in both cases, are separated from the value sets. Indeed, this is the design principle that is followed in any modern relational DBMS. There is a data dictionary, also known as a metadata repository or metadata registry, that stores (among other things) names and descriptions of entity sets, relationships, and their attributes that construct a database schema.

Semantically speaking, the database schema and its metadata describe the meaning of its instances, i.e. entity relations, entity relationships, and attribute value sets. For this purpose, in the current OrientDB implementation of R3DM/S3DM, each one of these sets is defined explicitly and is represented with an OrientDB Class. There is another reason we keep the actual data values separate: R3DM/S3DM uses single-instance, value-based storage, so each unique value in the raw data is stored only once. In this respect, it resembles the data model of the correlation database.
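
The single-instance idea can be illustrated with a small in-memory store. This is a sketch under stated assumptions, not the OrientDB implementation: the ValueStore class and its integer keys are hypothetical, and the second row is invented for the example.

```python
class ValueStore:
    """Stores each distinct typed value once and hands out a reference key."""

    def __init__(self):
        self._key_of = {}  # (type name, value) -> key
        self._values = []  # key -> value

    def put(self, value):
        # Keying on the type name keeps 1, 1.0 and True apart.
        typed = (type(value).__name__, value)
        if typed not in self._key_of:
            self._key_of[typed] = len(self._values)
            self._values.append(value)
        return self._key_of[typed]

    def get(self, key):
        return self._values[key]

    def __len__(self):
        return len(self._values)


store = ValueStore()
rows = [
    ("Fire Hydrant Cap", "Red"),
    ("Fire Extinguisher", "Red"),  # invented second row
]

# Tuples keep only reference keys; the values live in the store.
encoded = [tuple(store.put(v) for v in row) for row in rows]

print(encoded)     # [(0, 1), (2, 1)]
print(len(store))  # 3
```

Both rows point at the same key for "Red", so four cells of raw data occupy three stored values.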

Network Graph View

While it is helpful to view the higher-level type system architecture of R3DM/S3DM through an object-oriented filter, it is important to understand that, at a low level, R3DM/S3DM consists of nodes and edges. In particular, the prototype framework we describe in this article is built on top of the OrientDB graph engine with lightweight edges and a hypergraph structure (we'll see this more in future posts).

In previous posts of this series, we compared the association construct with relational tuples, topic map associations, RDF triples, property graph nodes and edges, and Qlik binary-coded records. Such semantic models, with the exception of Qlik technology, are fact-oriented, and semantics are typically expressed by binary or n-ary relations between data elements. In R3DM/S3DM, the graph is usually undirected, with symmetric and typed binary relations between the hyperbond and the hyperatom.

This low-level graph view of the system can be implemented in many ways. For example, you can have two constructs, like tables — one for the nodes and another for the edges (see the work of Simon Williams in Sentences database) — or you can use a key-value store that saves tuples (Graphd, the back store of Freebase), or you can also have a native triple store.

Semiotic View

So far, we have seen how we can contextualize data using the association construct. This is the mechanism to assimilate tuples of data. Nevertheless, values in a tuple or literals/resources in a triple are meaningless in isolation. In the first case, you need the heading and the type of the relation (table and column names). In the second case, you need the label and direction of the edge (predicate) that connects the subject and the object to give meaning to the binary relation. To quote Ron Everett:

Every table is a silo. Every cell is an atom of data with no awareness of its contexts, or how it fits in to anything beyond its cell. It can be located by external intelligence but on its own it’s a “dumb” participant in the system — the ultimate disconnected micro-fragment accessible only by knowing the column and the record it exists in.

He also says:

The alternative is to replace the data elements with information at the atomic level of the system. Instead of a data atom in a table, we have an information atom with no table.

Therefore, the trick here is to build associations based on a uniform representation of their members and the roles they play, in a similar way to topic map association items. For this purpose, we introduced the Atomic Information Resource (AIR) unit in the previous post of this series. Now, we will look at AIR in more detail and in action. For each AIR unit, we maintain a record of information. For simplicity and for demonstration purposes, the AIR unit in the current implementation of R3DM/S3DM is equivalent to the OrientDB Record ID (RID).

For example, the supplier result set in Code.1 is transformed to an associative set, and each cell of the supplier table is represented with a RID (Code.2: get supplier associative set). Columns of the part table, like PID (Code.1), and any of their values are also represented with RIDs (Code.2: get datum where Parts.pid=998). The single part tuple where pid=998 (Code.1) is considered to be an instance of an entity and also has a RID (Code.2: get tuples that this datum is part of).

This way, AIR information representation serves two principal functions: information resource identification and location addressing, i.e. dereferencing and retrieval.

Code.2: Associative sets are presented with values or in RID (reference key) format. The equivalent result sets are drawn in the hypergraph of Figure 1 and fetched with SQL in the Code.1 segment. The document record in OrientDB with RID #60:7 is an instance of the prtID attribute collection. We can read the datum value, find which attribute collection (class) it belongs to, and get its siblings (i.e. other datum items of the class). In the same datum record, we can see its row context associates, i.e. nexus items. These are the five tuples it participates in: one part relation (#52:7) and four catalog relationships (#53:7, #53:11, #53:12, #53:16).
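
To make the navigation around a datum record concrete, here is a small Python sketch that uses a plain dictionary as a stand-in for the database; it is not the OrientDB API. The RIDs of the datum and its five nexus items are the ones quoted above. The datum value 998, the sibling record #60:8, and all function names are assumptions made for illustration. The only OrientDB convention relied upon is that a RID has the form #cluster:position.

```python
# In-memory stand-in for a few records; this is not the OrientDB API.
RECORDS = {
    "#60:7": {
        "class": "prtID",
        "value": 998,  # assumed value of this datum
        "nexus": ["#52:7", "#53:7", "#53:11", "#53:12", "#53:16"],
    },
    "#60:8": {"class": "prtID", "value": 999, "nexus": []},  # made-up sibling
}


def dereference(rid):
    """Resolve a reference key to its record."""
    return RECORDS[rid]


def siblings(rid):
    """Other datum items that belong to the same attribute collection."""
    cls = RECORDS[rid]["class"]
    return [r for r, rec in RECORDS.items() if rec["class"] == cls and r != rid]


def cluster_of(rid):
    """Cluster id of a RID written as '#cluster:position'."""
    return int(rid.lstrip("#").split(":")[0])


def associates_by_cluster(rid):
    """Group the nexus items a datum participates in by their cluster."""
    groups = {}
    for nexus_rid in RECORDS[rid]["nexus"]:
        groups.setdefault(cluster_of(nexus_rid), []).append(nexus_rid)
    return groups


print(dereference("#60:7")["value"])   # 998
print(siblings("#60:7"))               # ['#60:8']
print(associates_by_cluster("#60:7"))
# {52: ['#52:7'], 53: ['#53:7', '#53:11', '#53:12', '#53:16']}
```

Grouping by cluster separates the single part relation from the four catalog relationships without reading any of the nexus records themselves.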

There are two steps in this transformation of tuples. First, we create a value type system, i.e. a place where we store atomic values based on their type. Second, we apply a uniform representation to everything, i.e. data and metadata. This turns our DBMS into a Reference Database Management System (RDBMS). Remember that deep down, at an atomic level, we store single-instance values. It is only the reference keys to those values that we manage. This gives R3DM/S3DM cellular granularity. Metadata with high granularity allows for deeper, more detailed, and more structured information and enables greater levels of technical manipulation.

This uniform representation and management of abstract information resources (models, data sources, metadata) with AIR units in R3DM is the realization that underneath, there is a separate storage layer of single instance data values.

The S3DM framework is based on the powerful theory of the semiotic triangle, also known as the triangle of meaning or the triangle of reference. We use key references (signs-symbols) to represent abstract things (signified concepts) in our mind. We encode these into data containers, i.e. forms that the sign takes, for the storage of data values (signifiers), as shown in Figure 3.

Figure 3: R3DM/S3DM triangle of meaning/semiotic triangle/triangle of reference.

This trilateral principle permits a uniform treatment of the semantics, syntax, storage, and structure of information based on symbolic representation. The very same principle is applied to the architectural design of the R3DM/S3DM type system.

Continue reading in the next section: Environment Type Systems