Over a million developers have joined DZone.

Associative Data Modeling Demystified: Part VI-ii

DZone's Guide to

Associative Data Modeling Demystified: Part VI-ii

Check out some code samples and detailed figures to understand different types of database systems and associative data modeling.

Free Resource

Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.  

In Part VI-i we introduced R3DM/S3DM and presented its architecture. We continue with a detailed examination of R3DM/S3DMType Systems.

Environment Type Systems

R3DM/S3DM can be seen from many perspectives. At a lower level, we have a network graph of nodes and edges. In the OrientDB graph model, these are instances of the V (for "vertex") and E (for "edges") classes. At a higher level, based on those two classes, we build a hierarchically organized namespace in OrientDB's paginated local storage (plocal). This is the database environment, the highest organizational structure, a logical container that may represent a business environment, a user environment, or even a programming environment. Its role is to group related classes into eight type systems. Seven of them manage vertices and one of them is for the different types of edges, as shown in Figure 4. The process of initializing and defining the structures in OrientDB database is fully automated by reading the schema of the type system from a JSON file, as shown in Figure 5.

Figure 4: Information about R3DM/S3DM hierarchical type system stored in the *DBRegistry* subsystem.

Figure 5: Schema of R3DM/S3DM hierarchical type system in a JSON format. For simplicity, in the current implementation, Items Type System (ITS) is not present. Entity Type System (ETS) and Attribute Type System (ATS) OrientDB classes compose this missing system.

Our DBAPI project in Wolfram Language has been extended to cover all R3DM/S3DM functionality. For example, an environment is added with the OR3addEnvironment command, (Code.3) and we retrieve any metadata about it with the OR3getAnything command (Code.4).


In Code.3, we pass the schema file (JSON), the name of the database (R3DB), and the username and the password. Metadata for the newly created classes are stored in the DBRegistry subsystem in Figure 4.

Code.4 involves the retrieval of environment system types with the OR3getAnything command. By default, it is returning a list of references (RIDs) unless we specify a specific format for the structure (like a dataset).

Database Type System (DBS)

Classes that start with the letters DB such as DBLog, DBUser, DBOperation, and DBUserRole are used to store the user environment metadata useful for administrative, security, and monitoring purposes (Code.5).

Code.5 shows the retrieval of primitive database types with the OR3getAnything command as a list of references or as a dataset.

We have already seen the DBRegistry subsystem where information about the schema of Environment is stored (Figure 4). Another class, DBObject , is used for casting multi-type role playing at a node (Object is an instance of a Type).

Model Type System (MTS)

Next, we will examine how we can add a new domain model in the system. Usually, at this stage, the database expert draws the entity-relationship diagram of the model, which is a graph of entities, attributes, and relationships. In Figure 6, we present a similar diagram for our Supplier-Part-Catalog data set and in Figure 9, we have drawn the same diagram in the OrientDB Graph Editor. The main difference with ER diagrams is that relationships/relations here have a direct representation in the system with the edges (R3DM/S3DM links) and dictionary metadata is explicitly defined and stored in MTS.

Figure 6: The schema diagram of the SupplierPartCatalog model (red). Entities (blue boxes) and Attributes (green ovals) are written with their full names (blue) and short names (purple) that take positions outside each shape. Entity boxes inside carry their type (red) and their super-type (blue). Inside each attribute, oval super-type (black) and value type (red) are written. LTS_EA (blue) connects entities with attributes and LTS_WP (brown) is linking the model with its entities.

We have serialized the domain model graph of Figure 6 in the same manner as the schema of the type system and the task of loading and building the model is simplified again with the following single function call (Code.6).


In Code.6, the only parameter required in OR3addDomainModel is the schema definition file of the model. This function call will instruct OrientDB to create classes for all entities and attributes of the model. All the metadata from the schema, Figure 6, will be stored in MTS and will be available for retrieval with OR3getAnythingcommand.

Using the powerful OR3getAnything function in the context of the DBAPI OrientR3S3 package we can retrieve metadata (Figure 6) for any model, entity, or attribute. The result set can be in the form of the Wolfram Language dataset, list, or graph data structures (Code.7).

Code.7 shows three examples of the OR3getAnything function where we retrieve OrientDB records for models, entities, and attributes respectively. The function call is translated to OrientDB RESTful API command and the JSON response from the DBMS is transformed to Wolfram Language data structures for further processing and visualization. It worths noticing the hypergraph representation which is the equivalent graphical form of the model’s schema diagram in Figure 6.

To maintain compatibility and interoperability with ontologies and other vocabularies for structured data on the Internet (such as schema.org), we categorize entities and attributes of any model according to OrientDB classes that typically form the hierarchical structure of entity type systems and attribute type systems.

Entity Type System (ETS)

Entity types usually refer to abstract entity types. They are used to create schema templates for popular entity types. Metadata are kept in DB_Registry and as usual, they can be retrieved with the OR3getAnything command (Code.8).

Code.8 shows the retrieval of entity types with the OR3getAnything command from DB_Registry. For each entity type, we can view its type and its super-type as well as the name and short name that can be accessed.

As an example consider the schema diagram of Figure 6 where we have three entity subtypes — organization, product, and data catalog — where we classify the supplier, product, and catalog entity collections respectively (see Code.7, Get Entities section).

Attribute Type System (ATS)

Similarly, attribute types refer to abstract attribute types. Each attribute can participate in one or more entity types, and that is how schema templates are formed. In our Supplier-Product-Catalogue data model (Figure 6), attributes can be classified as:

  • Categorical (Supplier_status, Part_color, Part_unit, Catalog_check)

  • Identity (Supplier_id, Part_id)

  • Name (Part_nameEN, Supplier_nameEN, Supplier_city, Supplier_country)

  • Quantity (Part_weight, Catalog_price)

  • Count (Catalog_total)

  • Time (Catalog_date)

  • Description (Supplier_address)

The following OR3getAnything command demonstrates how we can draw metadata about this type system (Code.9).

Code.9 shows the retrieval of attribute types with the OR3getAnything command from DB_Registry. For each attribute type, we can view its type and its super-type as well as the name and short name that can be accessed.

Each of these attribute classes above (see Code.7, Get Attributes section) refers to an attribute collection of data items that has a value type. For example, Supplier_status, Part_id, Catalog_total, and Supplier_id are of value type INT (integer) and Part_weightis of value type REA (real).

Items Type System (ITS)

This is the system of items collections, i.e. where entity and attribute reference instances are stored (see Instances Perspective). In our demo, these are represented with OrientDB classes (Figure 7).

Figure 7: Two select SQL queries with OrientDB Studio Manager. The first returns four entity records and the second returns four attribute records. Each entity and attribute record is represented by an item from a collection. These items, in turn, are represented by a reference key which is OrientDB @rid.

For simplicity’s sake, we have not constructed ITS separately, but we used OrientDB classes instead to populate them with instances of entity and attribute types. Nevertheless, the idea is the same, segregate the abstract concepts (types) from the domain particular instances.

Value Type System (VTS)

According to our semiotic view (Figure 3 in the previous post), key references (RIDs) represent resources from DBS, MTS, ETS, ATS, ITS, LTS, DSS, and encoded form of data, i.e. the realization of data values. These values have domains that are based on OrientDB primitive data types; for example, IntegerDoubleDateTime, String, and types that are defined in the Value Type System (VTS) (see Code.10).

Code.10 shows the retrieval of value types with the OR3getAnything command from the DB_Registry. For each attribute type, we can view its RID, type, super-type, name, shortname and the OrientDB data type that is based on.

Data values are stored together in ordered sets based on Value Types, e.g. all integers in one set (VTS_Integer), identifiers in another (VTS_Encoded), and so on. This optimizes the access and management of values. Each Item Collection is a subset of values, i.e. a subset of a value set (Figure 8).

Figure 8: A subset of the VTS_Integer value set. This record set is obtained with OrientDB SQL select command. Each one of these records is pointed from a datum item of the Supplier_id collection (see Figure 7) using the OrientDB Link type.

Elements of both Value Sets and Item Collections are unique, i.e. a single instance of each one. In OrientDB this is implemented by setting SB-Tree index with UNIQUE keys in both Value set and Item Collection.

Data Sources Type System (DSS)

Yet there is another type system in the R3DM/S3DM framework that is reserved for the input of data sets, the data sources type system (DSS). A data set is considered to be a number of related collections that usually correspond to the contents of flat files or database tables. During the process of adding a data set, we do not store the values but only the structure and properties (Code.11).

Code.11 shows a demonstration of the OR3addDataSet command with three parameters, the name of the new data set that will be created, the working path, and the filenames of the flat files to inspect. This function call will instruct OrientDB to create a new class to store information about the structure of these flat files. These metadata will be available for retrieval with OR3getAnything command (shown in Code.12, which has the retrieval of a dataset structure, e.g. name, short name, type, supertype, path, table names, table types, column names, etc., with the OR3getAnything command from the DSS system).

Link Type System (LTS)

Last but not least, the aforementioned symmetric and typed binary relation is a bidirectional link, i.e. hyperlink, that is used to connect a hyperatom to a hyperbond to form associations. We have two kinds of hyperlinks: directed and undirected. Directed links are used to form directed associations of a specific type, e.g. entity-attribute, whole-part, type instance (as shown in Figure 9). Depending on which direction we traverse the link (outgoing or incoming), we have two different labels to assist us in reading and understanding the semantics of the binary relation. In all cases, metadata about the Link Type System (LTS) are currently stored in DB_Registry and as usual, we can retrieve link types with the OR3getAnything command (Code.13).

Code.13 shows the retrieval of link types with the OR3getAnything command from DB_Registry. For each ASSOCIATION type, we can view its supertype, name, short name, and the labels for outgoing or incoming traversal direction.

Figure 9: Schema diagram of Supplier-Part-Catalog model in OrientDB Graph Editor. Red edges connect entities to attributes and green edge connect the Model to Entities. We can also view the type of R3DM/S3DM link in LTS system as a label on the edge.

Continue reading on next section : Functional Operations

Download AnzoGraph now and find out for yourself why it is acknowledged as the most complete all-in-one data warehouse for BI style and graph analytics.  

database ,tutorial ,data modeling ,type systems

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}