Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Associative Data Modeling Demystified: Part VI-iii

DZone's Guide to

Associative Data Modeling Demystified: Part VI-iii

R3DM/S3DM is an attempt to unify existing popular data models. The key element to achieve this is the association construct.

Free Resource

RavenDB vs MongoDB: Which is Better? This White Paper compares the two leading NoSQL Document Databases on 9 features to find out which is the best solution for your next project.  

In Part VI-i we introduced R3DM/S3DM and presented its architecture and in Part VI-ii we examined in detail R3DM/S3DM Type Systems. We end our "Associative Data Modeling Demystified" series with the functional operations of R3DM/S3DM and the filtering process.

Functional Operations

The OrientR3S3 Package extends OrientDB RESTful API package in Mathematica with a powerful functional set of commands that cover all operations in R3DM/S3DM. These operations fall in four categories: get, update, delete, and add (GUDA). The main idea here is to define standard named optional arguments for any of these functional categories and then allow their values, default or mandatory, to be given using Wolfram Language transformation rules. We have already started viewing examples of this functional approach to data management with the OR3getAnything command (as seen in Code.2, Code.4, Code.5, Code.7, Code.8, Code.9, Code.10, Code.12, and Code.13).

By varying the named optional arguments, we pass to the OR3getAnything public function of the OrientR3S3 Package (Code.14). We can retrieve or search for anything in R3DM/S3DM and we can also ask to return the result in a specified format (i.e. dataset, references, rules, and associations).

(* Search Operations *)
OR3getAnything[from->"25:02", find->"SPC"]
OR3getAnything[from->"50:01", find->"prtName"]
OR3getAnything[from->"50:02", find->"'Acme Widget Washer'"]

(* Type Systems Operations *)
OR3getAnything[from->"25:19", type->"SYSTEM",      return->"Dataset", debug->True, check->True]
OR3getAnything[from->"25:19", type->"PRIMITIVE",   return->"Dataset", debug->True]
OR3getAnything[from->"25:19", type->"ASSOCIATION", return->"Dataset"]
OR3getAnything[from->"25:19", type->"VALUE",       return->"Dataset"]

OR3getAnything[from->"25:02", type->"MODEL",     return->"References"]
OR3getAnything[from->"25:02", type->"ENTITY",    return->"Dataset"]
OR3getAnything[from->"25:02", type->"ATTRIBUTE", return->"Dataset"]

OR3getAnything[from->"25:01", type->"DATASET"]
OR3getAnything[from->"25:01", type->"TABLE",     return->"Dataset"]
OR3getAnything[from->"25:01", type->"COLUMN",    return->"References"]

(* Mapping Operations *)
OR3getAnything[from->"50:2",         get->"Mapping",    return->"Rules"]
OR3getAnything[from->"50:2",         get->"Mapping",    return->"Bigraph"]
OR3getAnything[from->supplierMRules, get->"Mapping",    return->"AssociationWithValueTypes"]
OR3getAnything[from->supplierMRules, get->"Mapping",    return->"AssociationWithNames"]

(* Items Operations *)
OR3getAnything[from->"70:1", get->"Fields"]

OR3getAnything[from->"50:1", get->"Entities",   return->"Count"]
OR3getAnything[from->"50:5", get->"Entities",   return->"Dataset"]
OR3getAnything[from->"50:2", get->"Attributes", return->"References"]
OR3getAnything[from->"50:2", get->"Model",      return->"Names"]
OR3getAnything[from->"50:2", get->"Attributes", return->"RulesWithNames"]
OR3getAnything[from->"50:2", get->"Attributes", return->"Rules"]

OR3getAnything[from->"70:2", get->"Parts",      return->"Names"]
OR3getAnything[from->"70:2", get->"Parts",      return->"Count"]
OR3getAnything[from->"70:2", get->"Parts",      return->"Dataset"]
OR3getAnything[from->"70:5", get->"Whole",      return->"References"]
OR3getAnything[from->"70:1", get->"Parts",      return->"Rules"]
OR3getAnything[from->"70:1", get->"Parts",      return->"RulesWithNames"]

OR3getAnything[from->"50:3", get->"Tuples", return->"Count"]
OR3getAnything[from->"50:3", get->"Tuples", return->"Rules"]
OR3getAnything[from->"50:3", get->"Tuples", return->"References",    values->True]
OR3getAnything[from->"50:2", get->"Tuples", return->"Dataset",       values->True]
OR3getAnything[from->"50:2", get->"Tuples", return->"Dataset",       values->True, filter->"58:0"]
OR3getAnything[from->"50:2", get->"Tuples", return->"FieldsDataset", values->True]

OR3getAnything[from->"50:3",  get->"Collections", return->"References", values->True]

OR3getAnything[from->"50:11", get->"DataItems",   return->"References", values->True]

OR3getAnything[from->"60:7",  get->"Nexuses",     return->"Count"]
OR3getAnything[from->"60:7",  get->"Nexuses",     return->"References"]
OR3getAnything[from->"60:7",  get->"Nexuses",     return->"Rules"]
OR3getAnything[from->"60:7",  get->"Nexuses",     return->"Association"]
OR3getAnything[from->"60:7",  get->"Nexuses",     return->"AssociationWithFields"]
OR3getAnything[from->"60:7",  get->"Nexuses",     return->"Dataset"]

Code.14: Examples of the OR3getAnything command from OrientR3S3 Wolfram Language package.

We have not reached the development phase of integrating all add operation in an OR3addAnything function of OrientR3S3 Package. Instead of this unified function, we have defined many others such as:

  • OR3addEnvironment (envstruct, envname, usrname, usrpwd)Code13

  • OR3addDomainModel (jsonFileName)Code6

  • OR3addDataSet (datasetName, dataSetPath, fileNames)Code11 

Thus, we have seen how we can add an environment type system, a domain model, and a data source. The last two and those that follow are repetitive actions in data analytics lifecycle, i.e. data sources import, modeling, mapping, ingestion, [filtering and aggregation][].

  • OR3addMapping (columns, attributes, mappingRules)Code15

This is a problem of matching the columns of a table or in general fields of a data set with the attributes of an entity in a data model. We can have fields from multiple data sources that are mapped in the same model. The following code section Code15 demonstrates how we perform the mapping of the columns from the three tables of a data set that we added in Code12. Code.15 demonstrates the mapping of data fields onto attributes in three steps.

  1. Retrieve metadata from the data set and the model.

  2. Specify the mapping rules.

  3. Store the mapping.

Mapping is necessary in order to proceed with the data ingestion. This allows seamless integration of data from multiple data sources on the model we design in R3DM/S3DM.

The following three add operations are used in populating entity and items collections with reference instances and the value type system with data values. Records, i.e. tuples, are assimilated by creating associations (Code16).

  • OR3addValue (className, propertyValue, propertyName)Code16

  • OR3addNexus (className): Code16

  • OR3addEdge (edgeClass, fromItem, toItem)Code16

Code.16 demonstrates the ingestion of three data sources (TSV files). First, we read the headers and body from each file in memory. Then, we get the mapping of columns from the stored dataset onto the attributes of each entity. Finally, we add values, collections items, and associations.

Filtering

It should have become apparent that R3DM/S3DM operates in a different way than other database models. Instead of a data definition language, data manipulation language, or query language (SQL, SPARQL), the functional, uniform set of commands that we have seen in the previous section due to the consistent structure of its type systems and the AIR units makes it easier to manipulate data and elements of the database.

One of the most characteristic features of R3DM/S3DM is the ability to filter data in a seamless standard way instead of accessing data with a query plan which may vary between database models.

This filtering operation is similar to QlikView and we will make a contract with the same data set and example case we used in the fifth article of this series.

Case 1: Condition and Sorting Order

In this case we are looking for the supplier that has the minimum CatalogPrice for a red fire hydrant cap and we want to retrieve supIDsupName, supCitysupCountry,  catPrice, catQuantity, prtIDprtName, and prtColor. The following SQL SELECT query and SPARQL query retrieve a result set using the relational or the RDF data model respectively.

Code.17: SQL query — sort catalog prices for a red fire hydrant cap. 

SELECT suppliers.sid, suppliers.sname, suppliers.scountry,
       catalog.catcost, catalog.catqnt,
       parts.pid, parts.pname, parts.pcolor
FROM   suppliers
       INNER JOIN (parts
                   INNER JOIN [catalog]
                           ON parts.pid = catalog.catpid)
               ON suppliers.sid = catalog.catsid
WHERE  (( ( parts.pid ) = 998 ))
ORDER  BY catalog.catcost;

Code.18: SPARQL query — sort catalog prices for a red fire hydrant cap. 

SELECT ?sup ?supName ?supCountry
       ?catPrice ?catQuantity
       ?prt ?prtName ?prtColor ?cat where
{
    ?prt dc:identifier "998"^^xsd:int .
    ?prt rdf:label ?prtName .
    ?prt schema:color ?prtColor .            

    ?cat wd:hasPart ?prt .
    ?cat schema:cost ?catPrice
    OPTIONAL {?cat schema:quantity ?catQuantity .}

    ?cat wd:hasVendor ?sup .
    ?sup rdf:label ?supName .
    ?sup schema:country ?supCountry
}
ORDER BY ASC(?catPrice)

Code. 19: We have coded two solutions for the testing case using our DBAPIOrientR3S3 package. In the first one (see condensed form in Code.20), catalog tuples are filtered then sorted with a single command! The first tuple in the resulting set is the one we are looking for. Then, we use filtering on part and supplier sets to obtain other information. Finally we project and we join all three tuples deleting any duplicates. In the second solution we start by filtering catalog collections, then we find the datum with the minimum catalog price. Then, we filter catalog, supplier, and part tuples and project catalog, supplier, and part tuples. Then, we project and join the final tuples.

Code.20: R3DM/S3DM Filtering. This is the condensed version of Code.19.

Case 2: Condition and Cleansing

This is a case of detecting and correcting an inaccurate value from a record set that is returned by specifying a condition.

    {SUP, PRT, CAT} = OR3getAnything[from->spcModel, get->"Entities"]
    prtID =           OR3getAnything[from->spcModel, find->"prtID"][[1]]
    fval1 =           OR3getAnything[from->prtID, find->"998"][[1]]


    minCatPriceTuple = SortBy[
             OR3getAnything[from->CAT, get->"Tuples", filter->fval1], OR3getValue@#[[3]] &][[1]]

    catTuple = minCatPriceTuple // OR3getValue
    fval2    = minCatPriceTuple[[1]]

    supTuple=OR3getAnything[from->SUP, get->"Tuples", filter->fval2, values->True,][[1]]

    prtTuple=OR3getAnything[from->PRT, get->"Tuples", filter->fval1, values->True,][[1]]

    prtTuple[[1 ;; 3]]~Join~catTuple[[3 ;; 4]]~Join~supTuple[[{1, 2, 5}]] // DeleteDuplicates

Code. 21: This is an example of cleansing in two phases, detection and correction. Here we can spot the error by filtering collection sets with the condition prtName=Acme Widget Washer. There are two values in prtUnitcollection kg and kb. The second one should be kg; it is a typographical error. We can fix this by deleting the hyperlink that connects the nexus 52:3 (red hyperbond) with the datum kb (green hyperatom) and then adding a new hyperlink from 52:3 to datum 64:1.  

Case 3: Graph Traversal

This is how we perform graph traversal in R3DM/S3DM. Instead of visiting each vertex in the graph, we filter hyperbonds on account of the hyperatoms they share.

Code. 22: We start by specifying a condition, for example, visiting a single hyperatom. This is a green node on the first hypergraph with the label Acme Widget Washer. Now we can get hyperbonds (red nodes) and find all the siblings. This is also a visual representation of part tuples. There are two part items with the description Acme Widget Washer. One has a silver color, the other is red, and both of them weigh 142.88kg. In the same hypergraph, we have also two identifiers (993 and 994), which are prtID values that are shared with catalog entity tuples. Therefore, in the second hypergraph, we have filtered catalog tuples and we have drawn three of them that are related to these two part items. This time there are common hyperatoms, those with a supplier identifier (1081, 1084), that are shared between catalog and supplier.

In the third hypergraph we get a complete 360 degrees view of our case by filtering the supplier entity. Now we can see that there are two suppliers one located in USA that supplies both part items and the other in the UK that supplies only the Silver Acme Widget Washer.

In all three cases, we have demonstrated that there are two ways to present the resulting data set from filtering. We can either return item collections, i.e. sets of values for each attribute we are interested in, or projected tuples from each entity separately or in a consolidated multi-grid flat form.

Summary

We have presented a R3DM/S3DM data modeling framework from a multi-perspective view. The building blocks of conceptual and logical perspective are paired up. Types are separated from instances and semantics are defined and explained with three alternative paradigms object-oriented, network graph, and semiotics. It is the interpreted, encoded (materialized), and representative forms in the triangle of reference that gave R3DM/S3DM its name. This trilateral principle is applied to everything including the architectural design of R3DM/S3DM.

The eight type systems of R3DM/S3DM play the role of a formal upper level ontology and its construction is based on the Neurorganon Upper Level Ontology (NULON). Terms of this ontology describe the framework with a hierarchical organizational structure that defines schemata and types for models, entities, attributes, items, values, data sources, links, and database metadata.

In terms of querying and managing data there is a set of functional operations that match SQL Select, Insert, Update, and Delete statements and there is a filtering mechanism implemented which is equivalent to SQL Where conditions. Thanks to the powerful Wolfram Language transformations we can return results in the form of a table, Rule set, Association set, Dataset, or even a network graph. R3DM/S3DM has also been defined as a functional representation of information resources that are mapped to a materialized form (realization).

Epilogue

R3DM/S3DM is an attempt to unify existing popular data models. The key element to achieve this is the association construct. We have devoted five posts of this series to highlight differences and similarities of Relational, Topic Map, Property Graph, RDF, and Qlik data models that are dependent on associations. And the innovative aspect of R3DM/S3DM is that it avoids namespace, addressing, and identification problems by adopting a uniform representation of everything with numerical vector references.

R3DM/S3DM is a framework designed to construct a Reference Database Management System — it is RDBMS redefined. The first class citizen in this system is the reference. It is not the tuple of an entity-relationship data model, or the triplet of RDF/OWL data model, or even the nodes and edges of the graph data model. R3DM/S3DM is based on Atomic Information Reference units (AIR) and it can assimilate table, column, tuple, key-value, triplet, associations and graph data structures. This kind of granularity of R3DM/S3DM controls also the level of information detail that will be presented to the user, for example, show only hyperbonds, i.e. entity instances.

Information resources are not handled by name in R3DM/S3DM; they are always represented and function as meaningful keys (numerical vectors). We escape from the namespace entanglement and alleviate the complexity of linked data by smart AIR units that are represented in a uniform way. Their digital form can be processed, retrieved and stored efficiently and/or combined to create composite information structures. Where bit is the basic construction unit for data, AIR becomes the fundamental processing unit for structured information. It is now possible to have a giant global graph (GGG) network of information resources based on the power of semiosis with a reference mechanism that is not built with character strings (URL) but it is similar to the IP address.

Do you pay to use your database? What if your database paid you? Learn more with RavenDB.

Topics:
data modeling ,database ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}