An Illustrated Guide to How Data is Stored in NoSQL DB Cubrid RDBMS
One of the characteristics of CUBRID that is often described as an "extension of the relational data model" is its object-oriented model. CUBRID borrows many object-oriented concepts. Every data record is treated as an object, and every table that defines the structure of records is treated as a class that defines objects. CUBRID is implemented with these object-oriented concepts internally, while it presents the relational model and SQL (the relational query language) to users. In addition, it provides "extended relational data model" features such as inheritance between classes, collection data types (SET, MULTISET, LIST), and composition relations.
In the relational data model, a single column is not allowed to hold multiple values. In CUBRID, however, you can define a column that holds multiple values by using a collection data type. The collection data types are SET, MULTISET, and LIST, which differ in whether duplicate elements are allowed and whether the order of elements is kept.
Inheritance
Inheritance is a concept that allows child tables to reuse the columns and methods defined in parent tables. CUBRID supports inheritance for reusability. With the inheritance feature, you can create a parent table that holds the common columns and then create child tables that inherit from the parent table and add their own columns. This way, you can model a database while minimizing the number of columns you have to define.
OID
In a relational database, the relation is defined by allowing the referring table to have the primary key of the referred table as a foreign key. If the primary key consists of multiple columns or the size of the key is big, the performance of join operations between tables will degrade.
However, CUBRID allows the direct use of the physical address (OID) where the records of the referred table are located, so you can define relations without using join operations. That is, in an object-oriented database like CUBRID, you can create a composition relation in which one record holds a reference to another record by declaring a column whose domain (type) is the referred table, instead of referring to the primary key column of the referred table.
Generally, in object-oriented programs, objects are the actual data held in memory, and object pointers are used to point to those objects. CUBRID, in contrast, works directly with database objects stored on disk, so it cannot refer to them with memory pointers. Instead, it issues a unique Object Identifier (OID) for each object. An OID indicates the physical address of a database object, that is, its absolute location in the database volume file on the disk. Just as a memory pointer stands for a physical address in the memory area, an OID is a physical address in the database area.
The OID, the physical address of a database object, consists of a volume number (volid), a page number in the volume (pageid), and a slot number in the page (slotid). The following excerpt from the CUBRID source code defines the OID.
typedef struct db_identifier DB_IDENTIFIER;
struct db_identifier
{
  int pageid;
  short slotid;
  short volid;
};
typedef DB_IDENTIFIER OID;
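To make the three fields concrete, here is a minimal sketch that builds, compares, and prints OIDs. The struct repeats the excerpt above; the helpers oid_make, oid_equal, and oid_format are hypothetical names invented for this illustration, and the "volid|pageid|slotid" text format is simply a choice made for the sketch, not something taken from the CUBRID source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Same layout as the excerpt from the CUBRID source. */
typedef struct db_identifier DB_IDENTIFIER;
struct db_identifier
{
  int pageid;    /* page number inside the volume */
  short slotid;  /* slot number inside the page */
  short volid;   /* volume number */
};
typedef DB_IDENTIFIER OID;

/* Hypothetical helper: build an OID from its three parts. */
OID oid_make(short volid, int pageid, short slotid)
{
  OID oid;
  oid.pageid = pageid;
  oid.slotid = slotid;
  oid.volid = volid;
  return oid;
}

/* Hypothetical helper: two OIDs name the same object only if
   the volume, the page, and the slot all match. */
bool oid_equal(const OID *a, const OID *b)
{
  assert(a != NULL && b != NULL);
  return a->volid == b->volid
         && a->pageid == b->pageid
         && a->slotid == b->slotid;
}

/* Hypothetical helper: render an OID as "volid|pageid|slotid".
   Returns what snprintf returns (the length of the full text). */
int oid_format(const OID *oid, char *buf, size_t size)
{
  assert(oid != NULL && buf != NULL);
  return snprintf(buf, size, "%d|%d|%d",
                  oid->volid, oid->pageid, oid->slotid);
}
```

For example, oid_make(0, 123, 4) describes slot 4 of page 123 in volume 0. On platforms where int is 4 bytes and short is 2 bytes, the whole identifier fits in 8 bytes, which is what makes it cheap to store inside another record as a reference.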
Slotted pages
A DB object, or a data record, in CUBRID is saved in a slotted page, the traditional disk storage structure of an RDBMS. A database is stored on the disk, so the DBMS performs disk I/O in units of disk pages (or database pages), in the same way as the operating system does. The size of a database page in CUBRID is 4KB or 16KB, the latter being the default page size (see --db-page-size).
One page holds several data records (or DB objects). Therefore, to read a specific record, both the location of the record in the page and the length of the record must be known. The simplest layout is to arrange records one after another from the start of the page. However, as records are created, updated, and deleted, the contents and lengths of existing records change frequently. So, we need a way to avoid moving other records whenever one record changes, and to quickly find the location of the desired record (the byte in the page at which its data starts) even though every record has a different length. For this reason, most DBMSs implement the slotted page structure.
Figure 1: CUBRID Slotted Page Structure.
As shown in Figure 1, one page has several records, and the location of each record is recorded in the slot area at the end of the page. One slot is 4 bytes in size, and the slots are numbered from the end of the page as slot 1, slot 2, ..., slot N. Slot 1 indicates the location of record 1, i.e., its offset in the page, and slot 2 indicates the location of record 2. In the above figure, slot 6 does not indicate the location of a record (the value is -1), which means that record 6 has been deleted.
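The layout described above can be sketched as a toy in-memory page. This is not CUBRID's implementation: the names toy_page, page_insert, page_delete, and page_get are hypothetical, the page is 4 KB, slots are numbered from 1 as in the figure, and each slot is a plain pair of 16-bit fields so that -1 can mark a deleted record.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy slot: 4 bytes, but a simplified layout chosen for this sketch. */
typedef struct
{
  int16_t offset;  /* byte offset of the record in the page, -1 if deleted */
  int16_t length;  /* record length in bytes */
} toy_slot;

typedef struct
{
  unsigned char bytes[TOY_PAGE_SIZE];
  int num_slots;   /* slots allocated so far */
  int free_start;  /* first free byte after the last record */
} toy_page;

/* Slot i (1-based) lives i * 4 bytes before the end of the page. */
static toy_slot load_slot(const toy_page *p, int slotid)
{
  toy_slot s;
  memcpy(&s, p->bytes + TOY_PAGE_SIZE - slotid * (int) sizeof s, sizeof s);
  return s;
}

static void store_slot(toy_page *p, int slotid, toy_slot s)
{
  memcpy(p->bytes + TOY_PAGE_SIZE - slotid * (int) sizeof s, &s, sizeof s);
}

void page_init(toy_page *p)
{
  memset(p, 0, sizeof *p);
}

/* Records grow from the start, slots grow from the end.
   Returns the new slot number, or -1 if the record does not fit. */
int page_insert(toy_page *p, const void *rec, int len)
{
  int slot_area = (p->num_slots + 1) * (int) sizeof(toy_slot);
  if (len < 0 || p->free_start + len > TOY_PAGE_SIZE - slot_area)
    return -1;
  memcpy(p->bytes + p->free_start, rec, (size_t) len);
  toy_slot s = { (int16_t) p->free_start, (int16_t) len };
  p->num_slots++;
  store_slot(p, p->num_slots, s);
  p->free_start += len;
  return p->num_slots;
}

/* Deleting only marks the slot; no other record moves. */
void page_delete(toy_page *p, int slotid)
{
  assert(slotid >= 1 && slotid <= p->num_slots);
  toy_slot s = load_slot(p, slotid);
  s.offset = -1;
  store_slot(p, slotid, s);
}

/* Returns a pointer to the record and its length, or NULL if deleted. */
const void *page_get(const toy_page *p, int slotid, int *len)
{
  assert(slotid >= 1 && slotid <= p->num_slots);
  toy_slot s = load_slot(p, slotid);
  if (s.offset < 0)
    return NULL;
  *len = s.length;
  return p->bytes + s.offset;
}
```

The point of the indirection is visible in page_delete: after record 1 is deleted, record 2 keeps both its slot number and its byte offset, so any reference that names slot 2 stays valid.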
As shown in the above figure, records are filled in from the start of the page while slots are filled in from the end of the page. A slot stores the offset of the record along with the size of the record. The 4-byte slot consists of a 14-bit offset, a 14-bit length, and a 4-bit record type. Because the record type is expressed with 4 bits, there can be up to 16 record types. CUBRID defines 7 record types, which are listed in storage/slotted_page.h in the source code. (For example, #define REC_HOME 2 is the code that defines one record type.)
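The 14 + 14 + 4 bit split can be illustrated with a pair of pack and unpack functions. This is a sketch only: the article gives the field widths but not the order of the fields inside the 32-bit word, so the order used here (offset in the low bits, then length, then type in the top bits) is an assumption, and slot_fields, slot_pack, and slot_unpack are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_OFFSET_BITS 14
#define SLOT_LENGTH_BITS 14
#define SLOT_TYPE_BITS    4

#define REC_HOME 2  /* the record type value quoted in the article */

typedef struct
{
  unsigned offset;  /* byte offset of the record in the page */
  unsigned length;  /* record length in bytes */
  unsigned type;    /* record type */
} slot_fields;

/* Assumed bit order for this sketch:
   bits 0..13 offset, bits 14..27 length, bits 28..31 type. */
uint32_t slot_pack(slot_fields f)
{
  assert(f.offset < (1u << SLOT_OFFSET_BITS));
  assert(f.length < (1u << SLOT_LENGTH_BITS));
  assert(f.type < (1u << SLOT_TYPE_BITS));
  return (uint32_t) f.offset
         | ((uint32_t) f.length << SLOT_OFFSET_BITS)
         | ((uint32_t) f.type << (SLOT_OFFSET_BITS + SLOT_LENGTH_BITS));
}

slot_fields slot_unpack(uint32_t raw)
{
  slot_fields f;
  f.offset = raw & ((1u << SLOT_OFFSET_BITS) - 1);
  f.length = (raw >> SLOT_OFFSET_BITS) & ((1u << SLOT_LENGTH_BITS) - 1);
  f.type = (raw >> (SLOT_OFFSET_BITS + SLOT_LENGTH_BITS))
           & ((1u << SLOT_TYPE_BITS) - 1);
  return f;
}
```

A 14-bit field holds values from 0 to 16383, so it can address every byte offset in a 16KB (16384-byte) page, which matches the default page size mentioned earlier.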
As shown in the above code snippet of the OID structure, an OID consists of a volume number, a page number, and a slot number. The volume number tells you which file the record is stored in. The page number tells you which part of that file should be read. The slot number shows where on that page the desired record data is located.
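The three lookup steps can be written down as simple arithmetic. The sketch below makes several simplifying assumptions that are not taken from the CUBRID source: pages are laid out back to back starting at byte 0 of the volume file, slots are 4 bytes each and numbered from 1 at the end of the page (as in the description above), and volume or page headers are ignored. The names oid_location and oid_locate are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEFAULT_PAGE_SIZE 16384  /* the 16KB default mentioned in the article */
#define SLOT_ENTRY_SIZE   4      /* one slot is 4 bytes */

typedef struct db_identifier
{
  int pageid;
  short slotid;
  short volid;
} OID;

/* Where to look for the record named by an OID (hypothetical type). */
typedef struct
{
  short volid;         /* step 1: which volume file to open */
  int64_t page_start;  /* step 2: byte offset of the page in that file */
  int slot_entry;      /* step 3: byte offset of the slot inside the page */
} oid_location;

/* Assumes back-to-back pages and 1-based slots at the end of the page. */
oid_location oid_locate(const OID *oid, int page_size)
{
  assert(oid && page_size > 0);
  assert(oid->pageid >= 0 && oid->slotid >= 1);
  assert(oid->slotid * SLOT_ENTRY_SIZE <= page_size);

  oid_location loc;
  loc.volid = oid->volid;
  loc.page_start = (int64_t) oid->pageid * page_size;
  loc.slot_entry = page_size - oid->slotid * SLOT_ENTRY_SIZE;
  return loc;
}
```

Under these assumptions, the OID with volid 0, pageid 3, and slotid 2 resolves to the page that starts at byte 49152 of volume 0, and its slot is found at byte 16376 of that page. The slot then supplies the offset and length of the record itself.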
Classes
As in the general object-oriented concept, the structure of DB objects in CUBRID is expressed through classes. This corresponds to the relationship between data records and the table schema in a database. In an object-oriented language such as Java or C++, a class is a template used to declare the structure of objects and does not physically exist as an object. CUBRID is different: a class is also a kind of DB object. In other words, a class is a data record that holds certain information, just like other DB objects.
- A DB instance object is a record that has the user data.
- A DB class object is also a record which has the data about the structure (table schema) of instance objects (records) that belong to the corresponding class (table).
Table schema data describes the columns in the table, the data type of each column, and the table constraints or column constraints defined by the user. In a general relational DBMS, this data is called the schema data or data dictionary, and it is saved and managed in a separate space in a special format. In CUBRID, however, it is handled as one of the DB objects, in line with CUBRID's object-oriented concept. In the CUBRID source code, you can see that a DB object is handled by distinguishing its object type (class object or instance object).
Root Class
If a class (table schema) is considered as a DB object, then how is the structure of class objects defined? All objects should have their class. In this case, is a separate class required for class objects? The answer is yes. Class objects belong to a class called root class. Therefore, class objects are the instance objects of a root class. In other words, they are the records on the table that is defined as a root class.
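The relationship between instance objects, class objects, and the root class can be modeled in a few lines. This is a conceptual toy, not CUBRID's data structures: toy_object, ROOT_CLASS, is_class_object, and is_user_record are hypothetical names, and the class of the root class itself is left unset because the article does not discuss it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Every object, whether a user record or a table definition,
   carries a reference to the class it belongs to. */
typedef struct toy_object toy_object;
struct toy_object
{
  const char *name;            /* label used only for illustration */
  const toy_object *class_of;  /* the class this object is an instance of */
};

/* The root class: the class of all class objects.
   Its own class is left as NULL in this sketch. */
const toy_object ROOT_CLASS = { "root class", NULL };

/* A class object (table schema) is an instance of the root class. */
bool is_class_object(const toy_object *obj)
{
  assert(obj != NULL);
  return obj->class_of == &ROOT_CLASS;
}

/* A user record is an instance of some class object (table). */
bool is_user_record(const toy_object *obj)
{
  assert(obj != NULL);
  return obj->class_of != NULL && is_class_object(obj->class_of);
}
```

With a table modeled as toy_object t1 = { "t1", &ROOT_CLASS } and one of its rows as toy_object row = { "row of t1", &t1 }, is_class_object(&t1) and is_user_record(&row) are both true: the row is a record of t1, and t1 is in turn a record of the root class.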
Figure 2: Root Class and Class Object in CUBRID.
A root class can be considered a table that contains the table definitions. In SQL-standard terms, such a table is called an information schema, and it is more generally known as a system catalog. To see which tables are defined in the database, you execute a query such as SELECT table_name FROM tables; and to see which columns are defined for a table, you execute a query such as SELECT column_name FROM columns WHERE table_name='t1';. The tables and columns tables used here are information schema tables, i.e. system catalog tables. To be exact, in CUBRID the system catalog tables (e.g., db_class) exist separately, and the class objects are saved and managed separately from them. From a structural point of view, system catalog tables are identical to general tables: they are tables that the system creates in advance when a database is created. The class objects, in contrast, are used internally. The record structure (or schema) of a class object is defined in the source code in advance, so when a class object is read from the disk, it is converted to a memory object so that its contents can be interpreted. The C structure that defines the memory object of a class object is struct sm_class in the object/class_object.h file.
Conclusion
This concludes the talk about how data is stored in CUBRID RDBMS. I have explained the individual building blocks of the storage, namely data volumes, pages, and page slots, which point to the actual data records. Now you know what constitutes an Object Identifier (OID). When you access data stored on the disk directly through an OID, CUBRID does not have to work out the physical address of the requested record, because the OID you provide already is that address. This gives a performance benefit. You have also learned that everything in CUBRID is an object, either a class object or an instance object. Instance objects belong to class objects, while class objects belong to the root class.
In the next article I will talk about Data Types and Domains in CUBRID and how you can inherit data types.
Published at DZone with permission of Esen Sagynov, DZone MVB. See the original article here.