An Introduction to DBMS Types
This short article introduces RDBMS and NoSQL DBMS types and discusses the difference between them.
Join the DZone community and get the full member experience.Join For Free
This article will be of interest to those learning about databases who don’t have much prior knowledge of the different types or of the terminology involved. It provides a brief review and links to further information, which I hope is useful to anyone starting to work with databases. If you’re already a wizard with SQL or Neo4j, you won’t need to read any further!.
If you're still with us, let's first introduce some terminology. A database collects and organizes data, while access to that data is typically via a “database management system” (DBMS), which manages how the data is organized within the database. This article discusses some of the ways a DBMS may organize data. It reviews the difference between relational database management systems (RDBMS) and NoSQL.
Relational database management systems store structured data in tables consisting of rows and columns. A row contains all the information about an entity, whereas columns represent separate attributes. For example, you may have a row about a customer, in which the columns are the first and last names, address, city and email address.
RDBMS needs data to be strictly structured. The nature of each column (the database schema) must be fixed before data is put into the database. Changes can be introduced later, but if you decide you want to add a column for the customer’s phone number, you need to migrate the database to a new schema that includes that extra column. If the database is large, this will be a slow process and cause database downtime, which means service interruptions for the customers.
In RDBMS, related data can be stored in separate tables and joined together when a more complex query is required. For example, customers could be stored in one table, and orders in another. To make a query that lists all the orders a customer has made, the database engine joins the customer and orders tables together to get the required information.
A typical RDBMS guarantees the ACID properties with respect to database transactions. ACID stands for Atomicity, Consistency, Isolation, and Durability, which I’ll explain.
Database transactions may consist of multiple statements, and atomicity is a guarantee that every transaction either succeeds completely or fails completely such that the whole transaction is rolled back so the database is unchanged from before the start of the transaction. Atomicity is a guarantee in the face of power failures, errors and crashes.
Consistency is the guarantee that a transaction doesn’t invalidate the state of the database. Any data written to the database must be valid to prevent corruption.
Database transactions may execute concurrently, that is, at once. Isolation guarantees that all concurrent execution leaves the database in the same state as if the transactions were sequential.
Durability guarantees that once a transaction has been committed, it will not be rolled back, even in the face of a power failure or crash.
Putting this together, once an ACID-compliant transaction is complete, its data is guaranteed to be consistent and stable.
RDBMS are usually scaled vertically by using a larger server. A single server is typically needed to host the entire database to give acceptable performance for cross-table joins and transactions. That’s not to say that "sharding" a database across servers is impossible. But, it is usually complex because it requires making the hardware appear to act as a single server, and many of the benefits of having a relational database, such as transactional integrity, are compromised as a result.
The term NoSQL (“Not Only SQL”) covers a number of different types of databases, each with different data storage models: key-value store, document, graph or wide-column store. NoSQL models often have less structure and fewer transactional guarantees than RDBMS, but offer better scalability. NoSQL can be scaled horizontally, across servers, allowing data and query loads to be balanced automatically.
NoSQL databases do not need a schema defined in advance. That makes it easier to add information on the fly. It also means that each ‘row’ (or equivalent) can be flexible and do not have to contain data for each "column." NoSQL databases came about in the late 2000’s to cope with large volumes of data whose schema regularly changed and which was not sufficiently structured for RDBMS.
Most NoSQL systems typically maintain multiple copies of data to provide availability and to scale effectively. This results in different guarantees of data consistency. In NoSQL, ACID support for transactions is considered secondary and is not guaranteed, in favor of speed and greater scale.
Types of NoSQL Databases
This section briefly reviews the different types of NoSQL databases. If you are interested in finding out more, it’s worth taking a look at the DB-Engines Ranking website, which provides a monthly popularity ranking of a number of products, and gives further details about them.
This is the simplest NoSQL database: each item is stored as a name (or ‘key’), together with its value. Key-value stores function like SQL databases, but have only two columns (the key and the value). In general, both the key and the value are untyped, raw byte sequences. Needless to say, these are the most unstructured NoSQL databases and they are frequently used as the foundation of other, more structured models.
This kind of database pairs each key with a complex data structure: a document, which removes the column and row model of traditional databases completely. A document is typically JSON, XML or similar, and can contain many different key-value pairs, key-array pairs, and even nest other documents.
Fundamentally, this storage is like a key-value store, except that sometimes the keys are generated automatically by the system and the values follow a syntactic structure. But the distinctive feature is their inbuilt secondary indices on the values, providing structured querying.
This kind of NoSQL database is optimized for querying over a large amount of data. Columns of data are stored together (instead of rows). An example customer database stores customer ID, first name, last name, email address.
In a row-based database, data would be stored as follows:
1: Alice, Smith, firstname.lastname@example.org
2: Bob, Jones, email@example.com
3: Colin, Green, firstname.lastname@example.org
The column store would store data as
1: Alice, 2: Bob, 3: Colin
1: Smith, 2: Jones, 3: Green
1: email@example.com, 2: firstname.lastname@example.org, 3: email@example.com
In these stores, the information about a given entity (row) can grow indefinitely and new columns may be added easily. Querying focusses on a particular aspect of the data, e.g. names or address of people. A column store allows for faster data summarization along specific dimensions. This type of database is used in big data analysis.
A graph database is a collection of entities or “nodes”, and the “edges” that connect them that represent relationships between the nodes. Every node is defined uniquely by an identifier. Incoming and/or outgoing edges and properties are given as key-value pairs. Each edge is also defined by a unique identifier, the nodes it connects and a set of key-value properties.
The focus in graph databases is the relationship between entities, which avoids the use of foreign keys or MapReduce to determine connections between them. Graph databases are excellent for navigating a relationship structure quickly as well as determining patterns of relationships. The kinds of analysis that one typically performs are about structure, for example, the number of steps needed to get from one node to another. Graph databases are becoming very popular for large online systems with deep links between records.
RDBMS or NoSQL?
Phew! That was a fire hose of information about databases, and it covered a lot of ground in a few paragraphs. Wikipedia is a great reference to find out more, as are websites for products of each database type, found on DB Engines.
The takeaway from the above is that RDBMS and NoSQL both do the same thing — they store data, but do so in different ways according to the requirements upon them. NoSQL is not a replacement for RDBMS but an alternative.
Some projects are better suited to using one type or the other.
RDBMS is a good choice where data is structured and the schema is identified and fixed up front, and when data integrity and concurrency is vital.
RDBMS is not ideal when there is frequent change to the schema, nor is it efficient for exploring relationships between data elements.
NoSQL is a better choice when the data requirements are evolving and/or speed or scalability are essential: document databases are good for unstructured data and, if the goal is to model and analyze data interconnections, a graph database is a recommended solution.
Furthermore, some SQL databases are adopting NoSQL characteristics such as sharding, and vice versa. NewSQL is blurring the lines, and hybrid models such as combining a graph database with RDBMS are also an option.
I hope this article gave you a taste of the terminology and characteristics involved with DBMS used for big data storage. Don’t forget to hit me up in the comments with any feedback!
Opinions expressed by DZone contributors are their own.