What is NoSQL?
By Dan McCreary and Ann Kelly, authors of Making Sense of NoSQL
One of the challenges with NoSQL is defining it. The term NoSQL is problematic since it doesn't really describe the core themes in the NoSQL movement. In this article, based on chapter 1 of Making Sense of NoSQL, the authors define NoSQL and discuss the business drivers and motivations that make NoSQL so intriguing and popular to organizations today.
The term NoSQL originated with a group in the Bay Area who met regularly to talk about common concerns and issues surrounding scalable open source databases, and it stuck. Descriptive or not, it seems to be everywhere: in the trade press, product descriptions, and conferences. We'll use the term NoSQL as a way of differentiating a system from a traditional relational database management system (RDBMS). So, let's first define NoSQL, and then we'll talk about the business drivers behind it.
For our purposes, we define NoSQL in the following way: NoSQL is a set of concepts that allow the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.
Seems like a broad definition, right? It doesn't exclude SQL or RDBMS systems, right? That's not a mistake. What's important is that we identify the core themes behind NoSQL: what it is and, most importantly, what it isn't.
NoSQL is:
- More than rows in tables—NoSQL systems store and retrieve data in many formats: key-value stores, graph databases, column-family (Bigtable) stores, document stores, and even rows in tables.
- Free of joins—NoSQL systems allow you to extract your data using simple interfaces without joins.
- Schema free—NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relational model.
- Compatible with many processors—NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance.
- Usable on shared-nothing commodity computers—Most (but not all) NoSQL systems leverage low cost commodity processors that have separate RAM and disk.
- Supportive of linear scalability—NoSQL supports linear scalability; when you add more processors you get a consistent increase in performance.
- Innovative—NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means not only SQL.
NoSQL is not:
- About the SQL language—NoSQL is not defined as an application that uses a language other than SQL. SQL, as well as other query languages, is used with NoSQL databases.
- Only open source—Although many NoSQL systems have an open source model, commercial products use NoSQL concepts as well as open source initiatives. You can still have an innovative approach to problem solving with a commercial product.
- Only Big Data—Many, but not all, NoSQL applications are driven by the inability of a current application to scale efficiently when Big Data is an issue. While volume and velocity are important, NoSQL also focuses on variability and agility.
- About cloud computing—Many NoSQL systems reside in the cloud to take advantage of its ability to scale rapidly when the situation dictates. NoSQL systems can run in the cloud as well as in your corporate data center.
- About a clever use of RAM and SSD—Many NoSQL systems focus on the efficient use of RAM or solid-state disks to increase performance. While this is important, NoSQL systems can also run on standard hardware.
- An elite group of products—NoSQL is not an exclusive club with a few products. There are no membership dues or tests required to join.
To be considered a NoSQLer, you only need to convince others you've got innovative solutions to their business problems.
NoSQL applications use a variety of data store types (databases), from the simple key-value store, which associates a unique key with a value, to graph stores, which capture relationships, to document stores, which hold variable data. Each type of NoSQL data store has unique attributes and uses, as identified in table 1. This table shows the four main categories of NoSQL systems, and sample products for each data store type.
Table 1 Types of NoSQL data stores
|Key-value store—A simple data storage system that uses a key to access a value||
|Column family store—A sparse matrix system that uses a row and column as keys||
|Graph store—For relationship intensive problems||
|Document store—Storing hierarchical data structures directly in the database||
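To make the four categories in table 1 concrete, here is a minimal sketch that models each one with plain Python data structures. Everything in it is an assumption made for illustration: the sample keys, the names, and the helper functions `get_cell` and `neighbors` are hypothetical and do not correspond to any real product's API.

```python
# Hypothetical sample data; none of these names come from a real product API.

# Key-value store: a unique key maps to a value the store treats as opaque.
key_value_store = {"session:42": b"\x01\x02opaque-bytes"}

# Column-family store: a sparse matrix addressed by (row key, column key).
column_family_store = {
    ("user:1", "name"): "alice",
    ("user:1", "city"): "Oslo",
    ("user:2", "name"): "bob",  # user:2 has no "city" cell; nothing is stored
}

# Graph store: nodes joined by labelled relationships.
graph_store = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
]

# Document store: a hierarchical structure saved as a single unit.
document_store = {
    "order:1001": {
        "customer": "alice",
        "items": [{"sku": "A-1", "qty": 2}, {"sku": "B-7", "qty": 1}],
    }
}


def get_cell(store, row_key, column_key):
    """Column-family lookup: returns None for cells that were never written."""
    return store.get((row_key, column_key))


def neighbors(edges, node, label):
    """Graph lookup: nodes reachable from `node` over edges with `label`."""
    return [dst for src, rel, dst in edges if src == node and rel == label]
```

The point of the sketch is only the shape of the data: a real system adds persistence, distribution, and a query interface on top of each shape.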
NoSQL systems have unique characteristics and capabilities that can be used alone or in conjunction with your existing systems. Many organizations considering NoSQL systems do so to overcome common issues such as volume, velocity, variability and agility, the business drivers behind the NoSQL movement.
NoSQL business drivers
The scientist-philosopher Thomas Kuhn coined the term paradigm shift to identify a recurring process he observed in science, where innovative ideas came in bursts and impacted the world in non-linear ways. We'll use Kuhn's concept of the paradigm shift as a way to think about and explain the NoSQL movement and the changes in thought patterns, architectures and methods emerging today.
Many organizations supporting single-CPU relational systems have come to a crossroads: the needs of their organization are changing. Businesses have found value in rapidly capturing and analyzing large amounts of variable data, and making immediate changes in their business based on the information they receive.
Figure 1 shows how the demands of volume, velocity, variability, and agility play a key role in the emergence of NoSQL solutions. As each of these drivers applies pressure to the single-processor relational model, its foundation becomes less stable and in time no longer meets the organization's needs.
Figure 1 The business drivers—volume, velocity, variability, and agility—apply pressure to the single CPU system resulting in the cracks.
Volume and velocity refer to the ability to handle large datasets that arrive quickly. Variability refers to how diverse data types don't fit into structured tables, and agility refers to how quickly an organization responds to business change. Let's discuss them in detail.
Volume
Without a doubt, the key factor pushing organizations to look at alternatives to their current RDBMSs is a need to query Big Data using clusters of commodity processors. Until around 2005, performance concerns were resolved by purchasing faster processors. In time, however, increasing processing speed was no longer an option. As chip density increased, heat could no longer dissipate fast enough to keep the chip from overheating. This phenomenon, known as the PowerWall, forced systems designers to shift their focus from increasing speed on a single chip to using more processors working together. The need to scale out (also known as horizontal scaling), rather than scale up (faster processors), moved organizations from serial to parallel processing, where data problems are split into separate paths and sent to separate processors to divide and conquer the work.
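The divide-and-conquer idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular NoSQL system is implemented: the function names `partition`, `count_matches`, and `parallel_count` are hypothetical, and threads inside one process stand in for the separate commodity processors described above. In CPython, threads will not speed up CPU-bound work, so the sketch shows only the split, process, and combine shape.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_workers):
    """Split records into n_workers slices of roughly equal size."""
    return [records[i::n_workers] for i in range(n_workers)]


def count_matches(chunk, predicate):
    """Work done independently on one slice of the data."""
    return sum(1 for record in chunk if predicate(record))


def parallel_count(records, predicate, n_workers=4):
    """Send each slice to its own worker, then combine the partial results."""
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda c: count_matches(c, predicate), chunks))
    return sum(partials)
```

For example, `parallel_count(list(range(1000)), lambda x: x % 2 == 0)` counts the even numbers by letting four workers each scan a quarter of the list and then adding the four partial counts.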
Velocity
While Big Data problems are a consideration for many organizations moving away from RDBMSs, the ability of a single-processor system to rapidly read and write data is also key. Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries to the database made by public-facing web sites. RDBMSs frequently index many columns of every new row, a process that decreases system performance. When single-processor RDBMSs are used as a back end to a web storefront, the random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and high write throughput are desired.
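The cost of indexing many columns on every insert can be illustrated with a toy in-memory table. The sketch below is an assumption-laden model, not a description of how a real RDBMS maintains its indexes: the class `IndexedTable` and its `writes` counter are hypothetical, and each index update is simply counted as one extra write.

```python
from collections import defaultdict


class IndexedTable:
    """Toy table that counts how many structures each insert must update."""

    def __init__(self, indexed_columns):
        self.rows = []
        # One lookup structure per indexed column: value -> row positions.
        self.indexes = {col: defaultdict(list) for col in indexed_columns}
        self.writes = 0

    def insert(self, row):
        # Writing the row itself is one update.
        self.rows.append(row)
        self.writes += 1
        position = len(self.rows) - 1
        # Every indexed column adds one more update for the same insert.
        for column, index in self.indexes.items():
            index[row.get(column)].append(position)
            self.writes += 1
```

In this model, a table with three indexed columns performs four updates per inserted row, while a table with no indexes performs one. Real systems differ in the details, but the write amplification grows with the number of indexes in the same direction.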
Variability
Companies that want to capture and report on exception data struggle when attempting to use the rigid database schema structures imposed by RDBMSs. For example, if a business unit wants to capture a few custom fields for a particular customer, every customer row within the database needs to store this information even though it doesn't apply to most of them. Adding new columns to an RDBMS requires running ALTER TABLE commands and, on many systems, shutting the system down or locking the table while they run. When a database is large, this process can impact system availability, costing time and money.
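The custom-field example can be sketched by comparing fixed-column rows with schema-free documents. The customer names, the `contract_tier` field, and the helpers `add_column` and `empty_cells` are hypothetical and exist only for this illustration.

```python
def add_column(rows, column):
    """Relational-style change: every row gains the column, filled with None."""
    return [{**row, column: None} for row in rows]


def empty_cells(records, column):
    """Count records that carry the column but have no value for it."""
    return sum(1 for r in records if column in r and r[column] is None)


# Fixed schema: one customer needs a custom field, so all rows get the column.
rows = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
    {"id": 3, "name": "Initech"},
]
rows = add_column(rows, "contract_tier")
rows[1]["contract_tier"] = "gold"

# Schema free: only the document that needs the field carries it.
documents = [
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex", "contract_tier": "gold"},
    {"id": 3, "name": "Initech"},
]
```

Here two of the three rows hold an empty `contract_tier` cell, while the document version stores the field once and leaves the other records untouched.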
Agility
The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer. This process is not simple, and it is the largest barrier to rapid change when developing new applications or modifying existing ones.
Generally, object-relational mapping requires experienced software developers who are familiar with object-relational frameworks such as Hibernate for Java (or NHibernate for .NET systems). Even with experienced staff, small change requests can cause slowdowns in development and testing schedules.
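To show what the mapping layer has to do, the following sketch shreds one nested object into the SQL statements needed to store it and compares that with saving the same object as a single document. It is a hand-written illustration, not the output of Hibernate or any other framework: the `orders` and `order_items` tables, the sample order, and the functions `to_sql_statements` and `to_document` are all hypothetical.

```python
import json

# Hypothetical nested object with a repeated subgroup (the line items).
order = {
    "order_id": 1001,
    "customer": "Acme",
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def to_sql_statements(order):
    """Shred the order into one parent INSERT plus one INSERT per item."""
    statements = [
        (
            "INSERT INTO orders (order_id, customer) VALUES (?, ?)",
            (order["order_id"], order["customer"]),
        )
    ]
    for item in order["items"]:
        statements.append(
            (
                "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                (order["order_id"], item["sku"], item["qty"]),
            )
        )
    return statements


def to_document(order):
    """Document-store style: serialize the whole hierarchy as one unit."""
    return json.dumps(order, sort_keys=True)
```

The relational path produces three statements across two tables for this small order, and reading it back requires a join or a second query. The document path writes and reads one value, which is why a change to the object's shape touches less code.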
We see how velocity, volume, variability and agility are the high-level drivers most frequently associated with the NoSQL movement.
This article began with an introduction to the concept of NoSQL and reviewed the core business drivers behind the NoSQL movement. We then showed how the PowerWall forced systems designers to use highly parallel processing designs and required a new type of thinking for managing data. We also saw that traditional systems built on RDBMSs require complex object-relational mapping systems and joins to manipulate the data, which gets in the way of organizational agility.
When we venture into any new technology it is critical to understand that each area has its own patterns of problem solving. These patterns vary dramatically from technology to technology. Making the transition from SQL to NoSQL is no different. NoSQL is a new paradigm and requires a new set of pattern recognition skills, new ways of thinking and new ways of solving problems. It requires a new cognitive style.
Opting to use NoSQL technologies can help organizations gain a competitive edge in their market, making them more agile and better equipped to adapt to changing business conditions. NoSQL approaches that leverage large numbers of commodity processors can save companies time and money and can increase service reliability.