NoSQL and NewSQL databases are popular solutions in the data management space. We’re sometimes asked to clarify the difference between the two approaches. Here’s what you need to know if you’re trying to decide which solution to adopt in your organization.
NoSQL is a broad category of disparate database technologies, most of which have come to market in the past decade.
NewSQL is a term coined by 451 Group analyst Matt Aslett to describe a new group of databases that share much of the functionality of traditional SQL relational databases, while offering some of the benefits of NoSQL technologies.
Many enterprises use both solutions — for different use cases.
NoSQL solutions emerged as a reaction to frustration with the cost and inflexibility of legacy RDBMS products like Oracle and IBM DB2, which use SQL as a query language. The original NoSQL systems were built for scale and unstructured data, and did not use a relational (table-based) schema. Most early NoSQL solutions dropped SQL as a query language, although the tide is turning and many now offer proprietary SQL-like languages.
Implementing a data management platform that can handle "web-scale" loads — millions of simultaneous users — with "engaging" performance (millisecond response times) can be difficult to do and even more difficult to afford with existing legacy products.
Some NoSQL solutions, therefore, focus on consistency models and favor availability: the database is always available to accept new data and can always provide an answer when queried, even if that answer is not transactionally consistent, i.e., not the most recent version written. Examples include DynamoDB and Cassandra.
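To make the availability trade-off concrete, here is a minimal sketch of two replicas that always answer reads, even when one has not yet received the latest write. The `Replica` class, the `sync` helper, and the version numbers are illustrative assumptions; they do not reflect how DynamoDB or Cassandra are actually implemented.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical model)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        # Last-write-wins: keep whichever value carries the higher version.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(source, target):
    """Copy newer entries from source to target (simplified replication)."""
    for key, (version, value) in source.data.items():
        target.write(key, version, value)


a, b = Replica(), Replica()
a.write("cart:42", 1, ["book"])
sync(a, b)
a.write("cart:42", 2, ["book", "pen"])  # accepted by a; b has not seen it yet

stale = b.read("cart:42")  # b still answers, but with the older version
sync(a, b)
fresh = b.read("cart:42")  # after replication, b returns the latest write
```

In this sketch, replica `b` never refuses a read; it simply returns an older value until replication catches up, which is the behavior described above.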
Other NoSQL databases are designed to be flexible, and focus on data models: they don’t enforce a rigid or consistent schema across stored data. These ‘document stores’ expand upon the traditional key-value store by replacing the values with JSON-structured documents, each able to contain sub-keys and sub-values, arrays of values, or hierarchies of all of the above. There are many document and column-oriented NoSQL databases, e.g. MongoDB, HBase, and Couchbase.
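The document model can be sketched in a few lines. The example below is a toy in-memory store, not the API of MongoDB, HBase, or Couchbase; the `put` and `get_path` helpers and the sample records are assumptions made for illustration.

```python
import json

store = {}  # key -> JSON document; no schema is shared across entries


def put(key, document):
    """Store a document as JSON text under a key."""
    store[key] = json.dumps(document)


def get_path(key, path):
    """Follow a list of sub-keys and array indexes into a stored document."""
    node = json.loads(store[key])
    for step in path:
        node = node[step]
    return node


# Two documents with different shapes live side by side.
put("user:1", {"name": "Ada",
               "emails": ["ada@example.com"],
               "address": {"city": "London"}})
put("user:2", {"name": "Bob", "tags": ["admin"]})

city = get_path("user:1", ["address", "city"])   # nested sub-key
first_email = get_path("user:1", ["emails", 0])  # element of an array
```

Note that `user:2` has no `address` at all. Nothing in the store rejects it, so any checks on document shape are left to the application.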
Alternate Data Models in NoSQL Offerings
Lucene, Solr, and Elasticsearch offer text and document indexing functions that can be used, for example, to implement real-time search as users enter terms.
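The idea behind search-as-you-type can be illustrated with a small inverted index. This is a simplified sketch with hypothetical helper names (`build_index`, `search_prefix`) and sample documents; real engines such as Lucene use far more sophisticated analysis, data structures, and ranking.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_prefix(index, prefix):
    """Return ids of documents with any term starting with the prefix."""
    prefix = prefix.lower()
    hits = set()
    for term, ids in index.items():
        if term.startswith(prefix):
            hits |= ids
    return sorted(hits)


docs = {
    1: "NoSQL document store",
    2: "NewSQL transactions",
    3: "graph database",
}
index = build_index(docs)

results_n = search_prefix(index, "n")    # the user has typed "n"
results_do = search_prefix(index, "do")  # the user has typed "do"
```

Scanning every term on each keystroke is acceptable for a toy example; a production index would typically rely on a sorted term dictionary or a trie for prefix lookups.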
Graph databases like Neo4j, Titan, and Tagged organize data by relationships instead of by row or document, enabling powerful traversal and graph query capabilities.
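A relationship-first model makes questions such as "who is within two hops of this person?" easy to express. The sketch below uses a plain adjacency map and a breadth-first traversal; the sample data and the `within_hops` function are illustrative assumptions and not the query interface of any graph database.

```python
from collections import deque

# Hypothetical "follows" relationships stored as an adjacency map.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning nodes reachable in max_hops or fewer."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, dist = queue.popleft()
        if dist == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, dist + 1))
    return found


friends_of_friends = within_hops(edges, "alice", 2)
```

Expressing the same question against rows in a relational table would usually take one self-join per hop, which is why graph-oriented stores treat traversal as a first-class operation.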
The Promise of NewSQL
NewSQL systems offer the best of both worlds: the relational data model and ACID transactional consistency of traditional operational databases; the familiarity and interactivity of SQL; and the scalability and speed of NoSQL. Some offer stronger consistency guarantees than are available with NoSQL solutions, although others limit this to ‘tunable’ consistency and thus aren’t fully ACID-compliant.
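To show what ACID transactional consistency means in practice, the sketch below runs a two-step transfer that either applies completely or not at all. It uses Python's built-in `sqlite3` module purely as a stand-in for a SQL engine; SQLite is not a NewSQL system, and the `accounts` table and `transfer` function are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds atomically; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


ok = transfer(conn, "a", "b", 30)         # succeeds
rejected = transfer(conn, "a", "b", 500)  # would overdraw "a"; rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

In the rejected transfer, the credit to `b` is executed first and then undone when the debit violates the constraint, so no partial update is ever visible. A NewSQL system aims to provide this guarantee across a cluster rather than within a single process.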
Of course, there are differences among NewSQL solutions. SAP HANA handles modest transactional workloads, but without the benefit of native clustering. NuoDB is a cluster-first SQL database with a focus on cloud deployments, but throughput is poor. MemSQL is useful for clustered analytics but its tunable consistency falls down on ACID transactions.
Ideally, NewSQL systems like VoltDB combine real-time analytics on inbound data feeds with strong ACID transactions, native clustering, and Hadoop ecosystem support. This allows them to be the system-of-record for data-intensive applications while offering an integrated high-throughput, low-latency ingestion engine. It’s a great choice for policy enforcement, personalization, fraud/anomaly detection, or other request-response style fast-decisioning apps.
The habit of talking in terms of categories (SQL, NoSQL, NewSQL) rather than use cases and problems makes it hard to choose the best solution. Knowing how to choose, based on how each offering helps (or hinders) problem solvers, is the key to success.