The State of Databases 2019

Cassandra 4.0 will debunk the CAP theorem.

I had the opportunity to hear Dinesh Joshi, Senior Software Engineer and Architect at Apple and an active member of the Apache Software Foundation, share his thoughts on the state of databases in 2019 while attending Percona Live in Austin, Texas.

Data is growing and will continue to do so. In 2019, humans will generate 40 zettabytes of data. Data also continues to grow in importance from a business standpoint. Things like flight systems data for airlines and electronic medical records for hospitals need to be backed up, protected, and available at a moment's notice.

Data growth is being fueled by embedded devices, Fitbits, watches, phones, IoT devices, sensors, wearables, and more. All of these generate tons of time series data. And while businesses want to analyze everything, they can't, and they're having trouble deciding what data to use.

Consequently, the database landscape in 2019 is sprawling. Dinesh stopped counting at 380 databases. Operators and developers have a tense relationship: operators deploy and manage multiple databases, while developers just want to use what fits their use case. Each group wants different features, and while there is some overlap, the two are rarely aligned.

Cascading costs can become unmanageable. Developers do not typically consider the cost of the database. They usually start with a database, then add the access layer, services (REST, gRPC), and the UI/presentation layer. If you need to change one of the upper layers, it’s typically not too expensive or time-consuming. When you change the database, however, the cost cascades through every layer built on top of it. So when you choose a database, you have to be mindful of how many use cases it will serve.

Dinesh identified eight different types of databases, pointing out that each type has several subcategories: Relational, NoSQL, NewSQL, Graph, Time Series, Document Stores, Search Engines, and In Memory. He drilled down on Relational and NoSQL. According to Dinesh, use Relational when the data is relational, you are doing joins, you need transactions, SQL is well known to your team, and the dataset fits.

He noted that SQL is down in Google Trends, but this may be a function of familiarity relative to all of the new databases coming out. Relational is also down according to DB-Engines. Time series and wide column databases are up versus relational. Graph databases are growing faster than all others, as companies want to store graph data like social graphs (Facebook, Twitter) and use it to infer relationships.

Apache Cassandra has become very popular as companies want to manage massive amounts of data fast without losing sleep. Netflix, Uber, Instagram, Reddit, and SoundCloud are just some of the 1,500+ companies using Cassandra at a large scale.

So how do you define massive scale? 75,000+ nodes, 10+ PB of data, and over one trillion requests per day. Cassandra is durable, handling massive scale day after day.
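To put those figures in perspective, here is a rough back-of-the-envelope sketch. It treats the quoted numbers as lower bounds, uses decimal petabytes, and assumes load and data are spread evenly across nodes; none of these assumptions come from the talk, and the variable names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the "massive scale" figures.
# Assumptions (not from the talk): even distribution across nodes,
# decimal units (1 PB = 10**15 bytes), and the quoted minimums as inputs.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

requests_per_day = 1_000_000_000_000  # "over one trillion requests per day"
nodes = 75_000                        # "75,000+ nodes"
data_bytes = 10 * 10**15              # "10+ PB of data"

# Average request rate across the whole fleet.
requests_per_second = requests_per_day / SECONDS_PER_DAY

# Average request rate and stored data per node, if spread evenly.
requests_per_node_per_second = requests_per_second / nodes
data_per_node_gb = data_bytes / nodes / 10**9

print(f"Fleet-wide: {requests_per_second:,.0f} requests/second")
print(f"Per node:   {requests_per_node_per_second:,.1f} requests/second")
print(f"Per node:   {data_per_node_gb:,.1f} GB of data")
```

Under these assumptions, a trillion requests per day works out to well over eleven million requests per second on average, before accounting for peaks.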

The CAP theorem says that a distributed system can’t guarantee all three of consistency, availability, and partition tolerance at the same time. If you want availability and consistency, you go with MySQL. If you want consistency and partition tolerance, you go with Apache HBase. And if you want availability and partition tolerance, you'd choose Cassandra.
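In practice, the trade-off between consistency and availability is often tuned per request in replicated stores like Cassandra. The sketch below illustrates the commonly cited quorum rule: with N replicas, a read that waits for R replicas is guaranteed to overlap a write acknowledged by W replicas when R + W > N. The function names are hypothetical helpers for illustration, not part of any Cassandra API.

```python
# Minimal sketch of the quorum-overlap rule for replicated data.
# Function names are hypothetical; this does not call any database driver.

def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_write(replicas: int, write_acks: int, read_acks: int) -> bool:
    """True if every read set must share at least one replica with every write set."""
    return read_acks + write_acks > replicas


n = 3           # replication factor
q = quorum(n)   # majority of 3 replicas is 2

# Majority reads and writes always overlap, at the cost of needing
# a majority of replicas to be reachable.
print(read_overlaps_write(n, write_acks=q, read_acks=q))  # True

# Single-replica reads and writes stay available with fewer nodes up,
# but a read may miss the latest write.
print(read_overlaps_write(n, write_acks=1, read_acks=1))  # False
```

Requiring fewer acknowledgements favors availability during a partition, while requiring a majority favors consistency.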

Dinesh suggested you keep an eye out for Cassandra 4.0, which he said will change the CAP theorem with greater reliability and stability, checksummed transport, checksummed storage, scalability, zero-copy streaming, new internode messaging, 40 percent lower latencies, a 10X increase in memory efficiency, and 4X throughput with scalable internode encryption.
