The State of Databases 2019
Cassandra 4.0 aims to stretch the limits of the CAP theorem.
I had the opportunity to hear Dinesh Joshi, Senior Software Engineer and Architect at Apple and an active member of the Apache Software Foundation, share his thoughts on The State of Databases 2019 while attending Percona Live in Austin, Texas.
Data is growing and will continue to do so. In 2019, humans will generate 40 zettabytes of data. Data also continues to grow in importance from a business standpoint: things like flight systems data for airlines and electronic medical records for hospitals need to be backed up, protected, and available at a moment's notice.
This data growth is being fueled by embedded devices, Fitbits, watches, phones, IoT devices, sensors, wearables, and more. All of these generate tons of time series data. And while businesses want to analyze everything, they can't, and they're having trouble deciding which data to use.
Consequently, the database landscape in 2019 is sprawling. Dinesh stopped counting at 380 databases. Operators and developers have a tense relationship. Operators deploy and manage multiple databases while developers just want to use what fits their use case. Each wants different features and while there is some overlap, the two groups are rarely aligned.
Cascading costs can become unmanageable. Developers do not typically consider the cost of the database. They typically start with a database, then add the access layer, services (REST, gRPC), and the UI/presentation layer. If you need to change a layer on top, it's typically not too expensive or time-consuming. When you change the database, the cost cascades. So when you choose a database, you have to be mindful of how many use cases it will serve.
Dinesh identified eight different types of databases, pointing out that each type has several subcategories: Relational, NoSQL, NewSQL, Graph, Time Series, Document Stores, Search Engines, and In Memory. He drilled down on Relational and NoSQL. According to Dinesh, use Relational when your data is relational, you are doing joins and transactions, SQL is well known to your team, and the dataset fits.
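As a minimal sketch of that relational sweet spot, here is a hypothetical two-table example (my own illustration, not from the talk) using Python's built-in sqlite3 module, showing the joins and transactions Dinesh mentioned:

```python
import sqlite3

# In-memory relational database: joins and transactions are natural here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE records  (id INTEGER PRIMARY KEY,
                           patient_id INTEGER REFERENCES patients(id),
                           note TEXT);
""")

# The context manager wraps both inserts in one atomic transaction.
with conn:
    conn.execute("INSERT INTO patients VALUES (1, 'Alice')")
    conn.execute("INSERT INTO records  VALUES (1, 1, 'Annual checkup')")

# A join across the two tables -- the classic relational strength.
rows = conn.execute("""
    SELECT p.name, r.note
    FROM patients p JOIN records r ON r.patient_id = p.id
""").fetchall()
print(rows)  # [('Alice', 'Annual checkup')]
```

If either insert inside the `with` block failed, both would roll back, which is exactly the transactional guarantee that makes relational databases the default choice for data like medical records.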
Apache Cassandra has become very popular as companies want to manage massive amounts of data fast without losing sleep. Netflix, Uber, Instagram, Reddit, and SoundCloud are just some of the 1,500+ companies using Cassandra at large scale.
So how do you define massive scale? 75,000+ nodes, 10+ PB of data, over one trillion requests per day. Cassandra is durable, handling massive scale day after day.
The CAP Theorem says that of consistency, availability, and partition tolerance, a distributed system can guarantee only two. If you want availability and consistency, you go with MySQL. If you want consistency and partition tolerance, you go with Apache HBase. And if you want availability and partition tolerance, you choose Cassandra.
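In practice, Cassandra's side of that trade-off is tunable per query: with replication factor N, a write acknowledged by W replicas and a read that consults R replicas are strongly consistent whenever R + W > N, because the read and write sets must overlap. A small sketch of that quorum rule (my own illustration, not from the talk):

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Quorum rule: read and write replica sets overlap when r + w > n."""
    return r + w > n

# Replication factor 3 with QUORUM reads and writes (2 each) overlaps,
# giving strong consistency at the cost of some availability:
print(is_strongly_consistent(n=3, w=2, r=2))  # True

# Consistency level ONE for both reads and writes trades consistency
# for availability and lower latency:
print(is_strongly_consistent(n=3, w=1, r=1))  # False
```

This is why Cassandra is usually described as tunably consistent rather than simply "eventually consistent": each application picks where on the availability/consistency spectrum a given query sits.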
Dinesh suggested you keep an eye out for Cassandra 4.0, which promises to stretch those CAP trade-offs with greater reliability and stability: checksummed transport, checksummed storage, zero-copy streaming for faster data scaling, rewritten internode messaging, roughly 40 percent lower latencies, a 10x increase in memory efficiency, and 4x throughput with scalable internode encryption.
Opinions expressed by DZone contributors are their own.