
What’s Next After Dynamo and Cassandra?


Learn why the key to a successful database strategy is simplicity and how database flexibility actually adds complexity.


Avinash Lakshman, CEO of Hedvig and developer of Dynamo and Cassandra, shares his thoughts on the current and future state of databases.

How are you and Hedvig involved in databases?

I’ve had the honor of developing and operating two large databases based on distributed computing principles. In 2004, I was part of a small team that invented Dynamo at Amazon, which is the industry’s first NoSQL platform and still powers the shopping cart experience on Amazon.com today. Then, in 2007, I moved to Facebook where Cassandra was my brainchild. Our goal was to create an extremely scalable and performant database to power a search capability across all of Facebook’s messages.

After successfully developing and operating these truly internet-scale systems, I decided to leave Facebook and found Hedvig, where I’m currently the CEO. The idea was to bring the scalability, performance, and resilience of NoSQL platforms to the storage layer. Rather than a database that benefits web apps, I felt I could build a distributed storage platform that benefited all enterprise apps. We launched v1 of the Hedvig Distributed Storage Platform in 2015. Now, we provide a shared storage system that powers many enterprise databases, including bare metal, virtualized, and containerized instances. For example, many customers looking to deploy Cassandra or MySQL or MongoDB in a container need a platform like Hedvig because containers were not designed for such stateful applications.

What are the keys to a successful database strategy?

I hate to sound like a consultant, but it depends. It depends on the application, the user requirements, and the budget. Those will dictate speed, scale, and security, which are definitely the right technical keys to a successful database strategy. But based on my experience developing Dynamo and Cassandra, I would add a fourth S: simplicity. At both Amazon and Facebook, I had the pleasure (and pain) of being operationally responsible for the code I was deploying into production. It’s here that I learned the key to a successful database strategy is simplicity. Providing too much flexibility leads to too much complexity and to large staffing and skill requirements. No production database can survive if an army of skilled DBAs is needed to overcome its complexity. Instead, anticipate the application needs. Codify that into the database. Automate complex procedures. Making it operationally simple is what the hyper-scalers mastered.

Are databases critical to business success?

Here the answer is a resounding yes. Digital transformation tops most of our customers’ business agenda. What that means to me is the ability to analyze and monetize data for new revenue streams. You simply cannot do this without a proper database strategy, especially for larger enterprises where the sheer scale of digital transformation requires storing, manipulating, and analyzing petabytes of data.

How have databases changed in the past year?

Customers tell us that Docker and containers have been the biggest change in the last year. Containers were originally the domain of cloud-native companies (think Netflix, Uber, and Airbnb). Now, mainstream developers and DevOps teams are deploying Docker and using containers in a more traditional enterprise architecture. But containers are ephemeral in nature and don’t handle database workloads well. It takes quite a bit of planning and infrastructure work to make sure containerized databases are portable, persistent, and performant.

What are the technical solutions you, or your clients, use for your databases?

We don’t deal directly with databases. To us, a database is just a workload that sits atop shared infrastructure. What we do see is a big shift towards software-defined storage (SDS) as a technical underpinning to modern databases. Databases are almost exclusively scaled-out, distributed systems these days. Storing database values on traditional storage creates performance and economic mismatches. Single- or dual-controller scale-up arrays can’t handle the I/O load, and they cost millions of dollars. As data grows exponentially, the commodity economics and linear scalability of software-defined storage are needed. Even platforms like Cassandra that handle their own data storage can benefit from the data efficiency, resiliency, performance, and security of SDS.

What are real world problems you, or your clients, are solving with databases?

The real world problem I was always trying to solve was availability. As digital services become mission critical (and mainstream), a new architecture is needed to eliminate downtime. At Amazon, we had to build Dynamo to be multisite so it could survive a datacenter outage without causing downtime to Amazon.com, where downtime was directly correlated to losses of millions of dollars. Likewise, Cassandra was built to be multisite to survive east or west coast datacenter outages at Facebook. Building resilient services is a higher priority than performance in today’s always-on, connected world. I’m more likely to abandon a site or stop using a service if it incurs frequent outages than if it’s a fraction of a second slower.
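
To make the multisite idea concrete, here is a minimal Python sketch of a key-value store that keeps a full copy of the data in each datacenter and keeps serving requests while one site is down. It is a toy model under stated assumptions, not how Dynamo or Cassandra actually implement replication: the `Site` and `MultiSiteStore` names, the `write_quorum` parameter, and the single version counter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One datacenter holding a full copy of the data (hypothetical model)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


class MultiSiteStore:
    """Toy key-value store that replicates each write to every reachable site."""

    def __init__(self, sites, write_quorum):
        self.sites = sites
        self.write_quorum = write_quorum
        self.version = 0  # single logical counter, for illustration only

    def put(self, key, value):
        reachable = [s for s in self.sites if s.up]
        if len(reachable) < self.write_quorum:
            raise RuntimeError("not enough sites reachable to accept the write")
        self.version += 1
        for site in reachable:
            site.data[key] = (self.version, value)
        return len(reachable)  # number of sites that acknowledged the write

    def get(self, key):
        # Ask every reachable site and return the newest version seen, so a
        # site that missed writes while it was down cannot serve stale data.
        replies = [s.data[key] for s in self.sites if s.up and key in s.data]
        if not replies:
            raise KeyError(key)
        return max(replies, key=lambda reply: reply[0])[1]


east, west = Site("east"), Site("west")
store = MultiSiteStore([east, west], write_quorum=1)
store.put("cart:42", ["book"])
east.up = False                        # simulate an east coast outage
store.put("cart:42", ["book", "pen"])  # still accepted by the west site
east.up = True                         # east returns holding an older copy
latest = store.get("cart:42")          # newest version wins on read
```

With `write_quorum=1` the sketch favors availability: a write succeeds as long as any one site is reachable, and reads reconcile by picking the highest version. A real system would also have to repair the stale copy and handle concurrent writers, which this sketch deliberately leaves out.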

What are the most common issues you see companies having with databases?

There will always be performance, scale, resiliency, and security challenges. But increasingly, and at a tactical level, we see the shift to containers as a common issue. Moving from bare metal to containers is not your typical “lift and shift.” It’s a true modernization of the architecture. Understanding data persistence, portability, and performance in a containerized environment is truly different. Building the right microservices architecture and a scalable, distributed server, storage, and network fabric is key to running stateful database applications in containers.

Where do you think the biggest opportunities are in the evolution of databases?

It’s been a while since I was at the forefront of the database evolution. I’m not sure I can speak to it at a technical level. I do think the application of databases will evolve. Artificial intelligence, machine learning, predictive analytics, and quantum computing will all radically shape database needs.

What skills do developers need to be proficient with databases?

For me, it’s a combination of skill sets. Databases are not just shrink-wrapped software that’s deployed and then looked after by a DBA. Databases are now part of the DevOps fabric. It takes a different skill set to get business value from them these days. You have to be part programmer, part automator, part infrastructure admin, and part database admin. That’s a unicorn these days.

Is there anything you’d like to know about what developers are or are not doing with regards to database projects they are working on?

I’d love to further understand if developers are comfortable with the changes in programmable infrastructure. Are they comfortable with (and capable of) provisioning compute, storage, and networking as part of the stack needed to support databases, especially containerized ones?

What have I failed to ask that you think we need to cover?

One of the interesting trends we’ve seen lately is predictive analytics. We see customers deploying Cassandra or other NoSQL databases to act as repositories for sensor data, machine data, application telemetry, user telemetry, etc. It’s not quite “big data,” although there can be a big data component for processing the data. The primary goal is to look for trends and pre-empt action based on what’s observed. We hear a lot about artificial intelligence and machine learning, but this is honestly further out for most enterprises. Think of predictive analytics as crawling; artificial intelligence and machine learning are the walking and running stages.
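
As a rough illustration of "looking for trends and pre-empting action," the Python sketch below fits a straight line to recent, evenly spaced telemetry samples and estimates how many more samples it will take to reach an alert threshold. The function names, the sample readings, and the threshold of 80 are assumptions made up for this example; real deployments would likely use more robust models than a least-squares line.

```python
import math


def fit_line(values):
    """Least-squares slope and intercept for evenly spaced samples."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def steps_until_threshold(values, threshold):
    """Estimate how many future samples until the fitted trend hits threshold.

    Returns 0 if the trend is already at or above the threshold, and None if
    the trend is flat or falling (no crossing is predicted).
    """
    slope, intercept = fit_line(values)
    current = intercept + slope * (len(values) - 1)
    if current >= threshold:
        return 0
    if slope <= 0:
        return None
    return math.ceil((threshold - current) / slope)


# Hypothetical disk-usage readings (percent), one per sampling interval.
readings = [60, 62, 64, 66, 68]
remaining = steps_until_threshold(readings, threshold=80)
```

Here the readings rise by two points per interval, so the estimate says the 80 percent mark is six intervals away, which leaves time to act before the threshold is actually reached.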


Topics:
databases, dynamodb, cassandra

Opinions expressed by DZone contributors are their own.
