TechTalks With Tom Smith: Distributed SQL Databases for Cloud-Native Environments
Enterprise apps are moving from SQL and NoSQL to distributed SQL to get the best of both worlds.
I had the opportunity to meet with Karthik Ranganathan, founder and CTO of Yugabyte, during the Distributed SQL Summit in San Jose. Earlier in the week, the company announced the general availability of Yugabyte DB 2.0, the 100% open-source, high-performance distributed SQL database for global, internet-scale applications.
Updates include PostgreSQL syntax and wire-protocol compatibility, high-performance benchmarks, Jepsen-tested correctness, and Oracle-to-Yugabyte migration utilities. With Yugabyte DB’s SQL API (YSQL) ready for production, organizations are able to move away from monolithic SQL systems like Oracle to a distributed SQL database that is both open source and cloud-native.
These updates mean there is an alternative for organizations looking for a distributed SQL solution with the look and feel of PostgreSQL that is high-performing and correct. Yugabyte’s SQL benchmarks showed throughput nearly twice that of AWS Aurora and almost five times that of CockroachDB. Additional details on the updates include:
New ecosystem integrations, including GraphQL, Rook, and a variety of database administration tools.
PostgreSQL-compatible features, including support for both simple and complex data types, foreign keys, JOINs, and distributed transactions with serializable and snapshot isolation levels, plus advanced functionality like stored procedures and triggers.
Quickstarts and drivers for the most popular programming languages, including Java, Go, Python, and C++.
For developers looking to build event-driven systems using Apache Kafka, Yugabyte is also currently beta testing change data capture (CDC) and two-region multi-master and master-slave cluster capabilities.
What are your goals for the first Distributed SQL Summit?
We are seeing a trend where people are moving from SQL and NoSQL to distributed SQL to get the best of both worlds. The purpose of the summit is to help people learn from others, and to see the benefits others are realizing and the problems they are solving with distributed SQL databases. Most of the people attending the summit are figuring out how to get started, how to scale horizontally, and how distributed SQL applies to their particular use cases.
What are the problems Distributed SQL solves?
As microservices move to the cloud, with data, apps, and queries increasing exponentially, people want to use SQL because it’s feature-rich and familiar. However, they also need their databases to scale horizontally: when they need additional capacity, they can just add a node. They also need the database to be fault-tolerant and highly available, so that when a node dies it doesn’t impact users or service. Additionally, there’s a need for geographically distributed data, both for regulatory purposes and to move the data closer to users for improved user experience (UX). As response time improves, user acquisition and retention improve.
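To make "just add a node" concrete, here is a minimal sketch of consistent hashing, one common way to spread keys over nodes so that adding capacity moves only a fraction of the data. It is a generic illustration, not a description of how Yugabyte DB shards data; the `HashRing` class, the node names, and the `cart-N` keys are all hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Illustrative consistent-hash ring (hypothetical, not Yugabyte's scheme)."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes   # virtual points per node, to smooth the spread
        self._hashes = []      # sorted ring positions
        self._owner = {}       # ring position -> node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        # A key belongs to the first ring position after its own hash,
        # wrapping around at the end of the ring.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]


ring = HashRing()
for name in ("node-a", "node-b", "node-c"):
    ring.add_node(name)

keys = [f"cart-{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

# Only keys taken over by the new node change owner; the rest stay put.
moved = [k for k in keys if before[k] != after[k]]
```

In this sketch every key in `moved` ends up on `node-d`, while all other keys keep their original owner, which is the property that lets a cluster grow without reshuffling everything.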
People have been moving non-critical data to NoSQL while keeping their critical data in SQL. However, those who want to scale out their transactional data are at a crossroads and are stuck because neither SQL nor NoSQL meets all of their needs. The solution is to move to a distributed SQL database.
A simplified example is an online retailer whose key components are a product catalog with inventory, a shopping cart, and orders. They may have started with the catalog on a NoSQL database, but kept their source of truth in a SQL database. Orders are always in SQL because they are transactional and the company needs a lot of SQL features. The shopping cart is where the struggle happens.
As orders scale with more users, more purchases, made more often, the database needs to scale quickly. They need somewhere to put critical data without compromising functionality. This is where distributed SQL comes in.
During checkout, real-time transactions need to be tracked between the inventory and the shopping cart. The database must be highly available and resilient, because any failure that prevents checkout from completing means lost business. If someone clicks to check out and the page just spins, they will not be happy and will go somewhere else. Data also needs to be geographically distributed so that transactions execute quickly.
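The checkout step described above is a classic all-or-nothing transaction: stock is decremented and order rows are written together, or nothing changes. The sketch below illustrates that idea using Python's built-in `sqlite3` module as a single-node stand-in. The table names, the `checkout` function, and the sample data are assumptions made for illustration, and a distributed SQL database would be reached through its own driver rather than SQLite.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE inventory (product_id INTEGER PRIMARY KEY, stock INTEGER NOT NULL);
        CREATE TABLE cart_items (cart_id INTEGER, product_id INTEGER, qty INTEGER);
        CREATE TABLE orders (cart_id INTEGER, product_id INTEGER, qty INTEGER);
        INSERT INTO inventory VALUES (1, 5), (2, 1);
        INSERT INTO cart_items VALUES (10, 1, 2), (10, 2, 1), (11, 1, 1), (11, 2, 1);
    """)
    return conn


def checkout(conn: sqlite3.Connection, cart_id: int) -> bool:
    """Turn a cart into orders and decrement stock, all or nothing."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            items = conn.execute(
                "SELECT product_id, qty FROM cart_items "
                "WHERE cart_id = ? ORDER BY product_id",
                (cart_id,),
            ).fetchall()
            for product_id, qty in items:
                # The stock check and the decrement happen in one statement.
                updated = conn.execute(
                    "UPDATE inventory SET stock = stock - ? "
                    "WHERE product_id = ? AND stock >= ?",
                    (qty, product_id, qty),
                )
                if updated.rowcount != 1:
                    raise ValueError(f"insufficient stock for product {product_id}")
                conn.execute(
                    "INSERT INTO orders (cart_id, product_id, qty) VALUES (?, ?, ?)",
                    (cart_id, product_id, qty),
                )
            conn.execute("DELETE FROM cart_items WHERE cart_id = ?", (cart_id,))
        return True
    except ValueError:
        return False  # the rollback has already undone any partial work
```

With the sample data, checking out cart 10 succeeds and empties the stock of product 2. Checking out cart 11 then fails on product 2, and the rollback also restores the unit of product 1 that had been reserved a moment earlier, so inventory and orders never disagree.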
Additionally, your cart and your orders now contain personal data, which makes geographic distribution of data important for meeting regulatory requirements. It’s also important to think about the ability to work in a multi-cloud environment: many retailers do not want to pay AWS because they are competing with Amazon, and any company they acquire may well run on a different cloud provider.
Distributed SQL databases solve these issues by providing transaction consistency with geographic data distribution.
What are the challenges your clients are trying to address?
- Horizontal scalability
- High availability and fault tolerance
- Cloud-native deployment of databases
- Geographic distribution of data
They want all of this with high performance to provide the best user experience (UX) and customer experience (CX).
How hard is it to make the transition?
The typical journey we see is a company with complex monolithic applications. As they move to the cloud, the app is broken into microservices. Most companies identify the portions of the monolithic app that are causing the most pain and start to off-load those from the monolith. They then follow the strangler approach and evolve the entire monolithic app to microservices over time.
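The incremental migration described here can be sketched as a routing layer that sits in front of the monolith and hands off one path prefix at a time to a new service. The example below is a toy illustration under stated assumptions: `StranglerRouter`, `monolith`, `cart_service`, and the URL paths are hypothetical names, and a real deployment would usually do this in a proxy or API gateway rather than in application code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def monolith(path: str) -> str:
    """Stand-in for the legacy application."""
    return f"monolith handled {path}"


def cart_service(path: str) -> str:
    """Stand-in for the first extracted microservice."""
    return f"cart-service handled {path}"


class StranglerRouter:
    """Send migrated path prefixes to new services, everything else to legacy."""

    def __init__(self, legacy: Handler) -> None:
        self.legacy = legacy
        self.migrated: Dict[str, Handler] = {}

    def migrate(self, prefix: str, service: Handler) -> None:
        # Called each time another slice of the monolith has been extracted.
        self.migrated[prefix] = service

    def handle(self, path: str) -> str:
        # The longest matching prefix wins, so nested routes can move separately.
        for prefix in sorted(self.migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self.migrated[prefix](path)
        return self.legacy(path)


router = StranglerRouter(monolith)
first = router.handle("/cart/42")       # still served by the monolith
router.migrate("/cart", cart_service)   # the most painful piece moves first
second = router.handle("/cart/42")      # now served by the new service
third = router.handle("/orders/7")      # untouched routes stay on the monolith
```

Each call to `migrate` shrinks the share of traffic the monolith sees, which is how the legacy app is retired gradually instead of in one risky cutover.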
How do you see the evolution of distributed SQL taking place?
Single-node RDBMS databases are very popular and are the default today. In the future, all of these applications will be deployed in cloud-native environments, and single-node databases are not meant for cloud-native: they do not scale horizontally and provide no failover. As distributed SQL goes mainstream, it will become the default for cloud-native deployments because it can scale out quickly to meet demand.
What do developers need to know about distributed SQL?
We’ve taken the most popular database, PostgreSQL, with all its features, and made it cloud-native, scalable, and fault-tolerant. We took the same PostgreSQL codebase and put it on top of Yugabyte to make it a scalable, distributed database. It’s perfect for the cloud and 100% open source.
As data volumes explode, you will have many databases. Think about how to manage all these databases in geographically distributed regions.