Benchmarking Google Cloud Spanner, CockroachDB, and NuoDB
Take a deep dive into three elastic SQL databases, all of which are pretty promising options that meet the needs of the modern data center.
As a NuoDB solution architect, I constantly talk to people about what they're looking for in a database. More and more often, architects and CTOs tell us that they're building their next-generation data center — often using containers or cloud infrastructure — and they need a database that fits this model.
In their ideal world, they want a familiar relational database — but they want one that can deliver elastic capacity, maintain consistent transactions across multiple data centers, and span multiple public clouds at once. They're asking for a database of the future.
This is typically why they're talking to us at NuoDB. As a distributed SQL database that can scale in and out by just adding and deleting nodes, NuoDB can run across multiple deployment environments and data centers, while still maintaining strict transactional consistency and supporting ANSI SQL. We think this makes it an ideal database for modern deployment needs.
A New Kind of Cloud Database
But actually, we aren't the only ones thinking along these lines. There's a whole new category of database — the elastic SQL database — that has emerged. This type of database gives us a preview today into what databases will look like in five years. In addition to NuoDB, there is:
- Google Cloud Spanner: Unlike Google's other cloud SQL databases (specifically Google Cloud SQL), Cloud Spanner is built as a distributed, scale-out SQL database. It has been used for many years powering "AdWords" applications — the revenue engine for Google.
- CockroachDB: Developed by Cockroach Labs as the "database that survives," CockroachDB is basically an open-source adaptation of Spanner. Cockroach Labs shipped its first production release earlier this year.
I am an admirer of all of these technologies as they represent a new way of thinking about databases. Yet, after exploring multiple papers and presentations, I still felt unfulfilled. All these products look promising on paper, but how do they really perform?
To answer this question, a couple of my colleagues and I decided to take these products out for a spin and compare them directly. Not a formal pedantic evaluation, but just a wet finger in the air.
YCSB Evaluation: Environments and Configuration
For simplicity, we decided to use the YCSB benchmark. Brian Cooper, the author of YCSB, joined Google and wrote a YCSB driver for Spanner. CockroachDB and NuoDB both provide JDBC drivers, so YCSB supports them out of the box. That made spinning up the YCSB tests straightforward.
The test environments were configured similarly. A single multi-threaded YCSB application connects to up to three database servers running on separate hosts.
The beauty of an elastic SQL database is that the application does not care how many servers make up the database or where the servers physically reside. The application always operates against a single consistent logical database.
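As a rough illustration of the setup described above, the sketch below builds YCSB command lines for the JDBC binding at several thread counts. It assumes the stock `bin/ycsb` launcher and the JDBC binding's `db.driver`, `db.url`, `db.user`, and `db.passwd` properties. The host name, database name, credentials, and the choice of the PostgreSQL driver (which CockroachDB accepts) are placeholders, not the settings used in these tests.

```python
from typing import Dict, List


def ycsb_run_command(workload: str, threads: int, db_props: Dict[str, str]) -> List[str]:
    """Build the argument list for one `ycsb run` invocation via the JDBC binding."""
    cmd = ["bin/ycsb", "run", "jdbc", "-P", f"workloads/{workload}", "-threads", str(threads)]
    # Each -p flag overrides a single property; keys are sorted for a stable command line.
    for key in sorted(db_props):
        cmd += ["-p", f"{key}={db_props[key]}"]
    return cmd


# Hypothetical connection settings: host, database name, and credentials are placeholders.
example_props = {
    "db.driver": "org.postgresql.Driver",
    "db.url": "jdbc:postgresql://db-host:26257/ycsb",
    "db.user": "ycsb",
    "db.passwd": "",
}

# The application only ever sees one logical database URL, however many nodes sit behind it.
for n in (8, 16, 32):
    print(" ".join(ycsb_run_command("workloada", n, example_props)))
```

Only the thread count changes between invocations; the connection URL stays the same regardless of how many database nodes are running.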
For our exercise, we ran the YCSB Spanner tests on Google Cloud for Cloud Spanner, while the benchmarks for CockroachDB and NuoDB ran on bare metal in our lab, on 32GB, four-core SuperMicro servers with a 10G network.
If you aren't familiar with YCSB tests, they consist of a number of workloads named A through F. In short, the workloads can be described as follows:
A: Heavy update (50% read, 50% update)
B: Mostly read (95% read, 5% update)
C: Read-only (100% read)
D: Read the latest inserted (95% read, 5% insert)
E: Scan short ranges (95% scan, 5% insert)
F: Read-modify-write (50% read, 50% read-modify-write)
Full descriptions of the workloads are available in the YCSB project's Core Workloads documentation.
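To make the idea of an operation mix concrete, here is a minimal sketch of how a load generator might pick each operation according to a workload's proportions. It covers only workloads A through C from the list above, and the names `WORKLOAD_MIX` and `next_operation` are made up for this illustration. It is not YCSB's actual implementation.

```python
import random
from collections import Counter

# Operation mixes for workloads A-C as listed above (proportions sum to 1.0).
WORKLOAD_MIX = {
    "A": {"read": 0.50, "update": 0.50},
    "B": {"read": 0.95, "update": 0.05},
    "C": {"read": 1.00},
}


def next_operation(workload: str, rng: random.Random) -> str:
    """Pick the next operation type according to the workload's proportions."""
    mix = WORKLOAD_MIX[workload]
    return rng.choices(list(mix), weights=list(mix.values()), k=1)[0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
counts = Counter(next_operation("B", rng) for _ in range(10_000))
print(counts)  # roughly 95% reads and 5% updates
```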
YCSB Results for Google Cloud Spanner, CockroachDB, and NuoDB
For each of the workloads, we varied the application load (the number of threads run by the YCSB application) and the database capacity (from one to three nodes). We executed multiple runs, then took the best throughput numbers across all runs for each database and plotted them in the chart below:
This graph measures the number of transactions per second for each database during peak throughput. In all five of the tested workloads, NuoDB significantly outperforms both Cloud Spanner and CockroachDB.
You'll notice that we did not complete Workload E, as it would require a minor change to the YCSB application in order to run correctly with NuoDB. To preserve the integrity of the test, we wanted to run YCSB without any changes, so we excluded Workload E from our testing. For those interested in running YCSB Workload E on NuoDB, you can comment below for details on what you would need to change.
As you can see from the graph, NuoDB outperforms the other elastic SQL databases with much higher transactional throughput across all workloads. The most striking difference, however, is in the Workload C read-only tests. This behavior is expected and is due to NuoDB's memory-centric architecture.
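The selection step described above (sweep thread counts and node counts, then keep the best throughput per database and workload) can be sketched roughly as follows. The `Run` record and `best_throughput` helper are hypothetical names, and the sample numbers are placeholders for illustration only, not measurements from these tests.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Run(NamedTuple):
    database: str
    workload: str
    nodes: int
    threads: int
    throughput: float  # operations per second reported for the run


def best_throughput(runs: Iterable[Run]) -> Dict[Tuple[str, str], Run]:
    """Keep the highest-throughput run for each (database, workload) pair."""
    best: Dict[Tuple[str, str], Run] = {}
    for run in runs:
        key = (run.database, run.workload)
        if key not in best or run.throughput > best[key].throughput:
            best[key] = run
    return best


# Placeholder numbers for illustration only; these are not measurements.
sample_runs = [
    Run("db_x", "A", nodes=1, threads=16, throughput=1000.0),
    Run("db_x", "A", nodes=3, threads=64, throughput=2500.0),
    Run("db_x", "C", nodes=3, threads=64, throughput=4000.0),
    Run("db_y", "A", nodes=2, threads=32, throughput=1800.0),
]

for (database, workload), run in sorted(best_throughput(sample_runs).items()):
    print(database, workload, run.nodes, run.threads, run.throughput)
```

Keeping the whole winning record, rather than just the number, preserves which node and thread counts produced the peak.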
In addition to observing throughput results, we also wanted to understand latency, a critical concern with a distributed database. As you can see in the chart below, this is another area where NuoDB's memory-centric architecture delivers benefits, in this case in the form of low-latency data access. NuoDB's latency numbers are significantly lower than those for Spanner and CockroachDB.
This graph measures average latency experienced by the application for READs during periods of peak throughput. Minimal latency is ideal for the best user experience.
To be fair, some of Spanner's sluggishness can be attributed to the network latency of Google Cloud when compared with native LAN speeds. We chose to run both CockroachDB and NuoDB on our own hardware so that we could more easily understand any anomalies between test runs, something that would have been much more difficult to do under cover of cloud.
YCSB also collects "update" latency numbers, which are shown below. The results are much closer across the databases because all of them are gated by I/O performance.
This graph measures average latency experienced by the application for INSERTs during periods of peak throughput. Minimal latency is ideal for the best user experience. Note that Workload C is not represented here as it is a read-only workload.
Notably, the insert latency (Workload D) for Google Spanner is very low. Repeated test runs showed the same behavior. It is not clear whether this is just a testing anomaly or simply that Spanner is well-optimized for local insert latencies.
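For readers who want to collect the same kinds of figures, the sketch below pulls throughput and average latency out of the text summary that YCSB prints at the end of a run. It assumes summary lines shaped like `[READ], AverageLatency(us), 12.5`. The exact metric names and units can differ between YCSB versions, and the sample text is made up, so treat this as a starting point rather than a definitive parser.

```python
from typing import Dict, Tuple


def parse_ycsb_summary(text: str) -> Dict[Tuple[str, str], float]:
    """Map (section, metric) to a value for lines like '[READ], AverageLatency(us), 12.5'."""
    metrics: Dict[Tuple[str, str], float] = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not (parts[0].startswith("[") and parts[0].endswith("]")):
            continue  # skip log noise and anything that is not a summary line
        try:
            metrics[(parts[0][1:-1], parts[1])] = float(parts[2])
        except ValueError:
            continue  # non-numeric value; ignore it
    return metrics


# Made-up sample in the assumed summary shape; the values are placeholders, not results.
sample = """\
[OVERALL], RunTime(ms), 10000
[OVERALL], Throughput(ops/sec), 1500.0
[READ], AverageLatency(us), 800.0
[UPDATE], AverageLatency(us), 2400.0
"""

m = parse_ycsb_summary(sample)
throughput = m[("OVERALL", "Throughput(ops/sec)")]
read_latency_ms = m[("READ", "AverageLatency(us)")] / 1000.0  # microseconds to milliseconds
print(throughput, read_latency_ms)
```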
Summary: Three Options for the Modern Data Center
In summary, our hands-on experiment gave us a pretty good sense of the performance ranges. We also made a few general observations:
- Google Cloud Spanner is extremely easy to manage. There is only one configuration parameter: the number and location of Spanner nodes. The rest of the heavy lifting is done behind the curtain. This approach sets a new high bar for management ease of use.
- CockroachDB is very easy to set up and start using. It is packaged as a single executable that you drop on a host and run with a few configuration parameters. And it is an open-source distribution for those committed to an open-source-based infrastructure.
- NuoDB is flexible and fast. It can achieve superior performance for throughput and volume. But it requires awareness of architecture and best practices to do so.
While we tried to make the benchmark tests as fair as possible, we admit that as experts in NuoDB, our knowledge has probably biased the results at least a little. For instance, we ran a configuration that enabled the entire data set to fit within memory — but we ran the same configuration for CockroachDB, as well. That said, we don't think the drastic differences can be explained by that alone.
NuoDB has been generally available since January 2013, so we've spent years of hard work improving our product to perform in hard-hitting, real-world customer experiences. We're excited to welcome these new kids on the block and see what they bring to the table. I think we have a lot we can learn from each other.
And in general, I think all three of these "elastic SQL" databases are pretty promising options that meet the needs of the modern data center. What do you think? Would you take these out for a spin?
Published at DZone with permission of Boris Bulanov. See the original article here.