In July 2015, Amazon delivered Aurora, the AWS-only Enterprise RDBMS that it had been promising since November 2014 would deliver “5x Enterprise performance at 1/5 the price.” Amazon was making a big play in the MySQL market: it had recognized that there is a lot of demand for scaling MySQL workloads, and that Enterprises would be willing to pay for it. MySQL has always been considered great for startups and small to medium-sized companies, but the received wisdom was that “true Enterprise deployments” would inevitably need to migrate to SQL Server or Oracle RDBMS. Amazon is instead putting its foot down and saying, “there is definitely an Enterprise-level market for MySQL-compatible databases,” one which demands Enterprise-grade features and availability and, of course, Enterprise-grade scale and performance.
Aurora: Enterprise Features for MySQL Workloads
Feature-wise, Aurora was designed for Enterprise-level HA, using quorum writes and reads across 3 AWS Availability Zones, with a promised four-nines (99.99%) uptime. But performance has always been Aurora’s biggest selling point: “5x Enterprise performance” is the claim. Once actual customers got their hands on Aurora, however, the reality turned out to be more pedestrian: Aurora is still based on MySQL’s single-master/multiple-slave architecture. In other words, Aurora’s write performance is hard-limited to the largest single instance that can be deployed on AWS, which at this time is a 32-core (vCPU) 8XL. So what are the multi-AZ ‘quorum writes’ doing? The multiple AZs are there for HA; the writes are not scaled out, but are instead redundant. Having additional AZs to write to doesn’t speed up Aurora; it is a performance cost paid in the pursuit of HA. To claw back some of that performance, Aurora leverages ‘durability by network’ when it writes to each of those AZs, which in turn raises durability considerations in a multi-AZ outage, let alone a full region outage.
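To see why redundant quorum writes cost latency rather than add throughput, here is a minimal sketch. It assumes, purely for illustration, one copy per AZ and a write quorum of two; the function name, the replica count, the quorum size, and the latency figures are all hypothetical and are not taken from Aurora itself.

```python
def quorum_write_latency(ack_latencies_ms, write_quorum):
    """Return how long a quorum write takes to commit.

    The write is acknowledged to the client only once `write_quorum`
    replicas have confirmed it, so the commit latency is that of the
    write_quorum-th fastest replica, not the fastest one.
    """
    if not 1 <= write_quorum <= len(ack_latencies_ms):
        raise ValueError("write_quorum must be between 1 and the replica count")
    return sorted(ack_latencies_ms)[write_quorum - 1]


# Hypothetical acknowledgement times in milliseconds: one copy in the
# writer's own AZ and one in each of two remote AZs (assumed values).
acks_ms = [1.0, 2.5, 6.0]

local_only = quorum_write_latency(acks_ms, 1)   # wait for the fastest copy only
two_of_three = quorum_write_latency(acks_ms, 2) # wait for a majority

print(f"single copy: {local_only} ms, quorum of 2: {two_of_three} ms")
```

Every replica receives the same write, so adding copies raises the number of acknowledgements to wait for without adding any write capacity.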
Performance-wise, Aurora’s single write-master is no slouch; it can support thousands of connections, which increases overall throughput. However, as the number of connections scales, so does latency. All the AWS benchmarks for Aurora, run against that single write-master, show high latency per transaction. For instance, their Sysbench 100% writes benchmark incurred latency north of 160ms to reach its 100k TPS result. But is raw throughput at the expense of latency enough for Enterprise workloads?
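The relationship between those two benchmark figures can be worked out with Little's Law (average concurrency equals throughput times average latency). The sketch below assumes the system is in steady state, treats 160ms as the average latency (the text says only that it was "north of" that, so the result is a lower bound), and assumes each connection runs one transaction at a time. The function names are illustrative.

```python
def in_flight_transactions(throughput_tps, latency_ms):
    """Little's Law: average concurrency = throughput x average latency."""
    return throughput_tps * latency_ms / 1000


def per_connection_tps(latency_ms):
    """Transactions per second one connection completes if it runs them back to back."""
    return 1000 / latency_ms


# Figures quoted in the text: 100k TPS at roughly 160 ms per transaction.
concurrent = in_flight_transactions(100_000, 160)
each = per_connection_tps(160)

print(f"about {concurrent:.0f} transactions in flight, "
      f"about {each:.2f} TPS per connection")
```

Under these assumptions, reaching 100k TPS means keeping on the order of 16,000 transactions in flight at once, each connection completing only a handful of transactions per second. That is the sense in which the headline throughput is bought with latency.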
Google Spanner: Enterprise-Ready Scale
Google’s newly released Cloud Spanner offers scale-out right out of the box. Specifically, this means Spanner is capable of the following:
- Scales out both writes and reads without any application changes
- Continues to grow performance of both writes and reads by simply adding additional servers
- Constantly maintains transactional ACID guarantees (especially consistency and durability) across all the nodes in the database
Thus in a very real way, Google Spanner actually delivers on a lot of the “Enterprise performance” promise of Aurora.
Spanner’s Compatibility Problem
Although Spanner has the ability to scale out performance, it has no built-in compatibility with current applications. This is significant. Spanner does not use any standard JDBC or ODBC driver; it uses its own client libraries, and its query syntax is a variant of SQL customized for Google Spanner. In other words, in order to use Spanner, some level of re-architecture and/or replatforming is needed for your applications to work correctly. By comparison, Aurora’s native compatibility with MySQL allows applications to migrate from MySQL easily, with a minimum of changes. Applications wanting to leverage Spanner will have to be heavily rewritten, or written from scratch, similar to the replatforming necessary when migrating to Oracle or SQL Server.
Avoiding that replatforming cost is very important to decision makers choosing between augmenting their current MySQL systems, or taking the plunge of replatforming to a “bigger database.” So the question is whether the replatforming cost for Spanner is significantly less than the cost of replatforming to SQL Server or Oracle.
Do Enterprises Want Spanner or Aurora?
The market AWS is targeting with Aurora exists. No longer is the expectation “we’ll migrate off MySQL when we get bigger.” For one, migrating to the “bigger database” is a huge outlay of replatforming effort, requiring thorough code rewrites and often application re-architecture. But let’s not forget cost: both SQL Server and Oracle cost significantly more than either Aurora or Spanner. So if “Enterprise-ready” databases like Aurora or Spanner can provide sufficient Enterprise features, especially performance at scale, all the numbers are on their side when it comes to managing DevOps and IT budgets.
However, performance at scale is very important to the Enterprise market as well.
Aurora’s Scale Problem
Aurora can pick up any RDS MySQL deployment; that’s a natural progression. Compatibility is the win. However, similar to other MySQL DBaaS offerings (Azure SQL, Cloud SQL, RDS MySQL, etc.), there is still a “hard stop” when it comes to write scale: each of these MySQL-based databases is limited to a single write master.
This problem cannot be overemphasized. Aurora’s inability to scale out writes is often surprising news to people trying Aurora for the first time—the expectation has been set for “high performance and capacity,” but after the capacity of Aurora’s single write master is exceeded, any additional scale requires application changes:
- Read fan-out to leverage read slaves (“Read Replicas” on Amazon, which are more of a “read compute node” than a full local copy of the DB; replicas still lag because they must wait for write locks to be released). Read Replicas require different endpoints, and the application must be able to handle delayed consistency.
- Write scale-out requires sharding, which Amazon Aurora does not support out of the box. This requires significant application changes to create and maintain consistency, as well as ongoing data management to ensure even distribution and avoid hotspots.
And all of this represents significant cost outlays as well.
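The two application changes listed above can be sketched as a small routing layer. This is an illustration of the kind of logic an application would have to own, not an Aurora feature or API: the class name, the hostnames, and the hash-modulo sharding scheme are all hypothetical choices made for the example.

```python
import hashlib
import itertools


class ShardedRouter:
    """Application-side routing: pick a shard by key, then a writer or a reader."""

    def __init__(self, shards):
        # shards: list of {"writer": str, "readers": [str, ...]} (assumed layout)
        if not shards:
            raise ValueError("at least one shard is required")
        self.shards = shards
        # Round-robin iterator over each shard's readers, or None if it has none.
        self._reader_cycles = [
            itertools.cycle(s["readers"]) if s["readers"] else None
            for s in shards
        ]

    def shard_index(self, key):
        # A stable hash keeps a given key on the same shard across processes.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def endpoint_for(self, key, write=False, read_your_writes=False):
        i = self.shard_index(key)
        cycle = self._reader_cycles[i]
        # Writes, and reads that cannot tolerate replica lag, go to the writer.
        if write or read_your_writes or cycle is None:
            return self.shards[i]["writer"]
        return next(cycle)


# Hypothetical endpoints for two shards.
router = ShardedRouter([
    {"writer": "shard0-writer.example.internal",
     "readers": ["shard0-reader-a.example.internal",
                 "shard0-reader-b.example.internal"]},
    {"writer": "shard1-writer.example.internal", "readers": []},
])

print(router.endpoint_for("customer-42", write=True))
print(router.endpoint_for("customer-42"))
```

Even this toy version shows where the cost comes from: the application must decide which reads can tolerate stale data, and a hash-modulo scheme forces data to be moved whenever the number of shards changes. Cross-shard transactions and hotspot rebalancing are not handled here at all.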