At this point, everyone has heard of Google’s Cloud Spanner. It’s Google’s first database offering that can scale out write transactions linearly, as well as provide cross-node transactions with ACID guarantees, claiming to solve significant problems that Amazon’s Aurora cannot.
But as discussed before, Spanner is able to accomplish that scale by accepting some trade-offs, such as increasing latency with each additional node (“participant”) in a transaction. And, of course, the elephant in the room: Google Spanner is not MySQL compatible, nor even ANSI SQL compliant.
It turns out those aren’t the only compromises Google has made with Cloud Spanner. Let’s walk through a complete list, and then highlight what each means to your workload (and business rules).
After spending the better part of a week at Percona Live 2017, I answered the following questions enough times that it’s probably a good idea to write it all down.
Spanner isn’t MySQL compatible, nor even ANSI SQL compliant, and there are no plans for it to be.
We discussed this before, but it’s still worth noting. The biggest interest and use case for Google Spanner’s target customers is migrating workloads from MySQL. But at the Google Next Deep Dive I attended, there was quite a bit of discontent in the crowd at the lack of MySQL support, both currently and on the Spanner roadmap. Migrating a workload from MySQL requires a lot of effort, both in design and coding, to ensure a performant implementation. And as detailed below, mistakes in schema design can lead to system downtime, because there are no online schema changes for PK or interleave-object mutation.
Cloud Spanner doesn’t provide full RDBMS functionality, such as referential integrity, except as specified at ‘design time.’
When designing your data model, Spanner requires that parent/child relationships be defined at the table level. The documentation does say elsewhere that this is optional:
You can optionally define parent-child relationships between tables if you want Cloud Spanner to physically co-locate their rows for efficient retrieval — Cloud Spanner Schema and Data Model
However, if you do not define child tables as “interleaved,” then no referential integrity (cascade updates and deletes) will be automatically performed. If you later realize that you’d like to change a table into a child (“interleaved”) table, you’re going to have to drop/recreate that table.
And yes, you can still roll your own JOINs across non-interleaved tables (at lower performance) using indexes. But those fields aren’t true PK/FK relationships, so cardinality guarantees and cascading updates and deletes won’t be enforced, which offloads that usually RDBMS-native work onto your application programmers.
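To make that concrete, here is a minimal sketch of what “offloading to your application programmers” looks like: emulating ON DELETE CASCADE in application code for a pair of non-interleaved tables. The table and column names (Singers, Albums, SingerId) are illustrative, not from any real schema.

```python
# Sketch: emulating ON DELETE CASCADE in application code for two
# NON-interleaved tables. Table/column names are hypothetical.

def cascade_delete_singer(execute, singer_id):
    """Delete a Singer row and all of its Album rows.

    `execute` is assumed to be a callable that runs one parameterized
    statement inside the caller's transaction. Children must be deleted
    before the parent, or a failure mid-way leaves orphaned Albums.
    """
    execute("DELETE FROM Albums WHERE SingerId = @id", {"id": singer_id})
    execute("DELETE FROM Singers WHERE SingerId = @id", {"id": singer_id})
```

Had Albums been declared with `INTERLEAVE IN PARENT Singers ON DELETE CASCADE` at design time, Spanner would perform this cleanup itself and none of this code would exist.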
Cloud Spanner doesn’t provide online schema changes for keys.
The keys of a table can’t change; you can’t add a key column to an existing table or remove a key column from an existing table. — Cloud Spanner Schema and Data Model
This means that if you want to change the PK of your table, you have to drop and recreate that table, which means pausing your workload, i.e., taking downtime. And why would you want to change the PK of your table? If you’ve ‘classically’ designed your tables’ PKs to leverage timestamps, auto-increment IDs, or concatenated (rather than the Google-recommended hash) keys, this comes up whenever business rules change. The Spanner documentation highlights the issue in bold, all-caps, and red.
Anti-pattern: Using a column whose value monotonically increases or decreases as the first key part. — Cloud Spanner Schema Design
Similarly, if your workload relies on ad-hoc query generation, e.g. dynamic reports with user-selectable attributes, any table relationships that aren’t already instantiated as parent/interleaved tables aren’t guaranteed to have correct cardinality unless your application developers have (re)built that logic themselves.
Even more likely, if your workload experiences unexpected data growth, the original PK you designated might degrade system performance, at which point Google recommends swapping the order of your concatenated PK or switching to a hash-distributed key, either of which requires downtime.
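The usual workaround for the monotonic-key anti-pattern is to prefix the PK with a deterministic hash shard, so the physical key becomes (ShardId, Timestamp). A minimal sketch of deriving that shard; the shard count and column names are arbitrary choices for illustration:

```python
import hashlib

NUM_SHARDS = 16  # arbitrary; sized for expected write throughput


def shard_for(timestamp_micros):
    """Derive a stable shard id from a monotonically increasing key
    (here a timestamp) so writes spread across the key space instead
    of all landing on the last split of the table."""
    digest = hashlib.sha256(str(timestamp_micros).encode()).digest()
    return digest[0] % NUM_SHARDS
```

Note the catch this section describes: adopting this scheme after the fact means changing the PK, which means dropping and recreating the table.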
Many Cloud Spanner schema changes require validation, which is not controlled by MVCC.
If you’re adding NOT NULL to a column, Cloud Spanner will almost immediately begin rejecting writes for new requests that use NULL for the column. If the new schema change ultimately fails for data validation, there may be a period of time when writes were blocked even if they would have been accepted by the old schema. — Cloud Spanner Schema and Data Model (emphasis added)
Cloud Spanner is bad at choosing indexes, and recommends using FORCE_INDEX directives in your code.
Translation: not only do your application’s SQL statements have to be converted to Google’s APIs, but your application programmers’ efforts will have to be bolstered by data architects who know how to design around the limits of Spanner, including avoiding the Spanner-specific anti-patterns that degrade performance: monotonic PKs (change them to hashes), auto-increment IDs (remove them), and parent/child data relationships that must be correctly designated as “interleaved” tables, with the necessary indexes interleaved as well, at design time.
All of this knowledge is specific to Spanner, which means your code-conversion effort will take that much longer before you can see the performance benefits of Spanner.
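For illustration, index hints end up embedded in the query text itself, using Spanner’s `table@{FORCE_INDEX=index}` syntax. A small sketch of building such a query; the table and index names are hypothetical:

```python
# Sketch: injecting a Cloud Spanner FORCE_INDEX table hint into query
# text. The hint syntax (table@{FORCE_INDEX=index}) is Spanner's;
# the table and index names here are illustrative.

def with_force_index(table, index):
    """Return a FROM-clause fragment carrying a FORCE_INDEX hint."""
    return "{0}@{{FORCE_INDEX={1}}}".format(table, index)


query = ("SELECT AlbumTitle FROM "
         + with_force_index("Albums", "AlbumsByTitle")
         + " WHERE AlbumTitle = @title")
```

Since the hint is baked into the SQL string, changing which index a query uses means changing and redeploying application code, not just the database.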
Data migration to Cloud Spanner is tricky.
Not only is the list of supported data types severely reduced compared to MySQL 5.7, but actually getting the data into Spanner requires a carefully designed and deliberately throttled (read: “slow”) ingest:
Avoid writing rows in primary key order. …Ideally each data load worker would be writing to a different part of the key space.
Partition the ordered key space into ranges, and then have each range processed by a worker batching the data into that range into approximately 1MB or 10MB chunks. — Google Spanner Schema Design
Note that Quizlet found they had the fewest errors when they limited themselves to 5MB per insert transaction, and that the data ingest took “longer than they expected.”
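The batching half of that guidance can be sketched in a few lines. This assumes the caller has already partitioned the key space and assigned each worker its own range, per the quote above; the 1MB default mirrors the low end of Google’s recommendation:

```python
def batches(rows, max_bytes=1_000_000):
    """Group rows into roughly max_bytes-sized insert batches, per
    Google's bulk-load guidance (~1MB-10MB per commit).

    `rows` is an iterable of (key, payload_bytes) pairs already
    assigned to this worker's slice of the key space; the caller is
    assumed to have partitioned the key space so workers don't write
    in primary-key order.
    """
    batch, size = [], 0
    for key, payload in rows:
        # Flush before this row would push the batch past the limit.
        if batch and size + len(payload) > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append((key, payload))
        size += len(payload)
    if batch:
        yield batch
```

Per Quizlet’s experience, capping batches well below the upper bound (they used 5MB) produced the fewest errors.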
Cloud Spanner doesn’t yet support consistent backups controlled by MVCC, or replication across regions.
To be fair, both of these are on the Spanner roadmap. Cross-region replication is where Google’s vaunted high-speed back-end network, custom hardware, and private fiber will shine, as will all those atomic clocks for consistency. It’s just that you can’t leverage them yet, so be sure to factor that in when planning an implementation.
Cloud Spanner TCO is larger than users expected.
And a less ‘technical’ consideration is price. As DevOps, DBAs, and data architects, we don’t usually consider the price of an implementation; we work with product features and technical feasibility. However, at some point the ‘business side’ of your company has to authorize a purchase of the actual system(s); whether it’s licensing, support, or a DBaaS solution, all of them have a price, and all of them have to be ‘justified.’ Knowing how to frame your implementation recommendations as pros-and-cons ‘business cases’ will help you greatly, whether with your department head or the folks in finance.
That said, Cloud Spanner costs more than you’d think. At Google Next, I talked to a few beta customers, and they were a bit taken aback by how much it costs: $10k-$40k for a full-scale test of their workload, with costs spiking sharply for any access from outside Google Cloud Platform. The Quizlet presentation at Google Next was terrific; it was implied that the Quizlet team got a ‘good price’ from Google.
On some level, Google Cloud Spanner harkens back to the era of hierarchical databases, which required the schema to be set at design time. If you know the queries you’re going to run and how big your data is going to be, deploying Cloud Spanner is reasonably straightforward. You’re still required to change your code, and you’re still responsible for making sure it runs well on a proprietary system (i.e., you’re locked in). Once you’ve done that work, your workload should perform much better than on MySQL or Aurora.
However, if your workload and business rules require ongoing flexibility (and who doesn’t in this fast-moving cloud-scale world), you might want to consider an RDBMS that:
- Is like Spanner:
- Linearly scales out write and read transactions.
- Maintains full ACID guarantees across all the nodes in the cluster.
- Provides more than Spanner:
- Provides full RDBMS functionality, such as referential integrity.
- Provides online schema changes (i.e., no dropping/recreating tables to change PK/FK relationships).
- Automatically uses available indexes.
- Provides consistent backups controlled by MVCC today.
- Provides full ANSI SQL access to DML and DDL.
- Replicates across regions today.