Curator's Note: The post below was originally published in September 2012.
Today I read this recent paper by Google:
Spanner: Google's Globally-Distributed Database
As a globally-distributed database, Spanner provides several interesting features.
- First, the replication configurations for data can be dynamically controlled at a fine grain by applications. Applications can specify constraints to control which datacenters contain which data, how far data is from its users (to control read latency), how far replicas are from each other (to control write latency), and how many replicas are maintained (to control durability, availability, and read performance).
- Second, Spanner has two features that are difficult to implement in a distributed database: it provides externally consistent reads and writes, and globally-consistent reads across the database at a timestamp. These features enable Spanner to support consistent backups, consistent MapReduce executions, and atomic schema updates, all at global scale, and even in the presence of ongoing transactions.
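To make the first point above concrete, here's a minimal sketch of the kind of per-application replication policy being described: how many replicas exist, which datacenters hold them, and where leaders sit relative to writers. This is not Spanner's actual API; every name and field here is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ReplicationPolicy:
    # How many replicas to maintain (durability, availability, read capacity).
    num_replicas: int
    # Which datacenters may hold the data.
    datacenters: list[str]
    # Preferred leader locations, kept near writers to control write latency.
    leader_preference: list[str] = field(default_factory=list)

# Hypothetical example: five replicas spread across US datacenters,
# with leaders pinned to the east coast where most writers live.
policy = ReplicationPolicy(
    num_replicas=5,
    datacenters=["us-east1", "us-east4", "us-central1", "us-west1", "us-west2"],
    leader_preference=["us-east1", "us-east4"],
)
print(policy)
```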
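As for the second point, the mechanism underneath globally-consistent reads at a timestamp is a multiversion store: each write is kept under its commit timestamp, and a read at timestamp t returns the newest version at or before t, so every replica holding the data answers the same snapshot identically. Here is a drastically simplified sketch that ignores how timestamps are assigned, which is the hard part:

```python
import bisect
from collections import defaultdict

class MultiVersionStore:
    """Toy multiversion store: reads at a timestamp see a fixed snapshot."""
    def __init__(self):
        self.stamps = defaultdict(list)   # key -> sorted commit timestamps
        self.values = defaultdict(list)   # key -> values, parallel to stamps

    def write(self, key, value, commit_ts):
        # Insert the new version in timestamp order.
        i = bisect.bisect_right(self.stamps[key], commit_ts)
        self.stamps[key].insert(i, commit_ts)
        self.values[key].insert(i, value)

    def read_at(self, key, snapshot_ts):
        # Newest version at or before snapshot_ts; None if key didn't exist yet.
        i = bisect.bisect_right(self.stamps[key], snapshot_ts)
        return self.values[key][i - 1] if i else None

store = MultiVersionStore()
store.write("balance", 100, commit_ts=10)
store.write("balance", 250, commit_ts=20)
assert store.read_at("balance", 15) == 100  # the snapshot as of ts=15
assert store.read_at("balance", 25) == 250
# Any replica holding these versions answers a read at ts=15 identically,
# which is what makes the snapshot globally consistent.
```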
The very interesting part is that it provides semi-relational tables (which some other NoSQL systems also do), a SQL-like query language, and synchronous replication. You can see that many applications within Google use non-optimal data stores just to have synchronous replication.
Spanner exposes the following set of data features to applications: a
data model based on schematized semi-relational tables, a query
language, and general-purpose transactions. The move towards supporting
these features was driven by many factors.
- The need to support schematized semi-relational tables and synchronous replication is supported by the popularity of Megastore. At least 300 applications within Google use Megastore (despite its relatively low performance) because its data model is simpler to manage than Bigtable's, and because of its support for synchronous replication across datacenters. (Bigtable only supports eventually-consistent replication across datacenters.)
- The need to support a SQL-like query language in Spanner was also clear, given the popularity of Dremel as an interactive data-analysis tool.
- Finally, the lack of cross-row transactions in Bigtable led to frequent complaints; Percolator was in part built to address this failing. Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings [9, 10, 19]. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. Running two-phase commit over Paxos mitigates the availability problems.
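That last point, running two-phase commit over Paxos, is worth unpacking. Classic 2PC blocks if a single participant or the coordinator crashes at the wrong moment; when every participant (and the coordinator) is itself a Paxos-replicated group, the protocol only stalls if a whole group loses its majority. Here is a toy sketch of that idea, with the actual Paxos rounds reduced to a simple majority check:

```python
class PaxosGroup:
    """Stand-in for a Paxos-replicated participant: it can durably log a
    record (and so make progress) as long as a majority of replicas is up."""
    def __init__(self, name, replicas=3):
        self.name = name
        self.replica_up = [True] * replicas
        self.log = []

    def majority_up(self):
        return sum(self.replica_up) > len(self.replica_up) // 2

    def replicate(self, record):
        # A real group would run a Paxos round here; we only check the majority.
        if not self.majority_up():
            raise RuntimeError(f"{self.name}: no majority available")
        self.log.append(record)

def two_phase_commit(coordinator, participants, txn_id):
    # Phase 1: each participant group durably logs its "prepared" vote.
    for p in participants:
        if not p.majority_up():
            coordinator.replicate(("abort", txn_id))
            return "aborted"
        p.replicate(("prepared", txn_id))
    # The decision itself is replicated in the coordinator group, so one
    # crashed coordinator machine can no longer leave participants in doubt.
    coordinator.replicate(("commit", txn_id))
    # Phase 2: broadcast the outcome.
    for p in participants:
        p.replicate(("commit", txn_id))
    return "committed"

# One replica dies: a plain 2PC participant on that machine would stall the
# transaction, but its Paxos group still has a majority and proceeds.
groups = [PaxosGroup("accounts"), PaxosGroup("orders")]
groups[0].replica_up[2] = False
print(two_phase_commit(PaxosGroup("coordinator"), groups, txn_id=42))
```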
Essentially, they accomplish this by keeping several versions of each record (like Bigtable, but with timestamps assigned by the server) and guaranteeing that timestamps stay within a known error margin by using multiple time sources like GPS and atomic clocks. Then there are Paxos groups, of course, and two-phase commit for the transactions.
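The timestamp trick is the clever bit. My understanding is that the TrueTime-style clock API returns an interval [earliest, latest] guaranteed to contain real time, and a writer waits out that uncertainty ("commit wait") before making its write visible, so timestamp order matches real-time order. A rough sketch, with a made-up uncertainty bound:

```python
import time
from dataclasses import dataclass

# Hypothetical uncertainty bound in seconds; the real bound is derived from
# the GPS receivers and atomic clocks in each datacenter.
EPSILON = 0.004

@dataclass
class TTInterval:
    earliest: float
    latest: float

def tt_now() -> TTInterval:
    # TrueTime-style clock: an interval guaranteed to contain real time.
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def commit(apply_write):
    # Pick the commit timestamp at the upper bound of current uncertainty.
    s = tt_now().latest
    apply_write(s)  # write the new version under timestamp s (not yet visible)
    # Commit wait: block until s is definitely in the past everywhere, so any
    # transaction that starts afterwards is guaranteed a later timestamp.
    while tt_now().earliest <= s:
        time.sleep(EPSILON / 2)
    return s

ts = commit(lambda s: print(f"version written at timestamp {s:.6f}"))
print(f"commit visible; snapshot reads at ts >= {ts:.6f} will see it")
```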
I found this paper very interesting, as I've been thinking about some of the challenges of implementing these features (like cross-datacenter replication and backup) on NoSQL data stores. Quite an interesting paper and accomplishment by this Google team.