Big Database (Part 2): Database Architecture
Big Database (Part 2): Database Architecture
Get a description of some of the most common architectures used to scale big data sizes: shared memory, shared disk, and shared nothing.
Join the DZone community and get the full member experience.Join For Free
Databases are better when they can run themselves. CockroachDB is a SQL database that automates scaling and recovery. Check it out here.
In my first post, I gave a short summary of the meaning of . In this article, I’ll describe the most common architectures used to scale to big data sizes.
Database Hardware Architectures
Over the past 30 years there have been three primary architectures to deploy a database, including:
- Shared memory: Effectively means shared everything — database running on a single machine with multiple CPUs sharing memory, and access to disks. Contention across all three causes scalability constraints. The majority of small-to-medium sized databases use this architecture.
- Shared disk: Uses a number of closely clustered machines to provide a single database service, but accessing a single shared copy of the data. (eg. Oracle RAC).
- Shared nothing: Also known as MPP (Massively Parallel Processing) solutions typically employed by large Data Warehouse systems. These cluster machines, but work largely independently, with data spread across the available resources. (eg. Vertica or Greenplum).
Each option has different features and constraints, and (depending upon the database solution) different scalability options.
Shared Memory Architecture
The diagram below illustrates the key hardware components of the simplest solution: shared memory.
In this solution, users connect to a single database server, typically with locally attached disks. This works well for many databases and is the traditional (and sensible) option for most applications. Scalability options are, however, limited to scaling up by adding CPUs, memory, or disks.
The main benefits and drawbacks of a shared memory solution are:
- Simplicity: This is a tried and tested, battle-hardened architecture that’s been proven to work well for over 30 years (I started with Oracle version 5 in 1986). It requires no special architecting, works on anything from a laptop to a high-end Unix machine, and is equally at home in the cloud or the desktop.
- Scalability but limited:Given that modern servers can host multiple processors and gigabytes of memory and terabytes of disk, there’s plenty of scope for growth, although eventually with an upper limit.
- High availability (at a cost): Enterprise-class machines become increasingly expensive, as the hardware includes dual redundant disks, network, and power supplies. Likewise, for full HA, you’ll need to factor in an off-site hot-standby system.
For most requirements, this may very well be the simplest and cheapest solution. Increasingly, cloud deployment should be considered, even for small-to-medium sized systems, as this can help control capital costs and provide a managed and relatively painless upgrade path.
Shared Disk Architecture
The diagram below illustrates the shared disk architecture in which users are transparently routed (perhaps using load balancing) to a database server in a cluster of machines. As the name implies, the disk system is shared by all nodes in the cluster (and is, therefore, a potential bottleneck) and is typically provided by a separate NAS or SAN disk system.
Under the hood, this is a relatively complex solution, as nodes are closely networked to act as a single machine. This can create performance and scalability challenges when, for example, a data item is simultaneously changed on two different nodes and the data is repeatedly “pinged” between the two. This means that this deployment can require some application changes to sensibly partition the application on the servers. At it’s extreme, it’s possible, for example, to deploy a data warehouse using one node for ETL processing and the other dedicated to servicing user read queries.
The main benefits and drawbacks of a shared disk solution are:
- Upgrade path: It can provide an excellent upgrade path from a single node system provided it’s deployed sensibly.
- The risk of pinging:Can limit performance, and if users can execute updates from any node, the risk of lock contention across the cluster will increasingly limit overall scalability. For this reason, many deployments of Oracle RAC don't exceed three nodes.
- Cost of SAN disks:The solution relies upon a fast shared disk system which on a high end SAN solution can prove expensive.
- Inefficient caching:As each node can potentially access the same data, it’s likely the data will be held (duplicated), in each cache, making less efficient use of memory.
- High availability:Is not built in, and dual redundant networks, power supplies, RAID disks and a standby database can add considerably to the overall cost.
As an on-premises upgrade path from a single database server, this provides a useful roadmap for growth. It’s sensible however to limit cluster size to around three for most solutions, before considering re-platforming to a larger machine.
Shared Nothing Architecture
The diagram below illustrates the shared nothing architecture whereby data is spread (sharded) across a number of independently running machines that act as a single cluster. Unlike the shared disk solution, each node is responsible for only a part of the data. In effect, nothing is shared.
Unlike the shared disk solution, which relies upon hardware for high availability, this solution can automatically replicate data to other nodes providing a significant element of built in high availability. These principles are applied to a range of solutions including Hadoop HDFS, NoSQL databases like Cassandra, and MPP column data stores including Vertica and Greenplum.
Unlike the shared disk solution, which can be deployed without thought, an MPP database potentially needs careful attention to data sharding and co-location to maximize scalability and performance. If, for example, most queries are within a geographic region, it makes sense to shard data across the cluster by region instead of business area. Once deployed, applying a different sharding and data partitioning strategy is not an insignificant operation on a multi-terabyte system.
The main benefits and drawbacks of a Shared Nothing database are:-
- Need for sharding: This solution relies upon sensible data placement and sharding to work well. Some (i.e. key/value lookup solutions) have built-in automatic sharding; others need to be carefully deployed, and designed into the application.
- Cheaper hardware: An MPP solution can be deployed on relatively inexpensive hardware with directly attached storage. In addition, scalability costs are incremental — typically by adding additional nodes.
- High availability: If data is automatically replicated across nodes, then the solution can provide built-in (inexpensive) high availability, as queries executed on a failed node can be re-executed on others.
- Fast single-node reads: Queries served by a single node run fast and leave other nodes free to satisfy other requests, and unlike the shared disk solution, each node caches only the data it’s responsible for making efficient use of memory.
- Write scalability: As there’s no distributed lock manager, this can be a great solution for high throughput insert/update operations, (provided they only affect a single data shard). From a write perspective it’s linearly scalable as each node is only responsible for a sub-set of the data. Updates which affect multiple nodes however need a distributed two phase commit which can slow operations.
- Read scalability:While both reads and writes work well when all the required data is on a single node, read queries across several shards (perhaps the entire database) can set an absolute limit on scalability. Effectively, every node in the cluster will handle a given number of queries per second. A complex analytic query across every node in the cluster counts as one of those queries. There are ways to work around this (i.e. materialized views, summaries and data replication), but it’s a constraint to be aware of.
NoSQL Database Architecture
As I’ve described before, NoSQL databases can provide nearly unlimited linear scalability for high velocity workloads on a shared nothing architecture.
Cassandra, in particular, has been demonstrated to scale to nearly 300 nodes, processing over 200,000 writes per second during a Netflix benchmark test.
Of course, there are significant downsides.
NoSQL Scalability Limits
NoSQL databases address a very narrow use case. In many cases, single key lookup operations where the data can be guaranteed to be retrieved from a single node. If a Cassandra or Riak database is deployed to 50 nodes and workload doubles, simply adding 50 nodes will solve the problem — providing the same response time, but to double the user population.
This principle breaks however if we use secondary indexes that run across the sharding key. Access via a secondary index on MongoDB, Cassandra or Riak, and you’ll potentially execute the query against every node in the cluster. Run several of these concurrently, and you’ll quickly hit the scalability limits of your your entire system.
In summary, when queries run across multiple shared nothing nodes in the cluster, scalability will drop.
NoSQL and Eventual Consistency
If you’ve ever seen the above page while using LinkedIn, you’ve experienced the limits of eventual consistency used by NoSQL databases. In short, you’re not guaranteed to find the data (or the latest version of it) when you need it.
Relational databases work within ACID transactions, and once you press Save, the server only responds when the change durable and is guaranteed to be consistently available to everyone. NoSQL databases, on the other hand, tend to compromise on consistency in favor of speed and availability.
When data is written, the process can return control immediately when it’s written to a majority of replica copies or (slowest of all) when it’s written to all copies. This introduces application complexity, as you need to handle these unexpected (sometimes temporary) data quality issues. You also need to accept you cannot guarantee 100% data accuracy in the event of hardware or systems failure.
Of course, the bigger the cluster, the more hardware failures. For example, given a hard disk mean time between failure (MTBF) of 10-50 years on a cluster with 10,000 disks, you should expect an average of one failure per day.
NewSQL Database Technology
NewSQL databases attempt to provide the high availability, massive scalability, and millisecond latency of NoSQL databases but with the flexibility of a relational database and with ACID transactionality. For strict OLTP (short lived, fast transactions) with the potential for massive scale, these provide a compelling solution with potential throughput reaching millions of transactions per second.
Some, like NuoDB, CockroachDB, and Google Spanner, are designed for elastic cloud-based solutions and separate data processing from storage nodes. This gives potentially signficant benefits of being able to scale them independently.
Finally, VoltDB from the Michael Stonebraker stable (Ingres, Postgres, and Vertica) is a horizontally scalable, fully ACID-compliant database built for streaming applications. Designed to capture, process, and analyze sensor data in near-real-time (milliseconds), it provides strong transactional guarantees and even survived the remarkably challenging Jepsen test of distributed system safety in the face of hardware failure.
Historical Context and Conclusion
As little as ten years ago, the shared memory/disk/nothing architectures were the only serious options available from the big three vendors: Oracle, Microsoft, and IBM.
The massive explosion of data volumes handled by Google, Amazon, and LinkedIn lead to the development of NoSQL databases to provide fast key/value lookup operations to complement the entirely batch-oriented serial processing available in Hadoop.
The subsequent explosion of open-source development means that there are nearly 300 databases available, including key-value, document, graph, and text search engines with specialized solutions for every use case.
It’s unlikely that every database will survive, but it’s good to see the 30-year relational model completely reinvented with the NewSQL databases while being largely transparent to the developer. The relational database is not dead yet.
Opinions expressed by DZone contributors are their own.