The realities of modern corporate networks make the move to distributed database architectures inevitable. How do you leverage the stability and security of traditional relational database designs while making the transition to distributed environments? One key consideration is to ensure your cloud databases are scalable enough to deliver the technology's cost and performance benefits.
Your conventional relational DBMS works without a hitch (mostly), yet you're pressured to convert it to a distributed database that scales horizontally in the cloud. Why? Your customers and users not only expect new capabilities, they need them to do their jobs. Topping the list of requirements is scalability.
David Maitland points out in an October 7, 2014 article on Bobsguide.com that startups, in particular, have to be prepared to see the demands on their databases expand from hundreds of requests per day to millions — and back again — in a very short time. Non-relational databases have the flexibility to grow and contract almost instantaneously as traffic patterns fluctuate. The key is managing the transition to scalable architectures.
Availability Defines a Distributed Database
A truly distributed database is more than an RDBMS with one master and multiple slave nodes. A system with multiple masters, or write nodes, qualifies as distributed because it is built around availability: if one master fails, the system automatically fails over to the next and the write is still recorded. InformationWeek's Joe Masters Emison explains the distinction in a November 20, 2013, article.
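The failover behavior that distinguishes a multi-master system can be sketched in a few lines of Python. This is a minimal illustration, not any real database's API; the `WriteNode` class and `write_with_failover` helper are hypothetical names invented for the example.

```python
class WriteNode:
    """A hypothetical write (master) node; names are illustrative only."""
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.records = {}

    def write(self, key, value):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.records[key] = value
        return self.name  # report which master accepted the write

def write_with_failover(masters, key, value):
    """Try each master in turn; the write succeeds if any master is up."""
    for node in masters:
        try:
            return node.write(key, value)
        except ConnectionError:
            continue  # this master is unreachable; fail over to the next
    raise RuntimeError("no available write node")

# The first master is down, so the write lands on the second:
masters = [WriteNode("m1", healthy=False), WriteNode("m2")]
print(write_with_failover(masters, "order:42", "paid"))  # → m2
```

With a single master, the same failure would simply reject the write; with multiple write nodes, the write survives.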
[Figure: The evolution of database technology points to a "federated" database that is document and graph based, as well as globally queryable. Source: JeffSayre.com.]
The CAP theorem states that when a network partition occurs, a distributed system can preserve strict availability or strict consistency, but not both. The conflict arises all the time: a system is instructed to write different information to the same record at the same time. It can either stop accepting writes (sacrificing availability) or record two conflicting versions (sacrificing consistency). In practice, systems fall between these two extremes; most business processes favor high availability first and reconcile the inconsistencies later.
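The two ends of the trade-off can be sketched as toy stores in Python. These classes are illustrative assumptions, not real products: the "CP" store refuses writes it cannot coordinate, while the "AP" store accepts every write and keeps conflicting versions to reconcile later.

```python
class CPStore:
    """Consistency first: refuse writes when nodes cannot coordinate."""
    def __init__(self):
        self.data = {}
        self.partitioned = False  # simulate a network partition

    def write(self, key, value):
        if self.partitioned:
            # sacrifice availability rather than risk inconsistency
            raise RuntimeError("unavailable: cannot guarantee consistency")
        self.data[key] = value

class APStore:
    """Availability first: accept every write, keep conflicting versions."""
    def __init__(self):
        self.data = {}  # key -> list of conflicting versions

    def write(self, key, value):
        self.data.setdefault(key, []).append(value)  # always accept

    def read(self, key, resolve=lambda versions: versions[-1]):
        # inconsistency is handled later, e.g. last-write-wins
        return resolve(self.data[key])

ap = APStore()
ap.write("user:1", "alice@old.example")
ap.write("user:1", "alice@new.example")  # concurrent conflicting write
print(ap.data["user:1"])  # both versions survive until reconciled
```

The AP store's `read` shows the "deal with inconsistencies later" pattern: the conflict is stored, and a resolution rule is applied at read time.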
Kyle Kingsbury's Call Me Maybe project measured how well distributed databases, NoSQL systems among them, handle network partitions in real-world conflict situations. InformationWeek's Joe Masters Emison describes the project in a September 5, 2013, article. The upshot is that distributed databases fail, as all databases sometimes do, but they fail less cleanly than single-node databases, so tracking and correcting the resulting data loss requires asking a new set of questions.
Securing distributed databases is also more complex, and not just because the data resides in multiple physical and virtual locations. As with most new technologies, the initial emphasis is on features rather than safety. And because the databases are still maturing in production settings, unforeseen security concerns tend to be addressed only as they arise. (The upside is that because these databases are less widely deployed, they present a smaller profile to the bad guys.)
The Advent of the Self-Aware App
Databases are now designed to monitor their connections, available bandwidth, and other environmental factors. When demand surges, such as during the holiday shopping season, the database automatically brings more cloud servers online to handle the increased load, and takes them back offline when demand returns to normal.
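The scale-out/scale-in decision described above reduces to a simple capacity calculation. The sketch below assumes an illustrative metric (requests per second per server) and made-up thresholds; real autoscaling policies vary by provider.

```python
import math

def target_server_count(current_rps, rps_per_server=1000,
                        min_servers=2, max_servers=50):
    """Pick a server count so each server handles at most
    rps_per_server requests/sec. All thresholds are illustrative."""
    needed = math.ceil(current_rps / rps_per_server)
    # never scale below the floor or above the budget cap
    return max(min_servers, min(needed, max_servers))

print(target_server_count(500))      # 2  (quiet period: stay at the floor)
print(target_server_count(48_000))   # 48 (holiday surge: scale out)
print(target_server_count(120_000))  # 50 (extreme spike: hit the cap)
```

Running this rule on a schedule, and calling the cloud provider's API to add or remove servers when the target changes, is the essence of the on-demand flexibility described above.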
This on-demand flexibility relies on the cloud service's APIs, whether those are proprietary or built on open-source technology such as OpenStack. Today's container-based architectures, such as Docker, encapsulate all the resources required to run the app, including frameworks and libraries.