Originally Written by Mat Keep
For the better part of a generation, the database landscape had changed very little. No one could say “this is not your father’s database.” Databases had become, in a word, boring.
Then a combination of factors catalyzed an era of innovation in database technologies: cheap storage and compute resources; pervasive connectivity; social networks; smartphones; the proliferation of sensors; open source software. Data volumes grew (and are growing) at exponential rates. Over 80% of today’s data no longer fits neatly into the normalized row and column table formats of the past. And so developers began engineering solutions to a new set of problems with a very different set of resources and assumptions. Today these new options include a variety of database architectures built around diverse data models – from key-value to document to wide-column and graph. And of course you still have the option of the venerable relational database.
For the enterprise these new technologies hold great promise. They open the door to new applications that could not be imagined before, or to solving existing problems more efficiently. They attract new technical talent. They facilitate the migration of systems to more cost-effective infrastructure based on commodity hardware and cloud platforms. But at the same time, evaluation of these new options requires careful consideration.
Selecting the appropriate database for a new project requires evaluation against multiple criteria, including:
- Development considerations: the data model, query functionality, available drivers and data consistency. These factors dictate the functionality of your application, and how quickly you can build it.
- Operational considerations: performance and scalability, high availability, data center awareness, security, management and backups. Over the application’s lifetime, operational costs will contribute a significant percentage to the project’s Total Cost of Ownership (TCO), and so these factors determine your ability to meet SLAs while minimizing administrative overhead.
- Commercial considerations: licensing, pricing and support. You need to know that the database you choose is available in a way that is aligned with how you do business.
Each of these considerations needs to be evaluated in the context of specific application requirements as well as internal technology standards, skills availability and integration with your existing enterprise architecture.
So, where to start? The Database Selection Matrix is designed to serve as a decision framework for teams responsible for database selection. It has been developed in collaboration with several large enterprises that have the choice of running multiple databases in production, and that wanted to institute a systematic methodology for database evaluation. Responses to questions in the matrix helped them identify key requirements and guide selection. And it can do the same for you.
Let's illustrate how the Database Selection Matrix can be used by working through a practical example.
The Database Selection Matrix in Action!
ACME Retail Corporation runs a large vehicle fleet to distribute produce to its nationwide network of stores. The CEO is intent on improving distribution efficiency and so tasks her enterprise architects with building a new platform that can utilize sensor data generated by the company’s trucks. By capturing and analyzing this data, the organization believes it can optimize route planning, improve delivery times, cut wastage and reduce business interruptions caused by breakdowns.
ACME Retail Corp is typical of many enterprises that see the opportunity to unlock new efficiencies by leveraging the “Internet of Things”. As Morgan Stanley stated in the “Internet of Things is Now” research, “We do not believe traditional data storage architectures are well-suited to accommodate the volume, velocity, and variety of IoT data.” For this reason, enterprises are looking beyond traditional RDBMS technology to the swathe of new database options available to them. Bosch SI did exactly this when it took the decision to use MongoDB to power the Bosch IoT Suite.
Of course, MongoDB may not be a perfect fit for every IoT project. There are many choices available – as there are for every new type of project – and the ACME architecture group needs a way to navigate the complex landscape of modern databases. Using the Database Selection Matrix, they have the framework to ask the key questions that will guide their technology decisions. So let's put it into practice.
In this opening phase, the architects need to evaluate how their shortlisted database options meet the functional requirements of the app that is being built. This is impacted directly by multiple factors – and these are the questions they will need to ask.
The Data Model:
- Will the application need to handle data of varying structure and types?
- How large can each data type be – is our data made up of simple integers, strings and timestamps or can it also be large binary files such as images or videos?
- Can our data just be represented as a set of opaque values, or does it need to be typed so other applications can make sense of it?
- Do we know the data structure will remain constant, or will it vary as we introduce new sensor data and as the business updates application requirements?
- Does the application require its data to be strongly consistent (i.e. read our own writes), or can eventually consistent data be tolerated? If so, do our developers know how to handle the complexity it introduces? Do we end up trading performance and availability if we configure the database to only return the freshest data?
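To make the consistency question above concrete, here is a minimal sketch of what "read our own writes" means in practice. It is a toy in-memory model, not the API of any real database: the `ToyReplicaSet` class, its method names and the `strong` flag are all assumptions made up for illustration.

```python
class ToyReplicaSet:
    """Toy model: one primary plus one secondary that lags behind it."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self._pending = []  # writes accepted by the primary, not yet replicated

    def write(self, key, value):
        # Writes always go to the primary and are queued for replication.
        self.primary[key] = value
        self._pending.append((key, value))

    def replicate(self):
        # Apply all queued writes to the secondary (simulates replication catching up).
        for key, value in self._pending:
            self.secondary[key] = value
        self._pending.clear()

    def read(self, key, strong=True):
        # strong=True reads from the primary; strong=False reads from the
        # possibly stale secondary.
        source = self.primary if strong else self.secondary
        return source.get(key)


rs = ToyReplicaSet()
rs.write("T-100:engine_temp_c", 92.5)   # hypothetical sensor key
fresh = rs.read("T-100:engine_temp_c", strong=True)    # 92.5
stale = rs.read("T-100:engine_temp_c", strong=False)   # None until replication runs
rs.replicate()
caught_up = rs.read("T-100:engine_temp_c", strong=False)  # 92.5
```

The stale read is the complexity developers have to plan for when they accept eventual consistency. Routing every read to the primary avoids it, but it concentrates load on one node, which is the performance and availability trade-off the question asks about.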
The Query Model:
- What sort of queries are we going to run against the database? Is it simple key-value lookups that we know in advance or do we need to execute ad-hoc queries and complex aggregations to support real-time analytics that the business wants to see?
- Can we run analytics directly against the database, or do we need to replicate data to dedicated search or analytics engines?
- Will the application be handling geospatial queries and text search?
- Does the data need to be integrated with our BI & analytics tools, and what about our new Hadoop cluster, or the data warehouse?
- Which languages will our engineers be using to develop the application, and does the database have drivers available for them?
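The difference between a known-in-advance key lookup and an ad-hoc aggregation can be sketched in a few lines of plain Python. The sample readings, field names and helper functions below are hypothetical and stand in for whatever the shortlisted databases would provide natively.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical truck readings; field names are illustrative only.
readings = [
    {"truck_id": "T-100", "region": "north", "fuel_l_per_100km": 31.0},
    {"truck_id": "T-101", "region": "north", "fuel_l_per_100km": 29.0},
    {"truck_id": "T-200", "region": "south", "fuel_l_per_100km": 35.0},
]

# Key-value access: the key is known in advance, so a single index suffices.
by_truck = {r["truck_id"]: r for r in readings}


def lookup(truck_id):
    """Return the reading for one truck, or None if it is unknown."""
    return by_truck.get(truck_id)


def average_by(group_field, metric_field):
    """Ad-hoc aggregation: group by any field and average any metric."""
    groups = defaultdict(list)
    for r in readings:
        groups[r[group_field]].append(r[metric_field])
    return {key: mean(values) for key, values in groups.items()}


one_truck = lookup("T-101")
regional_avg = average_by("region", "fuel_l_per_100km")
# regional_avg == {"north": 30.0, "south": 35.0}
```

A store that only supports the first pattern pushes the second into application code or a separate analytics system, which is why the questions above ask where the analytics will run.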
In this second phase, the ACME Retail architects need to evaluate how each database would run in production. No one wants to hand-feed a custom technology, so they need to understand if the database can meet the availability, scalability and security needs of the business, and interoperate with the existing management frameworks.
- What is the application’s availability SLA? What are our recovery time objective (RTO) and recovery point objective (RPO)?
- Will our operations teams manage failure recovery, or is this something that should be fully automated by the database?
- What capabilities does the database offer to maintain availability during routine maintenance? Are there tools available to manage this or do we need to script something ourselves?
- Are there specific requirements to replicate data between our data centers to support disaster recovery?
- How do we expect this application to grow? Will the database need to scale beyond the limits of just a few servers?
- If data is to be distributed across multiple nodes, will it be partitioned in such a way that it is still optimized for the application’s query patterns?
- Do we need to scale this across data centers? Can we write and read data locally to reduce the effects of geographic latency?
- Can we scale storage capacity and I/O by compressing the data, and are different compression algorithms available to optimize compression ratio to CPU overhead?
- What types of data access control do we need? Can we just use authentication controls within the database or do we need to integrate with our existing LDAP infrastructure?
- What type of authorization controls are available, and how granular can we get? Do these controls need to extend down to the level of individual attributes within a document?
- Is encryption needed, and will those pesky compliance officers need to audit every action taken against the database?
- How are we going to run this thing?
- Does the database provide tools to automate provisioning and upgrades or do we need to create our own scripts?
- How about backups? Can we get incremental backups? How about point-in-time backups?
- And then monitoring. We need to know, for example, if disk utilization is peaking above 60% so we can take action before we hit a problem. Can we add these alerts into our existing operational workflow tools?
- Can we integrate the database’s management platform into our own operational tooling so we don’t need to leave our single screen?
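Two of the operational questions above lend themselves to quick arithmetic: what an availability SLA allows in downtime, and whether disk utilization has crossed the 60% mark mentioned in the monitoring question. The sketch below uses only the Python standard library. The function names are made up for illustration, and a real deployment would feed these checks from its own monitoring tooling rather than a local script.

```python
import shutil

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability_pct):
    """Minutes of downtime per year permitted by an availability SLA."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def disk_alert(path="/", threshold_pct=60.0):
    """Return (alert, used_pct) for the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    used_pct = 100 * usage.used / usage.total
    return used_pct > threshold_pct, used_pct


three_nines = downtime_budget_minutes(99.9)    # about 525.6 minutes per year
four_nines = downtime_budget_minutes(99.99)    # about 52.6 minutes per year
alert, used_pct = disk_alert()                 # depends on the machine it runs on
```

Going from 99.9% to 99.99% cuts the yearly budget from under nine hours to under one. That is usually the point at which manual failure recovery stops being realistic and automated failover becomes a requirement.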
Once the ACME architects have profiled their technology requirements, they will need to understand how the database is licensed and priced, before legal and procurement come knocking at the door:
- Licensing: what license is used, and is this acceptable to our legal team? Are commercial licenses available?
- Support: What support options are open to me? Can I get support SLAs from my vendor, even if I use a community version of their product?
- What is the SLA I can expect if I do hit an issue?
- What sort of training is available? Is my only option to send my engineers to public classes, or can we get trained on-demand, at our own pace?
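Once answers to the development, operational and commercial questions are in hand, a team needs some way to compare candidates side by side. One possible approach, and it is an assumption here rather than something the Database Selection Matrix prescribes, is a simple weighted score. The weights, candidate names and 1-to-5 scores below are invented for illustration.

```python
# Hypothetical weights per consideration group; they should sum to 1.0.
weights = {"development": 0.4, "operations": 0.4, "commercial": 0.2}

# Hypothetical 1-5 scores agreed by the evaluation team for each candidate.
scores = {
    "candidate_a": {"development": 4, "operations": 3, "commercial": 5},
    "candidate_b": {"development": 5, "operations": 4, "commercial": 2},
}


def weighted_score(candidate_scores, weights):
    """Sum of score * weight across all consideration groups."""
    return sum(weights[group] * candidate_scores[group] for group in weights)


def rank(scores, weights):
    """Candidate names ordered from highest to lowest weighted score."""
    return sorted(
        scores,
        key=lambda name: weighted_score(scores[name], weights),
        reverse=True,
    )


ranking = rank(scores, weights)
# ranking == ["candidate_b", "candidate_a"]  (about 4.0 versus about 3.8)
```

The numbers matter less than the discussion they force: agreeing on the weights makes a team state which of the three groups of considerations actually dominates for the application at hand.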
The ACME example is designed to illustrate some of the key questions engineering teams need to ask. It is true that the database landscape is more complex than ever. But it needn’t be bewildering – the Database Selection Matrix is designed to help you identify and compare what is most critical as you build your next app, so go ahead and download it now.