Perhaps you're building a brand new application. Or maybe your current database isn't working well. Choosing the right database for your application can be overwhelming, given all the choices available today.
Having used a variety of database vendors in production, I can easily say that there is no one right answer. So, to help you along with the decision-making process, I'm going to give you three tips for selecting the right database for your application:
Tip #1: It Isn't a SQL vs. NoSQL Decision
There are countless articles on the pros and cons of SQL and NoSQL databases. While they may provide some insight into the differences, they miss many of the important factors of the decision-making process. Most importantly, we need to select the database that supports the right structure, size, and/or speed to meet the needs of our application.
Structure focuses on how you need to store and retrieve your data. Our applications deal with data in a variety of formats, so selecting the right database includes picking the right data structures for storing and retrieving data. If you don't select the right data structures for persisting your data, your application will require more development effort to work around the mismatch and may not scale as a result.
Size is about the quantity of critical application data you need to store and retrieve. The amount of data you can store and retrieve before the database is negatively impacted may vary based on a combination of the data structure selected, the database's ability to partition data across multiple filesystems and servers, and vendor-specific optimizations.
Speed and scale address the time it takes to service incoming reads and writes to your application. Some databases are designed to optimize read-heavy apps, while others are designed to support write-heavy solutions. Selecting a database that can handle your app's I/O needs goes a long way to a scalable architecture.
The important thing is to understand the needs of your application: the structure of your data, the size of the data, and the read and write speeds you need. If you're uncertain, you may wish to perform some data modeling to help you map out what's needed.
This leads us to my next tip:
Tip #2: Use Data Modeling to Guide Database Selection
Data modeling helps map your application's features into the data structures you'll need to implement them. Starting with a conceptual model, you can identify the entities, associated attributes, and entity relationships that you'll need. As you go through the process, the type(s) of data structures you'll need to implement will become more apparent. You can then use these structural considerations to select the category of database that will serve your application best:
Relational: stores data into classifications ('tables'), with each table consisting of one or more records ('rows') identified by a primary key. Tables may be related through their keys, allowing queries to join data from multiple tables together to access any/all required data. Relational databases require fixed schemas on a per-table basis that are enforced for each row in a table.
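To make the relational description concrete, here is a minimal sketch using Python's built-in sqlite3 module. The authors and posts tables and their columns are hypothetical, chosen only to show primary keys, a join through those keys, and a fixed schema rejecting a row that doesn't conform.

```python
import sqlite3

# In-memory database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO posts (id, author_id, title) VALUES (?, ?, ?)",
    [(1, 1, "Choosing a database"), (2, 1, "Modeling data")],
)

# Join the two tables through the author's primary key.
rows = conn.execute(
    """
    SELECT authors.name, posts.title
    FROM posts JOIN authors ON authors.id = posts.author_id
    ORDER BY posts.id
    """
).fetchall()

# The fixed schema is enforced per row: a post without a title is rejected.
try:
    conn.execute("INSERT INTO posts (id, author_id, title) VALUES (3, 1, NULL)")
    schema_enforced = False
except sqlite3.IntegrityError:
    schema_enforced = True
```

Here rows holds one (name, title) pair per post, assembled from both tables by the join.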
Document-oriented: stores structured information with any number of fields that may contain simple or complex values. Each document stored may have different fields, unlike SQL tables, which require fixed schemas. Some document stores support complex hierarchies of data through the use of embedded documents. Additionally, document stores offer extreme flexibility to developers, as fixed schemas do not need to be developed ahead of time. Search stores are often document-oriented databases that are optimized for data querying across one or more fields. Search-based data stores typically support additional features such as sorting by relevance and data faceting for drill-down capabilities.
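As a rough illustration of the document model, the sketch below uses plain Python dicts as stand-ins for stored documents. The field names and the find helper are hypothetical and do not mirror any particular document database's query API; the point is only that documents in one collection can carry different fields and embedded values.

```python
# Each document is a plain dict; the field names are hypothetical.
documents = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "cpu": "8-core"}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "Desktop", "specs": {"ram_gb": 32}},
]


def find(docs, path, predicate):
    """Return documents whose (possibly nested) dotted field satisfies predicate."""
    matches = []
    for doc in docs:
        value = doc
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                value = None  # this document simply lacks the field
                break
            value = value[key]
        if value is not None and predicate(value):
            matches.append(doc)
    return matches


# Query an embedded field that only some documents have.
big_memory = find(documents, "specs.ram_gb", lambda ram: ram >= 16)
# Query a list-valued field that only one document has.
medium_available = find(documents, "sizes", lambda sizes: "M" in sizes)
```

Documents that lack the queried field are skipped rather than treated as errors, which is the flexibility (and the risk) of not fixing a schema up front.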
Key/Value: offers great simplicity in data storage, allowing for massive scalability of both reads and writes. Each entry pairs a unique key ("bob") with a value ("555-555-1212") and may be manipulated using the following operations: Add, Reassign (Update), Remove, and Read. Some storage engines offer additional data structure management within these simple operations.
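The four operations are simple enough to sketch with a toy in-memory class. KeyValueStore and its method names are hypothetical, picked to match the operation names above rather than any real product's client API.

```python
class KeyValueStore:
    """Toy in-memory key/value store exposing the four basic operations."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        # Add only succeeds for a key that is not stored yet.
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = value

    def reassign(self, key, value):
        # Reassign (update) only succeeds for an existing key.
        if key not in self._data:
            raise KeyError(f"unknown key: {key!r}")
        self._data[key] = value

    def remove(self, key):
        del self._data[key]

    def read(self, key, default=None):
        return self._data.get(key, default)


store = KeyValueStore()
store.add("bob", "555-555-1212")
store.reassign("bob", "555-555-0000")
number = store.read("bob")
store.remove("bob")
missing = store.read("bob")
```

Every operation addresses exactly one key, which is what makes this model straightforward to partition across many servers.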
Column-oriented: similar to relational databases, stores data in both rows and columns. However, columns may contain multiple values, allowing data to be fetched by row or by column for highly optimized data retrieval.
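One way to picture the difference is to pivot a small row-based dataset into one list per column. This is a simplified in-memory sketch with hypothetical field names, not a description of how any specific column store lays out data on disk.

```python
# Row-oriented layout: one record per entry (hypothetical sales data).
rows = [
    {"id": 1, "city": "Austin", "amount": 30},
    {"id": 2, "city": "Boston", "amount": 45},
    {"id": 3, "city": "Austin", "amount": 25},
]

# Column-oriented layout: one list of values per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate only has to scan the single column it needs.
total_amount = sum(columns["amount"])


def fetch_row(index):
    """Reassemble a full row by picking the same position from every column."""
    return {name: values[index] for name, values in columns.items()}


second_row = fetch_row(1)
```

Scanning one column touches far less data than scanning whole rows, which is why this layout tends to suit analytical queries over a few fields.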
Graph: graph stores focus on storing entities and the relationships between them. These stores are very useful in navigating between entities and querying for relationships between them to any depth — something that is difficult to do with traditional relational or document databases.
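Querying relationships "to any depth" is essentially a graph traversal. The sketch below runs a breadth-first search over a hypothetical "follows" graph held in a dict; a real graph database would expose this through its own query language, so treat the function as an illustration of the idea only.

```python
from collections import deque

# Adjacency list of "follows" relationships; the names are hypothetical.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def reachable_within(graph, start, max_depth):
    """Breadth-first search returning {node: depth} for nodes within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand past the requested depth
        for neighbor in graph.get(node, []):
            if neighbor not in depths:
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    del depths[start]  # the starting entity is not part of the result
    return depths


two_hops = reachable_within(follows, "alice", 2)
```

Expressing the same question in a relational database typically requires one self-join per hop or a recursive query, which is the difficulty the paragraph above alludes to.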
As you start to map your application's features to data structures, consider the kinds of queries you will need to support for your UI or API. Some data structures make it easier to map single entities into and out of your application, but they may not support the ad hoc queries you need for more complex data retrieval and reporting.
A final note on data modeling: Don't depend on things like database migrations and scaffolding generators to define your database structures. Data modeling will help you understand the data structures necessary to build your application. Use these tools to accelerate the implementation process, based on your database model.
Tip #3: You May Need More Than One Type of Database
During the modeling process, you may realize that the data structure you need for storing your data can't fully optimize certain queries. This may be due to some complex search requirements, the need for robust reporting capabilities, caching, or the requirement for a data pipeline to accept and analyze incoming data. In these situations, more than one type of database may be required for your application.
When adopting more than one database, it's important to select one database that will own a specific set of data. This database becomes the canonical database for those entities or for a specific context. Any additional databases that work with this same data may have a copy, but they will not be considered an owner of this data.
For example, we may decide that a relational database is the best data structure for our application. However, we need to support a robust, faceted search within our application. In this case, we may choose PostgreSQL or MySQL as the canonical data store for all our entities. We then choose to use a document database such as Elasticsearch to index our entities by specific fields and facets. Elasticsearch may also store some basic details about our entities, such as name and description, so that our search results are useful on their own. However, Elasticsearch does not own our entity data and we do not query it for the latest details. Instead, we consider the relational database the canonical source for the entity details and updates. We then keep Elasticsearch updated whenever data changes in our relational database.
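The ownership rule in this example can be sketched in a few lines. Here sqlite3 stands in for the relational store, and SearchIndex is a hypothetical in-memory stand-in for the search store; no real Elasticsearch client calls are used, and the products table and function names are assumptions made for illustration.

```python
import sqlite3


class SearchIndex:
    """Hypothetical stand-in for a search store: holds a copy of a few fields."""

    def __init__(self):
        self._docs = {}

    def upsert(self, entity_id, name, description):
        self._docs[entity_id] = {"name": name, "description": description}

    def search(self, term):
        term = term.lower()
        return sorted(
            entity_id
            for entity_id, doc in self._docs.items()
            if term in doc["name"].lower() or term in doc["description"].lower()
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, description TEXT, price_cents INTEGER)"
)
index = SearchIndex()


def save_product(product_id, name, description, price_cents):
    """Write to the canonical store first, then refresh the search copy."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO products VALUES (?, ?, ?, ?)",
            (product_id, name, description, price_cents),
        )
    index.upsert(product_id, name, description)


def search_products(term):
    """Use the index only to find ids; load current details from the canonical store."""
    results = []
    for product_id in index.search(term):
        row = conn.execute(
            "SELECT name, price_cents FROM products WHERE id = ?", (product_id,)
        ).fetchone()
        if row is not None:
            results.append(row)
    return results


save_product(1, "Trail shoe", "Lightweight running shoe", 8900)
save_product(2, "Rain jacket", "Waterproof shell", 12900)
save_product(1, "Trail shoe", "Lightweight running shoe", 7900)  # price update
found = search_products("shoe")
```

Because every write goes through save_product, the index never receives data the relational store doesn't have, and the price shown to the user always comes from the canonical row. In production the index refresh would more likely run asynchronously, which this sketch leaves out.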
It's important to be thoughtful when adopting more than one database. Otherwise, your application may behave inconsistently and leave customers frustrated.
Putting it All Together
To summarize the process I use for selecting a database:
- Understand the data structure(s) you require, the amount of data you need to store/retrieve, and the speed/scaling requirements.
- Model your data to determine if a relational, document, columnar, key/value, or graph database is most appropriate for your data.
- During the modeling process, consider things such as the ratio of reads-to-writes, along with the throughput you will require to satisfy reads and writes.
- Consider the use of multiple databases to manage data under different contexts/usage patterns.
- Always designate one canonical database to own each set of data, with one or more additional databases to support additional features such as searching, data pipeline processing, and caching.