Some people make architecture decisions purely based on the loudest consultant:
For most others, however, decisions are not as simple as this. For instance: When should we start evaluating NoSQL storage systems as an alternative to RDBMS?
RDBMS as the Hammer for all Nails
This question obviously assumes that you’re starting out with an RDBMS, which is classically the database system that solves pretty much any problem decently enough not to be replaced easily. What does this mean? Simply put:
- RDBMSs have been around forever, so they have a huge advantage compared to “newcomers” in the market that don’t have all the excellent tooling, community, support, and maturity yet.
- E.F. Codd’s work may have been the single biggest influence on our whole industry. There has hardly been anything as revolutionary as the relational model ever since. It’s hard for an alternative database to be equally universal, i.e. they’re mostly solving niche problems.
Having said so, sometimes you do have a niche problem. For instance, a graph database problem. In fact, a graph is nothing fundamentally different from what you can represent in the relational model. It is easy to model a graph with a many-to-many relationship table.
The same is true for XML/JSON in the database (don’t forget, JSON is just XML with less syntax and less features, so it’s less awesome than XML). Sometimes you need to actually store document structures (hierarchical data) in their hierarchical form in your database, instead of normalizing them. Sure, you could normalize a document, but that’s a lot of work and not necessarily beneficial.
Most modern RDBMSs offer means to store (and much more importantly, query) XML/JSON data structures in simple forms. This includes PostgreSQL, Oracle, DB2, SQL Server, and others.
So, When Do We Decide to Switch?
As developers, we may be inclined to switch quite quickly. E.g. when working with graphs, we’d love to use Neo4j, which implements the awesome Cypher query language. When working with JSON, we’d love to use something like Couchbase, which implements the interesting N1QL query language. Both of these query languages are heavily inspired by SQL, which were wise vendor choices in my opinion (as opposed to MongoDB’s weird JSON-based query language), because in the end, the SQL language is still the most powerful and popular 4GL that was ever created.
But as developers, we shouldn’t make that decision lightly. Yes, at first, these specialized databases seem like a better fit. But the additional cost for operations teams to maintain, monitor, patch, and tune production systems must not be underestimated. This is even true within the RDBMS ecosystem. A prominent recent example was Uber’s switch (back!) from PostgreSQL to MySQL:
Do note that they switched the other way round before, only to regret things. Truth is, there are tons of reasons why your operations teams prefer to always use the same database, even if that’s quite expensive in terms of licensing. In many cases, however, it would be even more expensive to:
- Engage in additional license and/or support contracts with new database vendors.
- Find skilled DBA for the new technology (can be very hard with niche databases).
- Maintain two data silos and possibly sustain the cost of keeping them in sync.
Ultimately, There’s a Threshold:
In terms of using JSON in the database, it’s simple:
- Occasional JSON storage: Stick with RDBMS.
- Everything is JSON: Perhaps not RDBMS.
The same is true for graph problems. SQL can totally handle graphs and traverse them with recursive SQL. Here’s a funky statement that recursively calculates the subset sum problem:
If you only have a few tree/graph traversals to calculate (e.g. for a simple menu structure), don’t jump ship with RDBMSs yet. If graphs are your business and that’s all you do, then RDBMSs probably aren’t the right choice.
Whatever problem you’re solving, remember: Yes, if all you have is a hammer, every problem starts looking like a nail. But don’t dismiss RDBMS as a silly hammer. It is a very mighty hammer, and in 2016, it can do a lot of non-relational niche things decently well.
RDBMS is still the best default choice for all sorts of data problems. Only once you have gone above a certain threshold (or if you can foresee doing that), then you should look for alternatives. Because those alternatives will give you a much rougher time when you go outside of that niche (JSON, graphs, etc.) back to your “ordinary” relational business.