I'm going to discuss the most likely survivors from the NoSQL movement.
It all started so well. A myriad of products to answer data management needs over any structure or query plan you could possibly want. A rich ecosystem of databases to choose from has sprung up from the NoSQL community since 2005.
Of course, the market being what it is, only a few will ever rise to become the big stars. Many companies form around a new technology, but the market will select only one or two to survive.
Even in the open-source community, the same forces apply. There, it is not the market that decides but the size of each community and the number of people willing to contribute to their favorite product, which is what makes development sustainable.
Look at Java Servlet Containers, for example. In the paid-for space, you have IBM WebSphere Application Server (WAS) and Oracle WebLogic. In the open-source ring... name the ones still around. Tomcat and Jetty, right? Tomcat vastly outstrips the other options in terms of community size and adoption.
I’ve been saying since my book State of NoSQL 2016 that 2017 would be the year of contraction in the NoSQL marketplace, when we would see a lot of consolidation, with some companies going bust and communities shrinking.
It’s started to happen. A couple of years ago, IBM bought Cloudant to dip its toes into NoSQL. Apple bought FoundationDB to run some of its own services. No huge surprise there — just people buying tech for their own purposes.
But Microsoft CosmosDB may well be the catalyst that rapidly closes down a lot of NoSQL database companies.
Similarly, RethinkDB's commercial arm folded, with the open-source database now managed by The Linux Foundation.
At the same time, we’re seeing large vendors enter the marketplace. Microsoft is by far the biggest, with their stalking horse DocumentDB released in the fall of 2015. Before that, their Tables service also provided a solution.
DocumentDB has since been expanded, renamed CosmosDB, and touted as a hybrid NoSQL database. With it, Microsoft is pushing an easy-to-use, click-and-go, "good enough" service to rival other NoSQL vendors. Chief among those who should be worried is MongoDB, with which CosmosDB's APIs are deliberately compatible, providing a friction-free move to Azure CosmosDB.
I believe Microsoft has rightly determined that those who own the feature-rich database layers will also own the apps, as every cloud platform already provides runtimes for every app language or platform you care about above the data level. Making the platform easy to use and providing developer-friendly, cost-effective mid-tier and data-tier services will lure these apps to your cloud platform.
Basically, Microsoft’s CosmosDB is to NoSQL what SharePoint was to the Enterprise Content Management (ECM) companies. Remember them? I do: I worked for FileNet! There were a dozen ECM vendors around when SharePoint arrived. Now, there are only a couple of ECM vendors with large install bases: Documentum (of EMC fame) and IBM FileNet. SharePoint was "good enough" for a lot of businesses.
Do I Just Avoid NoSQL?
No! For the love of all that is holy, no! You’d be shooting yourself in the innovation foot.
Despite what RDBMS aficionados and NoSQL naysayers may say, NoSQL ain’t going anywhere. Its flexible schemas, rich data structures, horizontal scalability on commodity servers, and developer-friendly APIs will ensure its survival.
NoSQL database companies will still survive — just in fewer numbers.
This is not a bad thing. As the money being spent on NoSQL coalesces around a smaller number of vendors, the investment in those products should increase. This may finally lead to better tools and application platforms being built on them, which in turn will lead to greater adoption.
NoSQL needs a Salesforce-level vendor to run its apps solely on a NoSQL platform in order to have a major investment breakthrough. A NoSQL killer app, if you will. That day will come soon.
This is most likely to happen in the intersection of document stores and graph stores, which appears to be an ever more popular combination both in terms of community development effort and Enterprise customer spending.
The Most Likely Survivors
But who are the most likely survivors?
For open-source options, I think you can cross off anyone licensed under the AGPL. For whatever reason, companies with AGPL-licensed products seem to have trouble attracting community developers to work on the core platform, and they struggle to get large enterprises to deploy it. Aerospike is the only company likely to buck this trend, thanks to its unique Flash architecture and its targeting of very specific markets.
I suspect the main reason AGPL’ed products are not adopted more widely is licensing cost: to avoid the AGPL’s source-sharing obligations for network-deployed software, companies typically end up buying a commercial license before using these products in money-making production apps. Imagine if Tomcat had done that with their open-source licensing! I doubt we’d have such wide availability of Tomcat app hosting on every cloud platform out there.
Once the commercial arm of an AGPL’ed company shuts down, I doubt its product will survive as a pure open-source database. Those companies have made investments and purchases that leave the enterprise versions of their products with uncertain patent and licensing issues.
These enterprise features (e.g. backups, security, HA/DR support, management tools) are the pieces of the NoSQL database products not originally covered by the AGPL. Who will pay the legal costs of investigating these issues prior to publishing everything as pure open source? I doubt anyone will, hence the products won’t survive as purely open source once the commercial arms shut down.
The commercial NoSQL companies most likely to survive aren’t the ones with many small applications in hundreds of large enterprises, but those with fewer customers and larger, more reliable revenue streams from enterprise-grade, mission-critical applications.
Likewise, the commercial players most likely to survive are not those with large install bases today but those who are succeeding in winning over large enterprises for large and, crucially, mission-critical deployments.
This is purely for one reason: large enterprises will spend large amounts of money to ensure their large, profit-making applications run smoothly.
Those who make cloud deployment ridiculously easy, as Microsoft is doing with CosmosDB, will also do very well. A choice of cloud platforms would be good to see, though. The modern-day "good data center citizen" needs to work seamlessly with the customer’s preferred cloud vendor, whoever that may be.
The open-source databases most likely to survive are those with vibrant communities, with Apache 2 or BSD licensing terms, and with maintainers who are open to third-party individuals contributing patches regularly. That is the opposite of just putting the code on GitHub, getting around to the odd bug request eventually, and not accepting third-party pull requests.
Vendors with annual revenues above USD 60 million are also among the most likely to survive. There are precious few of these. Crucially, this money needs to come in large part from large enterprises if it is to be reliable and able to grow over the long term. No vendor for whom $20k licenses are the norm can survive in the very long term, no matter how many licenses it squeezes out of companies.
Below are those I think will most likely survive. Let’s check back in 12-18 months and see how well I’ve done!
- Key-value stores.
  - Redis: Data structure support, lightweight, cloud hosting from rediscloud.
  - Aerospike: Read my latest review. (Flash. Ah-aaargh! Lots of adoption in financial services.)
- Column stores.
  - Accumulo: Purely because of its use in Defence.
  - DataStax Enterprise’s Cassandra: They also have the Titan graph store, but crucially, I don’t count this as "hybrid", as it’s a service layered over Cassandra, which acts only as its storage layer.
  - Hypertable: Commercial adoption.
  - Amazon DynamoDB: I doubt Amazon will drop this, ever.
- Document stores.
  - No pure-play vendor will survive due to the emergence of hybrid and due to Microsoft’s push into this space. This may take until 2020 to shake out. (I expect MongoDB and CouchDB to shrink in the long term because of this.)
- Graph/triple stores.
  - Neo4j: Although I’m not sure this will survive after 2025 if it remains a pure-play graph database.
  - No other pure-play will survive due to the emergence of hybrid.
- Hybrid stores.
  - ArangoDB: Read my latest review. (Hybrid document/graph store, emerging giant killer, open-source option.)
  - MarkLogic Server: Hybrid document/triple store; lots of large enterprise deals.
  - Microsoft Azure’s CosmosDB: Good enough, cloud-hosted, cheap and easy to adopt, and bound to be extended to graph over time.
I realize the above list will displease many, but it was always bound to. Someone has to lose out.
Based on what I know about the inner workings of the companies, communities, and their current annual revenues, the above seems the most likely outcome to me.
Of course, I’m predicting the future, so anything could happen. If IBM bought MongoDB, that would be interesting — especially if they bought AllegroGraph from Franz, too! Such dreams are less likely though.
Please do make your own predictions in the comments section, too!