As NoSQL data models continue to prove their worth in high-profile web properties and enterprise settings, developers and architects need a basic framework for organizing and differentiating these data stores by capability, so that they know where to direct their more in-depth research.
Nathan Hurst's Visual Guide to NoSQL Systems is an excellent place to start your search for one solution or several. With the emergence of polyglot persistence, you don't necessarily need to choose just one data store. And now that data persistence is available as a hosted service, what's to stop us from trying many systems by popping them in and out of our custom platform? Heroku's Adam Wiggins recently wrote a great post that indicated which persistence solutions might work best in certain use cases:
- Frequently-written, rarely read statistical data (for example, a web hit counter) should use an in-memory key/value store like Redis, or an update-in-place document store like MongoDB.
- Big Data (like weather stats or business analytics) will work best in a freeform, distributed db system like Hadoop.
- Binary assets (such as MP3s and PDFs) find a good home in a datastore that can serve directly to the user’s browser, like Amazon S3.
- Transient data (like web sessions, locks, or short-term stats) should be kept in a transient datastore like Memcache. (Traditionally we haven’t grouped memcached into the database family, but NoSQL has broadened our thinking on this subject.)
- If you need to be able to replicate your data set to multiple locations (such as syncing a music database between a web app and a mobile device), you’ll want the replication features of CouchDB.
- High availability apps, where minimizing downtime is critical, will find great utility in the automatically clustered, redundant setup of datastores like Cassandra and Riak.
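As a rough illustration, the recommendations above can be captured in a small lookup table. This is only a sketch: the use-case keys and the `suggest_stores` function are hypothetical names invented here, not anything from the original post, and a real decision would involve far more than a dictionary lookup.

```python
# Hypothetical lookup table summarizing the recommendations listed above.
# The keys are labels invented for this sketch, not terms from the post.
RECOMMENDATIONS = {
    "write_heavy_stats": ["Redis", "MongoDB"],
    "big_data": ["Hadoop"],
    "binary_assets": ["Amazon S3"],
    "transient_data": ["Memcached"],
    "multi_location_replication": ["CouchDB"],
    "high_availability": ["Cassandra", "Riak"],
}


def suggest_stores(use_case: str) -> list[str]:
    """Return candidate data stores for a use case, or an empty list if unknown."""
    # Copy the list so callers cannot mutate the shared table.
    return list(RECOMMENDATIONS.get(use_case, []))


if __name__ == "__main__":
    for case in ("transient_data", "high_availability"):
        print(case, "->", ", ".join(suggest_stores(case)))
```

In a polyglot setup, an application could consult a table like this once per kind of data rather than once per project, which is how several stores end up coexisting in one platform.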
Nathan Hurst's Visual Guide to NoSQL Systems uses a triangular diagram to visualize the specialization domains of each data model and its tradeoffs.
(Diagram from Nathan Hurst's blog.)
Each corner represents a primary attribute of a system, and every model covers two of the three attributes:
- Consistency means that each client always has the same view of the data.
- Availability means that all clients can always read and write.
- Partition tolerance means that the system works well across physical network partitions.
RDBMSs (Postgres, MySQL, etc.) have both Consistency and Availability.
Cassandra, Voldemort, CouchDB, and Riak are some of the models that have Availability and Partition tolerance.
BigTable, HBase, MongoDB, Berkeley DB, and Redis are some of the models that have Consistency and Partition tolerance.
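The three groupings above can be sketched as a small data structure that you can query by attribute. The names `CAP_GROUPS` and `systems_with` are hypothetical, and the groupings simply mirror the guide's simplified classification as described here; they say nothing about how a given system behaves under a particular configuration.

```python
# Single-letter codes for the three attributes at the triangle's corners.
C, A, P = "C", "A", "P"

# Each pair of attributes maps to the systems the guide places on that edge.
CAP_GROUPS = {
    frozenset({C, A}): ["Postgres", "MySQL"],
    frozenset({A, P}): ["Cassandra", "Voldemort", "CouchDB", "Riak"],
    frozenset({C, P}): ["BigTable", "HBase", "MongoDB", "Berkeley DB", "Redis"],
}


def systems_with(*attributes: str) -> list[str]:
    """Return the systems whose attribute pair includes every requested attribute."""
    wanted = frozenset(attributes)
    matches = []
    for pair, systems in CAP_GROUPS.items():
        # A group qualifies only if it covers all requested attributes.
        if wanted <= pair:
            matches.extend(systems)
    return matches


if __name__ == "__main__":
    print("A + P:", systems_with(A, P))
    print("Anything with C:", systems_with(C))
    print("All three:", systems_with(C, A, P))  # no group covers all three
```

Asking for all three attributes returns an empty list, which reflects the guide's premise that every model covers only two of them.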
This is a fairly general way to differentiate the data models, but it's a useful tool for directing your research. You'll still need to dig deeper into each candidate and then test it to see whether it meets all of your needs. Even if no single solution does, there's the growing trend of polyglot persistence, so give it a try. Another growing trend is Database-as-a-Service (DaaS). As NoSQL shows its natural suitability for cloud computing persistence, more people can try these data stores at will as cloud-based services.
Heroku's 'drop in and run' cloud platform already supports several add-ons for NoSQL databases. They include: MongoHQ for MongoDB, Cloudant for CouchDB, NorthScale's Memcached service, and soon Redis To Go. Amazon has an RDS service for those who want to stick with MySQL and use it as a hosted service.