Cassandra vs. (CouchDB | MongoDB | Riak | HBase)

by Brian O' Neill · Apr. 03, 12 · Interview

Here is why in "Cassandra vs.", it's Cassandra FTW!

Our organization processes thousands of data sources continuously to produce a single consolidated view of the healthcare space.  There are two aspects of this problem that are challenging.  The first is schema management, and the second is processing time.

Creating a flexible RDBMS model to accommodate thousands of disparate data sources is difficult, especially as those schemas change over time. Even given a flexible relational model, properly accessing and manipulating data in that model is complicated. That complexity bleeds into application code and hampers analytics.

Given the volume of data and the frequency of updates, standardizing, indexing, analyzing, and processing that data takes days across dozens of machines. Even with round-the-clock processing, the business and customer appetites for additional and more current analytics are insatiable.

Scaling the RDBMS vertically through hardware eventually hits its limits. Scaling horizontally through sharding becomes a challenge: Operations and Maintenance (O&M) is difficult and requires a lot of custom coding to accommodate the partitioning.
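
To make the sharding pain concrete, here is a minimal Python sketch, assuming a naive "hash the key, take it modulo the shard count" placement. The key names, the shard counts, and the use of md5 are all hypothetical choices for illustration; the article does not describe how our data was actually partitioned.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard with a stable hash (md5 is only an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def fraction_moved(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_count) != shard_for(k, new_count)
    )
    return moved / len(keys)


# Hypothetical record keys; not the real key scheme.
keys = [f"source-{i}" for i in range(10_000)]
moved = fraction_moved(keys, old_count=4, new_count=5)
print(f"{moved:.0%} of keys change shards when growing from 4 to 5 shards")
```

Under this placement scheme, most keys land on a different shard after a resize, so the application has to migrate data and route reads and writes correctly while the move is in progress. That is the kind of custom partitioning code that drives up O&M cost.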

We needed a distributed data system that provided:

  • Flexible Schema Management
  • Distributed Processing
  • Easy Administration (to lower O&M costs)

Driven by the need for flexible schemas, we turned to NoSQL. We considered MongoDB, CouchDB, HBase, and Riak. We immediately set out to see what support each of these had for "real" map/reduce. Given the processing we do, we knew we would eventually need support for all of Hadoop's goodness, including extensions like Pig, Hive, and Cascading.
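
For readers new to the model, here is a minimal single-process sketch of map/reduce over records whose fields differ by source. It is plain Python, not Hadoop, and the field names (`provider_state`, `state`, `npi`, `license`) are hypothetical stand-ins rather than anything from our actual data.

```python
from collections import defaultdict

# Hypothetical records from two sources that name their fields differently.
records = [
    {"source": "a", "provider_state": "PA", "npi": "111"},
    {"source": "b", "state": "PA", "license": "X9"},
    {"source": "b", "state": "NJ"},
    {"source": "a", "provider_state": "NJ", "npi": "222"},
    {"source": "a", "npi": "333"},  # no state field at all
]


def map_record(record):
    """Map phase: emit (state, 1) using whichever field the source supplies."""
    state = record.get("provider_state") or record.get("state")
    if state is not None:
        yield state, 1


def reduce_counts(key, values):
    """Reduce phase: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(records):
    grouped = defaultdict(list)  # shuffle step: group emitted values by key
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


counts = map_reduce(records)
print(counts)
```

The map function tolerates missing or differently named fields, which is the schema flexibility we were after. A framework like Hadoop runs the same two phases, but spreads the map and reduce work across many machines.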

CouchDB dropped out here. It supports map/reduce, but has little or no notable support for Hadoop proper. MongoDB scored "acceptable", but its Hadoop support was not nearly as evolved as the support in Cassandra. DataStax actually distributes an enterprise version of Cassandra that fully integrates the Hadoop runtime. Thus, we left MongoDB for another day and scored HBase's Hadoop support off the charts.

Riak is interesting in that it provides very slick native support for map/reduce (http://wiki.basho.com/MapReduce.html) via REST, while also providing a nice bridge from Hadoop. I must admit, we were *very* attracted to the REST interface (which is why we eventually went on to create Virgil for Cassandra).

Left with Riak, HBase, and Cassandra, we layered in some non-functional requirements. First, we needed to be able to get third-party support. Unfortunately, this is where Riak dropped out. With DataStax and Cloudera backing the other contenders, it was hard to go with what felt like the "new kid on the block".

NOW -- down to HBase and Cassandra. For this comparison, I won't bother reiterating all the points from Dominic Williams's great post. Given that post and a few others, we decided on Cassandra.

Now, since choosing Cassandra, I can say there are a few other *really* important, less tangible considerations. The first is the code base. Cassandra has an extremely clean and well-maintained code base. Jonathan and team do a fantastic job managing the community and the code. As we adopted NoSQL, the ability to extend the code base and incorporate our own features (e.g., triggers, a REST interface, and server-side wide-row indexing) has proven invaluable.

Secondly, the community is phenomenal. That results in timely support and solid releases on a regular schedule. They do a great job prioritizing features, accepting contributions, and cranking out features (they are now releasing ~quarterly). We've all probably been part of other open source projects where the leadership is lacking and features and releases are unpredictable, which makes your own release planning difficult. Kudos to the Cassandra team.




Published at DZone with permission of Brian O' Neill, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
