Cassandra Adds Hadoop MapReduce

Today the Cassandra project announced its first new release since becoming a Top-Level Project at Apache. Don't let the low version number fool you: Cassandra 0.6 is one of the most mature NoSQL distributed data stores in the open source market. It was heavily developed by Facebook before it was open sourced in August 2008. Cassandra is currently used by four of the largest social media sites in the world: Facebook, Digg, Reddit, and Twitter.

One of the primary new features in Cassandra 0.6 is support for Apache Hadoop.  This is a major upgrade for Cassandra, giving it even more "big data" capabilities.  The new feature will allow Cassandra to run analytics against its own data using Hadoop's reliable MapReduce framework.
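To make the MapReduce model concrete, here is a minimal pure-Python sketch of the map, group, and reduce phases running over rows shaped loosely like a column family. It does not use Hadoop or Cassandra's actual Hadoop integration; the sample rows and the names `map_row`, `reduce_values`, and `run_job` are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample data: row key -> {column name: value}.
rows = {
    "user1": {"city": "Austin", "posts": "3"},
    "user2": {"city": "Boston", "posts": "5"},
    "user3": {"city": "Austin", "posts": "2"},
}


def map_row(key, columns):
    """Map phase: emit one (city, post count) pair per row."""
    yield columns["city"], int(columns["posts"])


def reduce_values(key, values):
    """Reduce phase: sum every count emitted for one city."""
    return key, sum(values)


def run_job(rows):
    """Run map over every row, group by emitted key, then reduce each group."""
    grouped = defaultdict(list)
    for key, columns in rows.items():
        for out_key, out_value in map_row(key, columns):
            grouped[out_key].append(out_value)
    return dict(reduce_values(k, v) for k, v in grouped.items())


print(run_job(rows))  # {'Austin': 5, 'Boston': 5}
```

In a real cluster the framework would split the rows across many machines and run the map and reduce steps in parallel; the sketch only shows the shape of the computation.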


Cassandra 0.6 simplifies its architecture with a new integrated row cache. With this feature in place, Cassandra no longer needs a separate caching layer. Along with the simplified architecture, Cassandra 0.6 also delivers a performance boost. The distributed data store can already process thousands of writes per second, and this version's enhancements build on that number.
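As a rough illustration of what an integrated row cache does, the sketch below keeps recently read rows in memory and falls back to a slower loader on a miss. It is not Cassandra's implementation: the `RowCache` class, its least-recently-used eviction policy, and the `load_row` callback are all assumptions made for this example.

```python
from collections import OrderedDict


class RowCache:
    """Hypothetical read-through cache holding at most `capacity` rows."""

    def __init__(self, capacity, load_row):
        self.capacity = capacity
        self.load_row = load_row   # called on a miss, e.g. a disk read
        self.rows = OrderedDict()  # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.rows:
            self.hits += 1
            self.rows.move_to_end(key)  # mark as most recently used
            return self.rows[key]
        self.misses += 1
        row = self.load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)  # evict the least recently used row
        return row


cache = RowCache(capacity=2, load_row=lambda key: {"key": key})
for key in ["a", "b", "a", "c", "b"]:
    cache.get(key)
print(cache.hits, cache.misses)  # 1 4
```

Because the cache sits inside the same process that serves reads, an application using a store like this would not need to run and invalidate a separate caching tier.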

"Apache Cassandra 0.6 is 30% faster across the board, building on our already-impressive speed," said Jonathan Ellis, Apache Cassandra Project Management Committee Chair, in the press release. "It achieves scale-out without making the kind of design compromises that result in operations teams getting paged at 2 AM." Ryan King, the Storage Team Technical Lead at Twitter, explained Twitter's reasons for using Cassandra: "At Twitter, we're deploying Cassandra to tackle scalability, flexibility and operability issues in a way that's more highly available and cost effective than our current systems."

One of Cassandra's best-known features is its lack of any single point of failure. When a node goes down, the distributed system smoothly replaces it with a new node. The system can also be tuned to favor either consistency or availability.
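The consistency-versus-availability trade-off can be sketched with the quorum arithmetic commonly used in replicated stores: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > N. The function names below are hypothetical, and the sketch models only the arithmetic, not Cassandra's actual client API.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, w, r):
    """True when every read set of size r overlaps every write set of size w."""
    return r + w > n


n = 3
print(quorum(n))                                       # 2
print(reads_see_latest_write(n, quorum(n), quorum(n)))  # True
print(reads_see_latest_write(n, 1, 1))                  # False
```

Waiting on fewer replicas keeps requests succeeding when nodes are down, at the cost of possibly reading stale data; waiting on more does the reverse.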

The previous version of Cassandra (0.5) added load balancing and significantly improved bootstrap and concurrency. New tools were also added, including JSON-based data import and export, new JMX metrics, and an improved command line interface. "It's fantastic seeing the Project's community at the ASF grow to match the promise of the technology," said Ellis.

You can download Cassandra 0.6 now on the project's website.  For more info on Cassandra, check out "4 Months with Cassandra, a love story."
