Users of Cassandra and Hadoop may be interested in KassandraMRHelper, a new tool from Knewton. As the name suggests, its purpose is to simplify extracting data from Cassandra into Hadoop and running map-reduce jobs over that data. However, it takes a different approach from other techniques. According to the overview of KassandraMRHelper on Knewton's blog:
[KassandraMRHelper] doesn’t require a live Cassandra cluster to extract the data from. This allows us to re-run map-reduce jobs multiple times without worrying about any performance degradation of our production services. This means that we don’t have to accommodate more traffic for these offline analyses, which keeps costs down.
Knewton's full blog post also includes a breakdown of how it all works, as well as a small tutorial including sample code. Anybody working with Cassandra and Hadoop should take a look.