How to Get a Twitter-esque Architecture Out of the Box

Today, a developer can work on a platform that integrates Hadoop, Cassandra, and Solr on a single cluster…  Hey!  Those technologies are used at another major company I've read about…  what was their name again?

Oh yeah, it's Twitter!  Take a look at the first three technologies that they mention on their Open Source Thanks page.  Now, I'm certainly not saying this platform uses these technologies in the same way that Twitter does, but I think it does show that the platform is built on a proven stack for real-time search and analytics.

Who Are These Guys?

The platform I'm referring to is called DataStax Enterprise 2.0, and as I said, it was released today.  Who is DataStax, you ask?  Well, the company was founded by developers who worked on Cassandra while at Rackspace.  One of the founders, Jonathan Ellis, is currently the project chair of the Cassandra project at Apache.  The company was originally founded under the name "Riptano" and later changed its name to "DataStax" as it evolved its Cassandra-centered technology into a full big data platform.  Right now they can hang their hat on customers like Wal-Mart, the world's largest private employer.  (See the video below.)

More info on DataStax's Cassandra for Big Data story.

What Can This Thing Do? (new features)

Solr on Cassandra

Along with running Hadoop and Cassandra in a DataStax Enterprise cluster, the 2.0 release adds Solr to the mix and makes it scalable without the complexity that scaling stock Solr usually involves.  Here are two perks of the Solr integration in DataStax:

  • CQL (Cassandra Query Language) access - It's nice to have a SQL-style query language in front of Lucene/Solr
  • Quick index rebuilding options
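
To make the CQL angle a little more concrete, here's a minimal Python sketch that builds the kind of statement I mean.  Fair warning on the assumptions: the `solr_query` pseudo-column is how I understand DataStax Enterprise passes a Solr query string through CQL, so check the exact syntax against the docs for your version.  The `solr_cql` helper and the `wiki` table are hypothetical names made up for illustration.

```python
def solr_cql(table, solr_query, limit=10):
    """Build a CQL SELECT that carries a Solr query string.

    Assumption: the cluster accepts a `solr_query` pseudo-column in the
    WHERE clause. `table` is a hypothetical, already-indexed table name.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # CQL string literals escape a single quote by doubling it.
    escaped = solr_query.replace("'", "''")
    return f"SELECT * FROM {table} WHERE solr_query = '{escaped}' LIMIT {int(limit)};"


if __name__ == "__main__":
    # Hypothetical table and search term.
    print(solr_cql("wiki", "title:natio*"))
```

The helper only builds a string; actually running it is left to whatever CQL client you already use.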

Snap-in Log Integration via Log4j

This should be welcome news for many Java developers.  Log4j is very popular, and familiarity with the tool will be a +1 in favor of using DataStax.  Here's what's meant by 'Snap-in Log Integration':

Snap-in log integration for application and weblogs so that they can be written, indexed, and searched, all in the same cluster with the rest of your data.
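
If the "written, indexed, and searched" idea sounds abstract, here's a toy Python sketch of the pattern.  To be clear about the assumptions: this is not DataStax's Log4j appender.  Python's standard `logging` module stands in for Log4j, an in-memory dictionary stands in for the cluster, and `IndexingHandler` is a hypothetical name of my own.

```python
import logging
from collections import defaultdict


class IndexingHandler(logging.Handler):
    """Toy stand-in for a log appender that stores and indexes each line."""

    def __init__(self):
        super().__init__()
        self.lines = []                 # the "written" part
        self.index = defaultdict(set)   # word -> ids of lines containing it

    def emit(self, record):
        message = record.getMessage()
        line_id = len(self.lines)
        self.lines.append(message)
        # The "indexed" part: a naive whitespace-tokenized inverted index.
        for word in message.lower().split():
            self.index[word].add(line_id)

    def search(self, word):
        """The "searched" part: return lines containing the exact token."""
        return [self.lines[i] for i in sorted(self.index.get(word.lower(), ()))]


handler = IndexingHandler()
log = logging.getLogger("webapp")   # hypothetical application logger
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False               # keep the demo output quiet

log.info("GET /cart 200")
log.info("GET /checkout 500")
log.warning("checkout timeout for user 42")

if __name__ == "__main__":
    print(handler.search("GET"))
```

The application code only ever calls the logger; swapping where the lines end up is purely a matter of which handler (or, in Log4j terms, appender) is attached.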

RDBMS Migration with Sqoop

Sqoop is an open source tool from Cloudera that helps teams move data that they have sitting in a relational database into a Hadoop-based cluster.

This pic will give you some sense of how Sqoop works...
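
For a rough feel of the mechanics, here's a small Python sketch that assembles a stock `sqoop import` command line.  The flags are the ones I know from plain Sqoop; DataStax Enterprise may wrap or extend them, so treat this as an assumption rather than the DataStax way.  The `build_sqoop_import` helper, the JDBC URL, the table, and the paths are all hypothetical.

```python
import shlex


def build_sqoop_import(jdbc_url, table, username, target_dir, num_mappers=4):
    """Return the argument list for a stock `sqoop import` invocation."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,               # JDBC URL of the source database
        "--username", username,
        "-P",                                # prompt for the password at run time
        "--table", table,                    # relational table to pull
        "--target-dir", target_dir,          # destination directory in the cluster
        "--num-mappers", str(num_mappers),   # parallel import tasks
    ]


if __name__ == "__main__":
    # Every value below is made up for illustration.
    cmd = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com/shop",
        table="orders",
        username="etl",
        target_dir="/imports/orders",
    )
    print(shlex.join(cmd))
```

The sketch only prints the command so you can eyeball it before running anything against a real database.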

Elastic Workload Provisioning

Workload management is something of a new frontier for NoSQL data stores.  In DataStax Enterprise, you can modify a cluster to give Hadoop, Cassandra, or Solr more compute power based on the needs of your application.

New OpsCenter Tools

The point-and-click management and monitoring tool for DataStax Enterprise 2.0 now allows you to monitor those new Solr search indexes, do a visual backup, and monitor multiple clusters.

Finally, I'll leave you with the word from Wal-Mart on their DataStax Enterprise 2.0 deployment.


Opinions expressed by DZone contributors are their own.
