Cloudera Makes Waves in Hadoop-Land

Along with their first commercial offering, Cloudera is unloading a bunch of new features into their Cloudera Distribution of Hadoop (CDH), which has now reached its third version.  In an interview with DZone, Cloudera representative John Kreisa said, "We're really expanding the definition of what a Hadoop-based platform is."  Eight new projects have been added to the distro that provide job scheduling, workflow sequencing, and the ability to control streaming data sources.  Cloudera is also committing several of their in-house projects to the Apache Hadoop community.  


The Cloudera Hadoop distro simplifies connectivity, execution, and performance-related projects.  The new features include a Pig package based on the latest Apache release, along with HBase and ZooKeeper packages, which were previously supported only as part of the contrib repository.  These are now first-class packages in CDH3.  The projects included in CDH3 are Hive, HBase, Sqoop, Oozie, Flume, Avro, ZooKeeper, Pig, and Cloudera Desktop.  These projects address deployment requirements in the areas of data integration, workflow, scheduling, high-level languages, serialization, UI, fast read/write, and RPC.


Named for its similarity to logging flumes, which also carry a 'stream' of 'logs', Flume is a project developed internally at Cloudera that allows streaming data to be managed and captured into Hadoop.  Until today, this software was not publicly available, but now it's part of CDH and is being committed to the Apache Hadoop project.  Kreisa says, "Even medium-sized organizations can have hundreds of different data sources that they want to load into a Hadoop cluster."  Medium-sized organizations won't need a large budget to harness Flume.

Cloudera Desktop

Cloudera Desktop has already been available as a package in CDH, and now it is also being sent to Apache, where it will be renamed 'Hadoop User Experience' (HUE).  The Cloudera Desktop GUI lets you build UIs on Hadoop and includes tools that are collected into a desktop environment and delivered as a web app.  The tools simplify cluster administration and job development through a file browser, a job browser, a cluster health console, and a job designer.  The job designer lets you create reusable MapReduce, Streaming, and Pig templates for commonly run jobs.  It also lets you submit MapReduce jobs to your Hadoop cluster right from your browser.

Cloudera Enterprise

Cloudera's first commercial offering will help companies put Hadoop into production.  The offering consists of CDH plus a superset of management tools and support.  Here are the three main areas of advanced management that will be available:

  • Authorization Management + Provisioning:  Extra security layer and direct LDAP integration.
  • Integration, Configuration, and Monitoring:  Mass configuration, monitoring, event handling, and change management through a central console.
  • Resource Management:  Monitor and regulate the usage of cluster resources.

You can download CDH3 for free at cloudera.com.

