Cloudera Makes Waves in Hadoop-Land
CDH3
The Cloudera Hadoop distro simplifies connectivity, execution, and performance-related projects. The new features include a Pig package based on the latest Apache release, along with HBase and ZooKeeper packages, which were previously supported only as part of the contrib repository. These are now first-class packages in CDH3. The additional projects now included in CDH3 are Hive, HBase, Sqoop, Oozie, Flume, Avro, ZooKeeper, Pig, and Cloudera Desktop. These projects address deployment requirements in the areas of data integration, workflow, scheduling, high-level languages, serialization, UI, fast read/write, and RPC.
Flume
Named for its similarity to logging flumes, which also carry a 'stream' of 'logs', Flume is a project developed internally at Cloudera that allows streaming data to be captured and managed as it flows into Hadoop. Until today, this software was not publicly available, but now it's part of CDH and is being committed to the Apache Hadoop project. Kreisa says, "Even medium sized organizations can have hundreds of different data sources that they want to load into a Hadoop cluster." Medium-sized organizations won't need a large budget to harness Flume.
Cloudera Desktop
Cloudera Desktop has been available as a package in CDH, and now it is also being sent to Apache, where it will be renamed 'Hadoop User Environment' (HUE). The Cloudera Desktop GUI lets you build UIs on Hadoop and includes tools that are collected into a desktop environment and delivered as a web app. The tools simplify cluster administration and job development through a file browser, a job browser, a cluster health console, and a job designer. The job designer lets you create reusable MapReduce, Streaming, and Pig templates for commonly run jobs. It also lets you submit MapReduce jobs to your Hadoop cluster right from your browser.
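For readers who haven't written a Streaming job, here is a minimal sketch of the kind of script such a template might wrap. It is not taken from Cloudera Desktop. It assumes only the general Hadoop Streaming convention that the mapper and reducer read lines on standard input and write tab-separated key/value lines on standard output. The file name `wordcount.py` and the function names are hypothetical.

```python
import sys
from itertools import groupby


def map_words(lines):
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def parse_pairs(lines):
    """Parse 'word<TAB>count' lines (the mapper's output format) into pairs."""
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        if word and count:
            yield word, int(count)


def reduce_counts(pairs):
    """Reducer: sum the counts per word. Expects pairs sorted by word,
    which is how the framework delivers keys to a reducer."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


# Hypothetical entry point: `wordcount.py map` or `wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "map":
        results = map_words(sys.stdin)
    else:
        results = reduce_counts(parse_pairs(sys.stdin))
    for word, count in results:
        print(f"{word}\t{count}")
```

You can try the script without a cluster by piping a text file through the map step, `sort`, and the reduce step in a shell. On a cluster, the same script would be supplied as the mapper and reducer commands of a Streaming job.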
Cloudera Enterprise
Cloudera's first commercial offering will help companies put Hadoop into production. The offering consists of CDH plus a superset of management tools and support. Here are the three main areas of advanced management that will be available:
- Authorization Management and Provisioning: Extra security layer and direct LDAP integration
- Integration, Configuration, and Monitoring: Mass configuration, monitoring, event handling, and change management through a central console
- Resource Management: Monitor and regulate the usage of cluster resources
You can download CDH3 for free at cloudera.com.