Getting Hadoop, Hive and HBase Up and Running in Less than 15 Minutes
Note: This tutorial comes from guest writer Mark Grover. Enjoy.
If you have delved into Apache Hadoop and related projects, you know that installing and configuring Hadoop is hard. Often, a minor mistake made while installing or configuring from messy tarballs will lurk for a long time, until some otherwise innocuous change to the system or workload causes difficulties. Moreover, there is little to no integration testing among the different projects in the ecosystem (e.g. Hadoop, Hive, HBase, ZooKeeper). Apache Bigtop is an open source project aimed at bridging exactly those gaps by:
1. Making it easier for users to deploy and configure Hadoop and related projects on their bare-metal or virtualized clusters.
2. Performing integration testing among various components in the Hadoop ecosystem.
More about Apache Bigtop
The primary goal of Apache Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc.), developed by a community with a focus on the system as a whole rather than on individual projects.
The latest released version of Apache Bigtop is Bigtop 0.5, which integrates the latest versions of various projects including Hadoop, Hive, HBase, Flume, Sqoop, Oozie and many more! The supported platforms include CentOS/RHEL 5 and 6, Fedora 16 and 17, SUSE Linux Enterprise 11, openSUSE 12.2, the Ubuntu LTS releases Lucid (10.04) and Precise (12.04), and Ubuntu Quantal (12.10).
Who uses Bigtop?
Folks who use Bigtop can be divided into two major categories. The first category consists of those who leverage Bigtop to power their own Hadoop distributions. The second consists of those who use Bigtop for deployment purposes.
In alphabetical order, some of these users are:
EMC/Greenplum uses Bigtop extensively as a build framework for their 1000-node Analytics Workbench Cluster.
Juju Charms for Hadoop, HBase, Hive and ZooKeeper, and the associated packages for Ubuntu, are derived from Apache Bigtop.
Magna Tempus Group provides a ready-to-use, well-integrated open source stack for intensive, high-performance in-memory data analysis, based on widely accepted technologies such as Bigtop, Hadoop, HBase, Hive and many others.
Trend Micro uses Bigtop as the basis for their internal custom distribution of Hadoop, which starts with Bigtop but then pulls features from different upstream versions and includes Apache-licensed non-core contributions as their platform needs dictate.
Uniting Data’s 100% open source platform is a Hadoop distribution based on Apache Bigtop.
WANdisco bases its 100% open source distro, WANdisco Distro (WDD), on Apache Bigtop.
Whether or not you have dabbled with Hadoop before, Apache Bigtop can go a long way towards making your life easier by providing infrastructure for easy deployment along with the latest Debian and RPM artifacts for various projects. Moreover, these artifacts have been integration-tested, so you can rely on having a trustworthy, cutting-edge distribution of Hadoop and related projects on your cluster. You can use the wiki instructions to set up a pseudo-distributed cluster in no time, or use the Puppet recipes to set up a fully distributed cluster. You can also make use of the soon-to-be-introduced Bigtop integration with Apache Whirr.
If you are a novice and would like to learn more about how you can use Apache Bigtop to quickly deploy Hadoop on your laptop and give it a test drive, or if you are a veteran and are curious to find out how Apache Bigtop can make your cluster more robust and easier to deploy, drop by my talk on Apache Bigtop at ApacheCon NA 2013 on February 26th, 2013.