
Differentiation Across the Apache Hadoop Distribution Vendor Landscape



Hadoop Spring Roundup - With the Strata Conference, the Gartner BI Summit, and the Hadoop Summit all occurring in the month of March, there is a mountain of new Hadoop-related information and a flurry of new product announcements to absorb. To share our observations and perspectives on the “Spring 2013 Hadoopalooza,” here is the first of a three-part Bootstrap blog series.

The flurry of announcements from Apache Hadoop distribution vendors during the Strata conference last month might lead one to believe that the market is getting a bit overcrowded.  A closer inspection, however, suggests that there may be a number of discrete sub-segments within the Hadoop market, each served by a different vendor.  My conversations with product managers and solution consultants working in a sampling of vendor booths at Strata revealed some very clear distinctions in how each company describes its unique value proposition.

I’ve summarized my interpretation of each vendor’s top-line message in italics below.  My two- to four-word summaries may not accurately reflect the vendors’ intended messages, but they do reflect what I walked away with.

  • Cloudera – Big Data Platform.  While Apache Hadoop is clearly the centerpiece of Cloudera Enterprise, the company positions itself as a broader Big Data solution provider with adjacent products such as Cloudera Manager, Cloudera Navigator, and Cloudera Impala.
  • Hortonworks – 100% Apache Hadoop Distribution.  By emphasizing its commitment to community-driven open source, Hortonworks is tapping into a common concern many enterprise software buyers have about vendor lock-in.
  • MapR – Enterprise-ready.  The majority of Hadoop deployments today are relatively small-scale test-and-development or departmental implementations.  MapR is aiming its message toward customers putting Hadoop to work in mission-critical production applications, where scalability, availability, reliability, failover, and security are essential.
  • Intel – Optimized for the processor.  A major appeal of Hadoop is that it utilizes low-cost commodity hardware for storage and processing.  Since the vast majority of those commodity servers happen to be powered by Intel, the company’s message promises the best of both worlds – low cost and great performance.
  • EMC Greenplum – Hadoop With SQL.  By integrating the Greenplum MPP Advanced SQL platform with Hadoop’s HDFS in its recently announced Pivotal HD product offering, EMC centers its message on accelerated adoption and higher performance.  Customers can leverage mainstream SQL skills and bypass the performance limitations of Apache Hive.

Only time will tell how these differentiated messages resonate with software buyers.  The Apache Hadoop distribution market is still in its infancy and still evolving.  We’re watching it closely and will continue to share our observations and conclusions as they unfold.



Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
