Over a million developers have joined DZone.

Yahoo Adopts Druid, Hadoop Not the "End All, Be All"

DZone's Guide to

Yahoo Adopts Druid, Hadoop Not the "End All, Be All"

Hadoop creator Yahoo released a statement announcing that Druid would be filling in the database gaps left by Hadoop.

· Big Data Zone ·
Free Resource

The open source HPCC Systems platform is a proven, easy to use solution for managing data at scale. Visit our Easy Guide to learn more about this completely free platform, test drive some code in the online Playground, and get started today.

On their Engineering blog, Yahoo announced that it was welcoming Druid to its data analysis toolbox. The company, which created Hadoop ten years ago before handing it reins to the Apache foundation, cited the slowness of MapReduce style queries, accessibility, and a desire to have "adhoc slice-n-dice," :scaling tens of billions of events a day," and "ingesting data in real-time" as reasons for their adoption.

Described as a "column-oriented, distributed, streaming analytics database designed for OLAP queries," Druid keeps true to Yahoo's fondness for open source projects. ADT Mag reports that Druid was influenced by BigQuery, Dremel and PowerDrill technologies and is currently organized by the Druid Community.

On the Druid.io site, the database claims to offer "interactive analytics at scale" and is "designed for analytics." The database has features such as sub-second queries and real-time ingestion, boasting production clusters that have scaled up to:

  • 3+ trillion events/month
  • 1M+ events/sec through Druid's real-time ingestion
  • 100+ PB of raw data
  • 30+ trillion events
  • Hundreds of queries per second for applications used by thousands of users
  • Tens of thousands of cores

Yahoo is not replacing Hadoop but, rather, incorporating Druid to fill in the gaps. Hadoop was created in 2005 by Doug Cutting and Mike Cafarella. Cutting, who also originated Lucene, named Hadoop after his son's toy elephant. Hadoop came about as an open source answer to Google and Yahoo's search engines that Cutting and Cafarella aimed to design for their project Nutch, in which they needed to index an enormous number of web pages. Thanks to the release of the Google File System and Google MapReduce research papers, Cutting and Cafarella were able to automate the running of their project.

When Cutting was hired by Yahoo, Raymie Stata, Yahoo's chief architect of search and advertising at the time, contracted him to continue working on Hadoop. It was Yahoo's idea to keep the project open source and to champion it as a "general purpose" technology rather than keeping it strictly for search engines.

Included in the long list of companies currently using Hadoop is Facebook, Google, Adobe, LinkedIn, The New York Times, Spotify, Twitter, and eBay.

Managing data at scale doesn’t have to be hard. Find out how the completely free, open source HPCC Systems platform makes it easier to update, easier to program, easier to integrate data, and easier to manage clusters. Download and get started today.

hadoop ,yahoo ,big data ,database ,druid.io

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}