On its engineering blog, Yahoo announced that it was welcoming Druid to its data analysis toolbox. The company, which created Hadoop ten years ago before handing its reins to the Apache foundation, cited the slowness of MapReduce-style queries, accessibility, and a desire to have "adhoc slice-n-dice," "scaling tens of billions of events a day," and "ingesting data in real-time" as reasons for its adoption.
Described as a "column-oriented, distributed, streaming analytics database designed for OLAP queries," Druid stays true to Yahoo's fondness for open source projects. ADT Mag reports that Druid was influenced by the BigQuery, Dremel, and PowerDrill technologies and is currently organized by the Druid Community.
The Druid.io site says the database offers "interactive analytics at scale" and is "designed for analytics." Its features include sub-second queries and real-time ingestion, and the site boasts of production clusters that have scaled up to:
- 3+ trillion events/month
- 1M+ events/sec through Druid's real-time ingestion
- 100+ PB of raw data
- 30+ trillion events
- Hundreds of queries per second for applications used by thousands of users
- Tens of thousands of cores
Yahoo is not replacing Hadoop but, rather, incorporating Druid to fill in the gaps. Hadoop was created in 2005 by Doug Cutting and Mike Cafarella. Cutting, who also originated Lucene, named Hadoop after his son's toy elephant. Hadoop came about while Cutting and Cafarella were designing Nutch, an open source answer to Google's and Yahoo's search engines, for which they needed to index an enormous number of web pages. Thanks to the release of the Google File System and Google MapReduce research papers, Cutting and Cafarella were able to automate the running of their project.
When Cutting was hired by Yahoo, Raymie Stata, Yahoo's chief architect of search and advertising at the time, contracted him to continue working on Hadoop. It was Yahoo's idea to keep the project open source and to champion it as a "general purpose" technology rather than keeping it strictly for search engines.
The long list of companies currently using Hadoop includes Facebook, Google, Adobe, LinkedIn, The New York Times, Spotify, Twitter, and eBay.