Yahoo Adopts Druid, Hadoop Not the "End All, Be All"

Yahoo, the company that incubated Hadoop, has announced that Druid will fill the database gaps left by Hadoop.

On its Engineering blog, Yahoo announced that it is welcoming Druid to its data analysis toolbox. The company, which helped create Hadoop ten years ago before handing the reins to the Apache Software Foundation, cited the slowness of MapReduce-style queries, accessibility, and a desire to have "adhoc slice-n-dice," "scaling tens of billions of events a day," and "ingesting data in real-time" as reasons for the adoption.

Described as a "column-oriented, distributed, streaming analytics database designed for OLAP queries," Druid keeps true to Yahoo's fondness for open source projects. ADT Mag reports that Druid was influenced by BigQuery, Dremel and PowerDrill technologies and is currently organized by the Druid Community.
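To make "column-oriented" and "slice-n-dice" concrete, here is a minimal sketch in plain Python. It is not Druid's API or storage format. The `columns` data and the `slice_and_dice` helper are hypothetical names invented for illustration. The idea it shows is that each field is stored as its own column, so a query filters on one dimension and sums a metric per value of another while reading only the columns it needs.

```python
from collections import defaultdict

# Hypothetical event data, stored column by column (one list per field)
# rather than as a list of row records. Not real Druid data.
columns = {
    "country": ["US", "US", "DE", "DE", "US"],
    "device": ["mobile", "desktop", "mobile", "mobile", "mobile"],
    "clicks": [3, 5, 2, 4, 1],
}


def slice_and_dice(columns, group_by, metric, where=None):
    """Sum `metric` per `group_by` value over rows matching every `where` filter."""
    where = where or {}
    totals = defaultdict(int)
    for i in range(len(columns[metric])):
        # "Slice": keep only rows whose dimension values match the filters.
        if all(columns[dim][i] == value for dim, value in where.items()):
            # "Dice": aggregate the metric by the chosen dimension.
            totals[columns[group_by][i]] += columns[metric][i]
    return dict(totals)


# Clicks per country, restricted to mobile traffic.
by_country = slice_and_dice(columns, "country", "clicks", where={"device": "mobile"})
print(by_country)
```

A production OLAP store adds indexes, compression, and distribution across nodes on top of this idea; the sketch only illustrates the query shape.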

The Druid.io site says the database offers "interactive analytics at scale" and is "designed for analytics." Its features include sub-second queries and real-time ingestion, and the site cites production clusters that have scaled up to:

  • 3+ trillion events/month
  • 1M+ events/sec through Druid's real-time ingestion
  • 100+ PB of raw data
  • 30+ trillion events
  • Hundreds of queries per second for applications used by thousands of users
  • Tens of thousands of cores

Yahoo is not replacing Hadoop but, rather, incorporating Druid to fill in the gaps. Hadoop was created in 2005 by Doug Cutting and Mike Cafarella. Cutting, who also originated Lucene, named Hadoop after his son's toy elephant. Hadoop grew out of Nutch, the open source search engine that Cutting and Cafarella were building as an alternative to the search engines of companies like Google and Yahoo, and which needed to index an enormous number of web pages. The Google File System and Google MapReduce research papers gave Cutting and Cafarella a model for distributing that storage and processing work across many machines.

When Cutting was hired by Yahoo, Raymie Stata, Yahoo's chief architect of search and advertising at the time, contracted him to continue working on Hadoop. It was Yahoo's idea to keep the project open source and to champion it as a "general purpose" technology rather than keeping it strictly for search engines.

The long list of companies currently using Hadoop includes Facebook, Google, Adobe, LinkedIn, The New York Times, Spotify, Twitter, and eBay.

Topics:
hadoop, yahoo, big data, database, druid.io

Opinions expressed by DZone contributors are their own.
