
Storm vs. Hadoop


Knowing the emerging capabilities of Storm and Hadoop helps users choose the right technology for various business needs.


Every day in the Big Data world, new frameworks are introduced to solve complex problems, though Hadoop was the one that opened the gate to analyzing huge volumes of data. Knowing the emerging capabilities of Storm and Hadoop helps users choose the right technology for various business needs.

Apache Hadoop

According to the Hadoop website, "the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures."
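
To make the "simple programming models" mentioned in that quote concrete, here is a minimal sketch of the canonical word-count job written against Hadoop's Java MapReduce API (the org.apache.hadoop.mapreduce classes). The class names and the command-line input/output paths are illustrative choices, not anything prescribed by the article.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in its input split.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local aggregation before the shuffle
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```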

Apache Storm

Apache Storm runs continuously, consuming data from configured sources (spouts) and passing it down the processing pipeline (bolts). Spouts and bolts together form a topology, which can be written in any language. Storm can integrate with any queuing system and any database (e.g., RDBMS, NoSQL).
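
As a rough sketch of how spouts and bolts fit together, the following Java example wires one spout into one bolt and runs the topology in an in-process cluster for testing. It assumes the org.apache.storm API used since Storm 1.x, and the SentenceSpout and SplitSentenceBolt classes are purely illustrative stand-ins for whatever sources and processing steps a real topology would use.

```java
import java.util.Map;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class WordCountTopology {

  // Spout: the data source; a real deployment would typically read from a queue such as Kafka.
  public static class SentenceSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] sentences = {"the cow jumped over the moon", "an apple a day"};
    private int index = 0;

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
      this.collector = collector;
    }

    public void nextTuple() {
      collector.emit(new Values(sentences[index]));   // emit one sentence per call
      index = (index + 1) % sentences.length;
      Utils.sleep(100);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("sentence"));
    }
  }

  // Bolt: one processing step; splits each sentence into words.
  public static class SplitSentenceBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
      for (String word : tuple.getStringByField("sentence").split("\\s+")) {
        collector.emit(new Values(word));
      }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }
  }

  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("sentences", new SentenceSpout(), 1);
    // shuffleGrouping distributes tuples randomly across the bolt's instances.
    builder.setBolt("words", new SplitSentenceBolt(), 2).shuffleGrouping("sentences");

    Config conf = new Config();
    conf.setDebug(true);

    // Run in-process for testing; StormSubmitter.submitTopology(...) would deploy to a cluster.
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());
    Utils.sleep(10_000);
    cluster.shutdown();
  }
}
```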

Storm does not natively run on top of typical Hadoop clusters. Instead, it uses Apache ZooKeeper and its own master/worker processes (Nimbus and Supervisors) to coordinate topologies, master and worker state, and message-delivery guarantees.

Why Storm?

Quoting from the project site:

Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

The table below compares the trade-offs involved when choosing between Storm and Hadoop for data processing.

Storm | Hadoop
--- | ---
Distributed real-time processing of large volumes of high-velocity data. | Distributed batch processing of large volumes of data.
Data is mostly dynamic and continuously streamed. | Data is mostly static and stored in persistent storage.
Relatively fast: low-latency, per-record processing. | Relatively slow: high-latency batch jobs.
Architecture consists of spouts and bolts. | Architecture consists of HDFS and MapReduce.
Scalable and fault-tolerant. | Scalable and fault-tolerant.
Implemented in Clojure. | Implemented in Java.
Simple and can be used with any programming language. | More complex, but can be used with any programming language.
Easy to set up and operate. | Easy to set up but difficult to operate.
Used in business intelligence and Big Data analytics. | Used in business intelligence and Big Data analytics.
Open source (Apache license). | Open source (Apache license).



Topics:
big data, big data analytics, storm, hadoop

