Storm vs. Hadoop

Knowing the emerging capabilities of Storm and Hadoop helps users choose the right technology for various business needs.

By Suresh Rajagopal · Mar. 23, '17 · Opinion

New frameworks are introduced in the Big Data world every day to solve complex problems, but Hadoop was the one that opened the gate to analyzing huge volumes of data. Knowing the emerging capabilities of Storm and Hadoop helps users choose the right technology for various business needs.

Apache Hadoop

According to the Hadoop website, "the Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures."
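
To give a sense of the "simple programming models" the quote refers to, here is a minimal word-count job written against Hadoop's MapReduce Java API. The class names and the input/output paths are illustrative, not taken from the article.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Classic word count: the mapper emits (word, 1) pairs, the reducer sums them.
// Hadoop splits the input across the cluster and retries failed tasks itself.
public class WordCount {

  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);          // emit (word, 1) for every token
      }
    }
  }

  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();                    // add up all counts for this word
      }
      context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job runs as a batch over data already sitting in HDFS, which is the key contrast with Storm's continuous processing described next.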

Apache Storm

Apache Storm runs continuously, consuming data from configured sources (spouts) and passing it down the processing pipeline (bolts). Spouts and bolts together make up a topology, which can be written in any language. Storm can integrate with any queuing system and any database system (e.g., RDBMS, NoSQL).

Storm does not natively run on top of typical Hadoop clusters. Instead, it uses Apache ZooKeeper together with its own master and worker processes (Nimbus and Supervisors) to coordinate topologies, master and worker state, and message-delivery guarantees.
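
As a rough sketch of how spouts and bolts are wired into a topology, the snippet below builds and submits a small word-count topology with Storm's Java API. SentenceSpout, SplitWordsBolt, and WordCountBolt are hypothetical placeholder classes (the spout would typically read from a queue such as Kafka), not components from the article.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Wires one spout and two bolts into a topology and submits it to the cluster.
// SentenceSpout, SplitWordsBolt, and WordCountBolt are placeholders for your
// own components.
public class WordCountTopology {
  public static void main(String[] args) throws Exception {
    TopologyBuilder builder = new TopologyBuilder();

    // The spout continuously pulls data from the external source.
    builder.setSpout("sentences", new SentenceSpout(), 2);

    // First bolt splits sentences into words; tuples are distributed randomly.
    builder.setBolt("split", new SplitWordsBolt(), 4)
           .shuffleGrouping("sentences");

    // Second bolt counts words; fieldsGrouping routes the same word to the same task.
    builder.setBolt("count", new WordCountBolt(), 4)
           .fieldsGrouping("split", new Fields("word"));

    Config conf = new Config();
    conf.setNumWorkers(2); // worker JVMs spread across supervisor nodes

    // Nimbus (the master) distributes the code; ZooKeeper tracks cluster state.
    StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
  }
}
```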

Why Storm?

Quoting from the project site:

Storm has many use cases: realtime analytics, online Machine Learning, continuous computation, distributed RPC, ETL, and more. Storm is fast — a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
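
The "guarantees your data will be processed" part comes from Storm's tuple-tracking model: a bolt anchors the tuples it emits to its input tuple and then acks (or fails) that input, and the spout replays anything that never becomes fully acked. Below is a minimal sketch of such a bolt; the class name and the splitting logic are illustrative only.

```java
import java.util.Map;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

// Splits sentences into words while participating in Storm's reliability model:
// each emitted tuple is anchored to its input, and the input is acked on success
// or failed on error so the spout can replay it.
public class SplitWordsBolt extends BaseRichBolt {
  private OutputCollector collector;

  @Override
  public void prepare(Map<String, Object> conf, TopologyContext context, OutputCollector collector) {
    this.collector = collector;
  }

  @Override
  public void execute(Tuple input) {
    try {
      for (String word : input.getStringByField("sentence").split("\\s+")) {
        // Anchoring to 'input' ties the new tuple into the ack tree.
        collector.emit(input, new Values(word));
      }
      collector.ack(input);   // mark the input tuple as fully processed
    } catch (Exception e) {
      collector.fail(input);  // ask the spout to replay this tuple
    }
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}
```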

In this table, I compare the trade-offs involved when choosing between Storm and Hadoop for data processing.

| Storm | Hadoop |
|-------|--------|
| Distributed real-time processing of large volumes of high-velocity data. | Distributed batch processing of large volumes of data. |
| Data is mostly dynamic and continuously streamed. | Data is mostly static and stored in persistent storage. |
| Low latency; tuples are processed in near real-time as they arrive. | Higher latency; MapReduce jobs run over stored data and can take minutes to hours. |
| Architecture consists of spouts and bolts. | Architecture consists of HDFS and MapReduce. |
| Scalable and fault-tolerant. | Scalable and fault-tolerant. |
| Implemented in Clojure. | Implemented in Java. |
| Simple and can be used with any programming language. | Complex, but can be used with any programming language. |
| Easy to set up and operate. | Easy to set up but difficult to operate. |
| Used in business intelligence and Big Data analytics. | Used in business intelligence and Big Data analytics. |
| Open source (Apache license). | Open source (Apache license). |



Opinions expressed by DZone contributors are their own.
