Storm-Kafka Integration With Configurations and Code

A tutorial on how to use these two open source big data frameworks to analyze real-time data streams.

By Rinu Gour · Jan. 08, 19 · Tutorial


What Is Storm?

Apache Storm is an open source, distributed, reliable, and fault-tolerant system. Common use cases of Storm include real-time analytics, online machine learning, continuous computation, and Extract, Transform, Load (ETL).

For streaming data processing, several components work together:

  • Spout: A spout is the source of a stream, such as a continuous stream of log data.
  • Bolt: The spout passes its data to a component called a bolt. A bolt consumes any number of input streams, does some processing, and possibly emits new streams.
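The spout-to-bolt data flow above can be sketched in plain Java. This is a hypothetical illustration of the concept only, not the actual Storm API (real spouts and bolts implement Storm interfaces such as IRichSpout and IRichBolt):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch (not the Storm API): a "spout" emits a stream of
// log lines, and a "bolt" consumes that stream, does some processing,
// and emits a new stream.
public class SpoutBoltSketch {

    // The spout: the source of a stream (here, a fixed list of log lines
    // standing in for a continuous log stream).
    static List<String> spoutEmit() {
        return Arrays.asList("user login ok", "user logout ok");
    }

    // The bolt: consumes an input stream, processes each tuple
    // (counting words per line), and emits a new stream.
    static List<Integer> boltProcess(List<String> input) {
        List<Integer> out = new ArrayList<>();
        for (String line : input) {
            out.add(line.split("\\s+").length);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(boltProcess(spoutEmit())); // [3, 3]
    }
}
```

In a real topology, Storm wires the spout's output stream into the bolt and handles distribution and fault tolerance; this sketch only shows the logical data flow.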

The below diagram describes spout and bolt in the Storm architecture:

Storm Kafka Integration - Apache Storm Architecture


What Is Storm Kafka Integration?

Kafka and Storm complement each other: their cooperation enables real-time streaming analytics for fast-moving big data. Kafka-Storm integration exists to make it easier for developers to ingest and publish data streams from Storm topologies.

Apache Storm Kafka Integration – Storm Cluster with Kafka Broker

The below diagram describes the high-level view of what a Kafka-Storm integration model looks like:

Storm Kafka Integration - The Working Model of Kafka Storm

a. Using KafkaSpout

A KafkaSpout is a regular spout implementation that reads from a Kafka cluster. Its basic usage is:

SpoutConfig spoutConfig = new SpoutConfig(
 ImmutableList.of("kafkahost1", "kafkahost2"), // list of Kafka brokers
 8, // number of partitions per host
 "clicks", // topic to read from
 "/kafkastorm", // the root path in Zookeeper for the spout to store the consumer offsets
 "discovery"); // an id for this consumer for storing the consumer offsets in Zookeeper
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

The spout is parameterized with a static list of brokers and a fixed number of partitions per host.

It stores the offsets it has consumed in ZooKeeper. The spout is also parameterized with the root path under which to store the offsets, and an id for this particular spout. Offsets for partitions are stored under these paths, where "0", "1", etc. are the partition ids:

{root path}/{id}/0
{root path}/{id}/1
{root path}/{id}/2
{root path}/{id}/3
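Putting the root path, spout id, and partition id together, the ZooKeeper path for one partition's offset can be sketched as follows (a minimal illustration; the values match the spout config above):

```java
public class OffsetPathSketch {
    // Builds the ZooKeeper path where the spout stores the consumer
    // offset for one partition: {root path}/{id}/{partition}.
    static String offsetPath(String rootPath, String id, int partition) {
        return rootPath + "/" + id + "/" + partition;
    }

    public static void main(String[] args) {
        // Using the root path and consumer id from the spout config above.
        System.out.println(offsetPath("/kafkastorm", "discovery", 0)); // /kafkastorm/discovery/0
    }
}
```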

By default, the offsets are stored in the same ZooKeeper cluster that Storm uses. We can override this via the spout config, like this:

spoutConfig.zkServers = ImmutableList.of("otherserver.com");
spoutConfig.zkPort = 2191;

We can also force the spout to rewind to a previous offset by calling forceStartOffsetTime on the spout config, like so:

spoutConfig.forceStartOffsetTime(-2);

When a timestamp is passed, the spout starts consuming from the latest offset written around that timestamp. We can also force the spout to always start from the latest offset by passing in -1, or from the earliest offset by passing in -2.
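The convention behind these special values can be summarized in a small helper. This is a hypothetical illustration of the -1/-2 convention described above, not part of the storm-kafka API:

```java
public class StartOffsetSketch {
    // Hypothetical helper illustrating the convention used by
    // forceStartOffsetTime: -1 means "start from the latest offset",
    // -2 means "start from the earliest offset", and any other value
    // is treated as a timestamp to start consuming around.
    static String describeStartOffset(long value) {
        if (value == -1) return "latest";
        if (value == -2) return "earliest";
        return "latest offset written around timestamp " + value;
    }

    public static void main(String[] args) {
        System.out.println(describeStartOffset(-2)); // earliest
    }
}
```

These values mirror Kafka's own "earliest"/"latest" special offset times, which is why they appear in the spout configuration.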

i. Parameters for Connecting to Kafka Cluster

To connect to the Kafka cluster, KafkaSpout requires the following parameters:

  • List of Kafka brokers.

  • The number of partitions per host.

  • The topic name to pull messages from.

  • Root path in ZooKeeper, where Spout stores the consumer offset.

  • ID for the consumer required for storing the consumer offset in ZooKeeper.

The below code sample shows the KafkaSpout class instance initialization with the previous parameters:

SpoutConfig spoutConfig = new SpoutConfig(
 ImmutableList.of("localhost:9092", "localhost:9093"),
 2,
 "othertopic",
 "/kafkastorm",
 "consumID");
KafkaSpout kafkaSpout = new KafkaSpout(spoutConfig);

The Kafka spout uses ZooKeeper to store the state of the message offsets and to track which segments have been consumed. These offsets are stored at the root path specified in the configuration. By default, Storm uses its own ZooKeeper cluster for storing the message offsets, but we can point the spout at a different ZooKeeper cluster through the spout configuration.

The Kafka spout also offers options, such as buffer sizes and timeouts, for specifying how it fetches messages from the Kafka cluster.
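The offset bookkeeping described above can be sketched in plain Java. Here a HashMap stands in for ZooKeeper (a hypothetical simulation, not the real spout's ZooKeeper client): after consuming each message, the spout records the next offset to read under {root path}/{id}/{partition}.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of offset checkpointing: a HashMap stands in for
// ZooKeeper, keyed by the same {root path}/{id}/{partition} paths the
// spout uses.
public class OffsetStoreSketch {

    // Records the next offset to read for one partition.
    static void commit(Map<String, Long> store, String rootPath,
                       String id, int partition, long nextOffset) {
        store.put(rootPath + "/" + id + "/" + partition, nextOffset);
    }

    public static void main(String[] args) {
        Map<String, Long> zk = new HashMap<>(); // stand-in for ZooKeeper
        // Consume offsets 0..2 on partition 0, committing after each message.
        for (long offset = 0; offset < 3; offset++) {
            commit(zk, "/kafkastorm", "consumID", 0, offset + 1);
        }
        System.out.println(zk.get("/kafkastorm/consumID/0")); // 3
    }
}
```

On restart, a spout that reads this stored value back resumes from offset 3 rather than reprocessing the whole partition, which is the point of keeping the state in ZooKeeper.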

Note that in order to run Kafka with Storm, both the Kafka and Storm clusters must be set up and in a running state.

So, this was all about Storm-Kafka integration. We hope you liked our explanation.


Published at DZone with permission of Rinu Gour. See the original article here.

Opinions expressed by DZone contributors are their own.
