
How to Set Up Kafka

In this article, I am going to explain how to install Kafka on Ubuntu. We will also look at the properties of a Kafka broker, socket server, and flush.

By Gaurav Garg · May. 19, 18 · Tutorial


Kafka is one of the most popular publish-subscribe messaging systems, written in Java and Scala. It was originally developed at LinkedIn and later open-sourced. Kafka is known for handling heavy I/O loads. You can find out more about Kafka here.

Installation

In this article, I am going to explain how to install Kafka on Ubuntu. To install Kafka, Java must be installed on your system, and you must also set up ZooKeeper. ZooKeeper performs many tasks for Kafka, but in short, it manages the Kafka cluster state.

ZooKeeper Setup

  • Download ZooKeeper from here.    

  • Unzip the file. Inside the conf directory, rename the file zoo_sample.cfg to zoo.cfg.

  • The zoo.cfg file holds the ZooKeeper configuration, e.g. the port on which the ZooKeeper instance will listen, the data directory, etc.

  • The default listen port is 2181. You can change this port by changing clientPort.

  • The default data directory is /tmp/data. Change this, as /tmp is cleaned up periodically and you will not want ZooKeeper's data to be deleted unexpectedly. Create a folder named data in the ZooKeeper directory and point dataDir in zoo.cfg at it.

  • Go to the bin directory.

  • Start ZooKeeper by executing the command ./zkServer.sh start.

  • Stop ZooKeeper by executing the command ./zkServer.sh stop.
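Putting the steps above together, a minimal zoo.cfg might look like the following sketch. The dataDir path assumes the ZooKeeper directory is /opt/zookeeper and that you created a data folder inside it; adjust it to your own layout.

```properties
# Basic time unit (in ms) used by ZooKeeper for heartbeats and timeouts
tickTime=2000
# Moved off the default /tmp/data so the data survives /tmp cleanup
dataDir=/opt/zookeeper/data
# Default client port; change this if 2181 is already taken
clientPort=2181
```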

Kafka Setup

  • Download the latest stable version of Kafka from here.

  • Unzip this file. The Kafka instance (Broker) configurations are kept in the config directory.

  • Go to the config directory. Open the file server.properties.

  • Uncomment the listeners property, i.e. listeners=PLAINTEXT://:9092. The Kafka broker will listen on port 9092.

  • Change log.dirs to /kafka_home_directory/kafka-logs.

  • Check the zookeeper.connect property and change it as per your needs. The Kafka broker will connect to this ZooKeeper instance.

  • Go to the Kafka home directory and execute the command ./bin/kafka-server-start.sh config/server.properties.

  • Stop the Kafka broker through the command ./bin/kafka-server-stop.sh.
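The server.properties edits above can be sketched with a few shell commands. The block below works on a scratch copy of the file, so it is safe to run anywhere; the paths (./kafka) and the sample property values are illustrative, not a real installation.

```shell
# Create a scratch Kafka layout with a sample server.properties
mkdir -p kafka/config
cat > kafka/config/server.properties <<'EOF'
broker.id=0
#listeners=PLAINTEXT://:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
EOF

# 1. Uncomment the listeners property so the broker listens on port 9092
sed -i 's|^#listeners=PLAINTEXT://:9092|listeners=PLAINTEXT://:9092|' kafka/config/server.properties

# 2. Point log.dirs at a directory inside the Kafka home
sed -i 's|^log\.dirs=.*|log.dirs=./kafka/kafka-logs|' kafka/config/server.properties

# Show the resulting configuration
cat kafka/config/server.properties
```

On a real install, you would then start the broker from the Kafka home directory with ./bin/kafka-server-start.sh config/server.properties.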

Kafka Broker Properties

For beginners, the default configurations of the Kafka broker are good enough, but for production-level setup, one must understand each configuration. I am going to explain some of these configurations.

  • broker.id: The ID of the broker instance in a cluster. 

  • zookeeper.connect: The ZooKeeper address (can list multiple addresses comma-separated for the ZooKeeper cluster). Example: localhost:2181,localhost:2182.

  • zookeeper.connection.timeout.ms: The maximum time the broker will wait for a ZooKeeper connection before shutting down.
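As a hypothetical example, a second broker joining a two-node ZooKeeper ensemble might set the properties above like this (the ports and timeout are illustrative):

```properties
# Unique ID for this broker within the cluster
broker.id=1
# Comma-separated addresses of the ZooKeeper ensemble
zookeeper.connect=localhost:2181,localhost:2182
# Give up and shut down if no connection is made within 6 seconds
zookeeper.connection.timeout.ms=6000
```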

Socket Server Properties

  • socket.send.buffer.bytes: The send buffer used by the socket server.

  • socket.receive.buffer.bytes: The receive buffer used by the socket server for network requests.

  • socket.request.max.bytes: The maximum request size the server will allow. This prevents the server from running out of memory.

Flush Properties

Each message arriving at the Kafka broker is written into a segment file. The catch here is that this data is not written to disk directly; it is buffered first. The two properties below define when data will be flushed to disk. Very large flush intervals may lead to latency spikes when the flush happens, and very small flush intervals may lead to excessive disk seeks.

  • log.flush.interval.messages: The message-count threshold; once it is reached, all buffered messages are flushed to disk.

  • log.flush.interval.ms: The periodic time interval after which all buffered messages are flushed to disk.
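For example, the following illustrative settings flush whenever 10,000 messages have accumulated or one second has passed, whichever comes first:

```properties
# Flush to disk after this many messages have accumulated...
log.flush.interval.messages=10000
# ...or after this much time has passed since the last flush
log.flush.interval.ms=1000
```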

Log Retention

As discussed above, messages are written into a segment file. The following policies define when these files will be removed.

  • log.retention.hours: The minimum age of the segment file to be eligible for deletion due to age.

  • log.retention.bytes: A size-based retention policy for logs; the oldest segments are pruned until the total log size drops below log.retention.bytes.

  • log.segment.bytes: Size of the segment after which a new segment will be created.

  • log.retention.check.interval.ms: Periodic time interval after which log segments are checked for deletion as per the retention policy. If both retention policies are set, then segments are deleted when either criterion is met.
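A combined retention setup based on the properties above might look like this (the values are illustrative): keep up to a week or 1 GiB of data per partition in 512 MiB segments, checking every 5 minutes.

```properties
# Delete segments older than 7 days...
log.retention.hours=168
# ...or once the log exceeds 1 GiB, whichever comes first
log.retention.bytes=1073741824
# Roll a new segment file at 512 MiB
log.segment.bytes=536870912
# How often the retention check runs
log.retention.check.interval.ms=300000
```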

Conclusion

Thanks for reading my article. In related articles, you can find how to create a producer and consumer in Java, and how to monitor a Kafka cluster.


Opinions expressed by DZone contributors are their own.
