How to Configure an Apache Kafka Cluster on Ubuntu-16.04

In this tutorial, we'll install Oracle Java 8, Zookeeper, and the open source Apache Kafka platform on an Ubuntu 16.04 server, then verify the setup by publishing and consuming a few test messages.

Introduction

Apache Kafka is a free, open source stream-processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a distributed message broker designed to handle large volumes of real-time data efficiently. Compared with other message brokers such as ActiveMQ and RabbitMQ, Apache Kafka generally offers much higher throughput. Kafka is built around a commit log: producers append records to it, and any number of systems or real-time applications can subscribe to it and read those records. Kafka can be deployed on a single server or in a distributed, clustered environment. It has four major APIs: the Producer API, the Consumer API, the Connector API, and the Streams API.

Features:

  • Support for parallel data load into Hadoop.
  • High throughput, supporting hundreds of thousands of messages per second, even with modest hardware.
  • Persistent messaging with O(1) disk structures that provide constant time performance, even with terabytes of stored messages.
  • The distributed system scales easily with no downtime.

Requirements for This Tutorial

  • A fresh Alibaba Cloud instance with Ubuntu 16.04 server installed.
  • A static IP address (192.168.0.103 in this tutorial) configured on the instance.
  • A root password set up on the server.

Launch Alibaba Cloud ECS Instance

First, log in to your Alibaba Cloud ECS Console. Create a new ECS instance, choosing Ubuntu 16.04 as the operating system, with at least 2GB of RAM. Connect to your ECS instance and log in as the root user.

Once you are logged in to your Ubuntu 16.04 instance, run the following command to refresh the package index, so that later installs pull the latest available packages.

apt-get update -y

Install Java

Apache Kafka needs a Java runtime environment, so you will need to install Java on your system. This tutorial uses Oracle Java 8, which is not available in the default Ubuntu 16.04 repositories, so you will first need to add a repository that provides it. You can do this by running the following command:

add-apt-repository ppa:webupd8team/java

Next, update the package index and install Java by running the following commands:

apt-get update -y
apt-get install oracle-java8-installer -y

Once Java is installed, you can check the Java version using the following command:

java -version

Output:

java version "1.8.0_161"
Java(TM) SE Runtime Environment (build 1.8.0_161-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.161-b12, mixed mode)
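
If you script this installation, keep in mind that java -version writes to standard error rather than standard output. The following is a minimal sketch, not part of the original steps, of a helper that extracts the major Java version from that output. The function name java_major is hypothetical.

```shell
# Hypothetical helper: reads `java -version` style output on stdin
# and prints the major version (8 for "1.8.0_161", 11 for "11.0.2").
java_major() {
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    "") return 1 ;;                           # no version string found
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # legacy scheme: 1.8.0_161 -> 8
    *) echo "${ver%%[.-]*}" ;;                # newer scheme: 11.0.2 -> 11
  esac
}
```

For example, java -version 2>&1 | java_major should print 8 for the Java build shown above.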

Install Zookeeper

Apache Kafka depends on Zookeeper for maintaining configuration information, naming, distributed synchronization, and group services, so you will need to install Zookeeper on your system. You can install it by running the following command:

apt-get install zookeeperd -y

By default, Zookeeper listens on port 2181. You can check this by running the following command:

netstat -nlpt | grep ':2181'

You should see the following output:

tcp6       0      0 :::2181                 :::*                    LISTEN
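
Note that grep ':2181' would also match a port such as 21810. For scripted checks, a stricter test can be sketched as follows. It assumes the netstat column layout shown above (local address in the fourth column, state in the sixth); the function name port_listening is hypothetical.

```shell
# Hypothetical helper: reads `netstat -nlt` output on stdin and succeeds
# only if a socket in the LISTEN state is bound to exactly the given port.
port_listening() {
  awk -v port="$1" '
    $6 == "LISTEN" {
      n = split($4, parts, ":")      # local address, e.g. ":::2181" or "0.0.0.0:2181"
      if (parts[n] == port) found = 1
    }
    END { exit !found }
  '
}
```

A possible use is netstat -nlt | port_listening 2181 && echo "Zookeeper is listening on 2181".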

Install Apache Kafka

First, you will need to download Kafka from an Apache mirror. This tutorial uses version 1.1.0 (built for Scala 2.12), the latest version at the time of writing. You can download it by running the following command:

wget http://redrockdigimark.com/apachemirror/kafka/1.1.0/kafka_2.12-1.1.0.tgz

Once the download is completed, extract the downloaded file using the following command:

tar -xvzf kafka_2.12-1.1.0.tgz

Next, copy the extracted directory to /opt/Kafka:

cp -r kafka_2.12-1.1.0 /opt/Kafka

Next, start the Kafka server by running the following script:

/opt/Kafka/bin/kafka-server-start.sh /opt/Kafka/config/server.properties

You should see the following output:

[2018-05-20 08:13:54,271] INFO [/config/changes-event-process-thread]: Starting (kafka.common.ZkNodeChangeNotificationListener$ChangeEventProcessThread)
[2018-05-20 08:13:54,449] INFO Kafka version : 1.1.0 (org.apache.kafka.common.utils.AppInfoParser)
[2018-05-20 08:13:54,461] INFO Kafka commitId : fdcf75ea326b8e07 (org.apache.kafka.common.utils.AppInfoParser)
[2018-05-20 08:13:54,466] INFO [KafkaServer id=0] started (kafka.server.KafkaServer)

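The Kafka server is now up and listening on port 9092. The script runs in the foreground, so leave this terminal open and run the remaining commands from a second terminal session.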
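
Because kafka-server-start.sh stays in the foreground, it can be convenient to run the broker as a background process instead. The script accepts a -daemon flag as its first argument, and the distribution ships kafka-server-stop.sh for shutting the broker down. A minimal sketch, assuming the /opt/Kafka path used above; the KAFKA_HOME variable and the function names are illustrative and not part of Kafka:

```shell
# Hypothetical wrappers; KAFKA_HOME defaults to the install path used in this tutorial.
kafka_start() {
  home="${KAFKA_HOME:-/opt/Kafka}"
  # -daemon must come before the properties file
  "$home/bin/kafka-server-start.sh" -daemon "$home/config/server.properties"
}

kafka_stop() {
  "${KAFKA_HOME:-/opt/Kafka}/bin/kafka-server-stop.sh"
}
```

After kafka_start returns, the broker's output should go to the logs directory under /opt/Kafka rather than to the terminal.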

Test Apache Kafka

Now, create your first topic named Topic1 with a single partition and only one replica by running the following command:

/opt/Kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1  --partitions 1 --topic Topic1

You should see the following output:

Created topic "Topic1".

You can now list the topics known to Kafka to confirm that Topic1 was created:

/opt/Kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181

You should see the following output:

Topic1
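
Re-running the --create command for a topic that already exists fails with an error. As a sketch for scripted setups, kafka-topics.sh also accepts --if-not-exists alongside --create, and --describe to show a topic's partition and replica assignment. The wrapper function names and the KAFKA_HOME variable below are hypothetical; the Zookeeper address and the /opt/Kafka path are the ones used in this tutorial.

```shell
# Hypothetical wrappers around kafka-topics.sh.

# ensure_topic NAME: creates a single-partition, single-replica topic unless it already exists.
ensure_topic() {
  "${KAFKA_HOME:-/opt/Kafka}/bin/kafka-topics.sh" --create --if-not-exists \
    --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic "$1"
}

# describe_topic NAME: prints the partition count, leader, and replicas for the topic.
describe_topic() {
  "${KAFKA_HOME:-/opt/Kafka}/bin/kafka-topics.sh" --describe \
    --zookeeper localhost:2181 --topic "$1"
}
```

For example, ensure_topic Topic1 followed by describe_topic Topic1.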

Now, post a few sample messages to the Kafka topic named Topic1 with the following command. Type each message at the > prompt and press Enter, then press Ctrl+C to exit when you are done:

/opt/Kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Topic1

>Hello Kafka
>How R You
>Ok
>

Next, run the Kafka console consumer to read the messages from the topic and print them to standard output:

/opt/Kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic Topic1 --from-beginning

You should see your posted messages in the following output:

Hello Kafka
How R You
Ok
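
The same round trip can be run without typing at a prompt, which is handy for a quick smoke test. The sketch below is an assumption-laden variation rather than the commands used above: it connects the consumer to the broker with --bootstrap-server instead of --zookeeper, and uses --max-messages so the consumer exits on its own. The function names and the KAFKA_HOME variable are hypothetical.

```shell
# Hypothetical helpers for a scripted smoke test.

# send_messages TOPIC: publishes each line read from stdin as one message.
send_messages() {
  "${KAFKA_HOME:-/opt/Kafka}/bin/kafka-console-producer.sh" \
    --broker-list localhost:9092 --topic "$1"
}

# read_messages TOPIC COUNT: prints COUNT messages from the start of the topic, then exits.
read_messages() {
  "${KAFKA_HOME:-/opt/Kafka}/bin/kafka-console-consumer.sh" \
    --bootstrap-server localhost:9092 --topic "$1" \
    --from-beginning --max-messages "$2"
}
```

For example, printf 'Hello Kafka\nHow R You\nOk\n' | send_messages Topic1 publishes three messages, and read_messages Topic1 3 reads them back.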
