
Kafka Connect on Kubernetes, the Easy Way!

A tutorial that shows how to set up and use Kafka Connect on Kubernetes using the Strimzi Operator.

By Abhishek Gupta · Jul. 29, 20 · Tutorial

This is a tutorial that shows how to set up and use Kafka Connect on Kubernetes using Strimzi, with the help of an example.

Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems using source and sink connectors. Although it's not too hard to deploy a Kafka Connect cluster on Kubernetes (just "DIY"!), I love the fact that Strimzi enables a Kubernetes-native way of doing this using the Operator pattern with the help of Custom Resource Definitions.

In addition to bootstrapping/installing Kafka Connect, this also applies to operations such as scaling the Connect cluster, deploying and managing connectors, etc. (you will see this in action over the course of this blog post).

We will go through the process of deploying a Kafka Connect cluster on Kubernetes, installing a connector, and testing it out, all using kubectl and some YAML (of course!). I will be using Azure Event Hubs as the Kafka broker and Azure Kubernetes Service as the Kubernetes cluster, but feel free to use alternatives (e.g., a local minikube cluster on your laptop).

All the artifacts are available on GitHub.

Strimzi is responsible for all the heavy lifting. In case you don't already know what it is, here is the gist.

Strimzi Overview

The Strimzi documentation is detailed yet very well organized and clear! Most of the paragraphs below have been taken directly from the docs.

Strimzi simplifies the process of running Apache Kafka in a Kubernetes cluster. Strimzi provides container images and Operators for running Kafka on Kubernetes. It is a part of the Cloud Native Computing Foundation as a Sandbox project (at the time of writing).

Strimzi Operators are fundamental to the project. These Operators are purpose-built with specialist operational knowledge to effectively manage Kafka. Operators simplify the process of:

  • Deploying and running Kafka clusters and components
  • Configuring and securing access to Kafka
  • Upgrading and managing Kafka
  • Managing topics and users

The Strimzi documentation has a diagram which shows a 10,000-foot overview of the Operator roles.

I am not going to dive into the details of deploying Kafka using Strimzi in this post; that's probably something I will tackle in future blog posts.

Prerequisites

  • kubectl - https://kubernetes.io/docs/tasks/tools/install-kubectl/
  • A Microsoft Azure account, if you choose to use Azure Event Hubs, Azure Kubernetes Service, or both. Go ahead and sign up for a free one!
  • Azure CLI or Azure Cloud Shell - you can either install the Azure CLI if you don't have it already (should be quick!) or just use the Azure Cloud Shell from your browser.
  • Helm - I will be using Helm to install Strimzi. Here is the documentation to install Helm itself - https://helm.sh/docs/intro/install/

You can also use the YAML files directly to install Strimzi. Check out the quick start guide here - https://strimzi.io/docs/quickstart/latest/#proc-install-product-str

Let's start by setting up the required Azure services. (If you're not using Azure, skip this section, but please ensure you have the details for your Kafka cluster, i.e., broker URLs and authentication credentials, if applicable.)

I recommend installing the services below as part of a single Azure Resource Group, which makes it easy to clean them up afterwards.

Azure Event Hubs

Azure Event Hubs is a data streaming platform and event ingestion service. It can receive and process millions of events per second. It also provides a Kafka endpoint that can be used by existing Kafka-based applications as an alternative to running your own Kafka cluster. Event Hubs supports Apache Kafka protocol 1.0 and later, and works with existing Kafka client applications and other tools in the Kafka ecosystem, including Kafka Connect (demonstrated in this blog), MirrorMaker, etc.

To set up an Azure Event Hubs cluster, you can choose from a variety of options, including the Azure portal, Azure CLI, Azure PowerShell, or an ARM template. Once the setup is complete, you will need the connection string (which will be used in subsequent steps) for authenticating to Event Hubs - use this guide to finish this step.

Please ensure that you also create an Event Hub (equivalent to a Kafka topic) to act as the target for our Kafka Connect connector (details in subsequent sections).
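If you go the CLI route, a minimal sketch along these lines should work (the resource group, namespace, and location are placeholders; note that the Kafka endpoint is not available on the Basic tier, hence Standard):

Shell

# create the Event Hubs namespace (names are placeholders)
az eventhubs namespace create --name <eventhubs-namespace> \
  --resource-group <resource-group> --location <location> --sku Standard

# create the Event Hub (Kafka topic equivalent) used later by the connector
az eventhubs eventhub create --name strimzi \
  --resource-group <resource-group> --namespace-name <eventhubs-namespace>

# fetch the connection string used in subsequent steps
az eventhubs namespace authorization-rule keys list \
  --resource-group <resource-group> --namespace-name <eventhubs-namespace> \
  --name RootManageSharedAccessKey --query primaryConnectionString -o tsv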

Azure Kubernetes Service

Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure. It reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. Here are examples of how you can set up an AKS cluster using the Azure CLI, Azure portal, or an ARM template.
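For example, with the Azure CLI, something along these lines should do (the cluster name and node count are illustrative):

Shell

# a sketch: create a small AKS cluster and fetch kubectl credentials
az aks create --resource-group <resource-group> --name <aks-cluster-name> \
  --node-count 2 --generate-ssh-keys

az aks get-credentials --resource-group <resource-group> --name <aks-cluster-name>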

Base install

To start off, we will install Strimzi and Kafka Connect, followed by the File Stream Source Connector.

Install Strimzi

Installing Strimzi using Helm is pretty easy:

Shell

# add the Helm chart repo for Strimzi
helm repo add strimzi https://strimzi.io/charts/

# install it! (I have used strimzi-kafka as the release name)
helm install strimzi-kafka strimzi/strimzi-kafka-operator


This will install the Strimzi Operator (which is nothing but a Deployment), Custom Resource Definitions, and other Kubernetes components such as Cluster Roles, Cluster Role Bindings, and Service Accounts.

For more details, check out this link

To delete, simply run helm uninstall strimzi-kafka.

To confirm that the Strimzi Operator has been deployed, check its Pod (it should transition to the Running status after a while):

Shell

kubectl get pods -l=name=strimzi-cluster-operator

NAME                                        READY   STATUS    RESTARTS   AGE
strimzi-cluster-operator-5c66f679d5-69rgk   1/1     Running   0          43s


Check the Custom Resource Definitions as well:

Shell

kubectl get crd | grep strimzi

kafkabridges.kafka.strimzi.io           2020-04-13T16:49:36Z
kafkaconnectors.kafka.strimzi.io        2020-04-13T16:49:33Z
kafkaconnects.kafka.strimzi.io          2020-04-13T16:49:36Z
kafkaconnects2is.kafka.strimzi.io       2020-04-13T16:49:38Z
kafkamirrormaker2s.kafka.strimzi.io     2020-04-13T16:49:37Z
kafkamirrormakers.kafka.strimzi.io      2020-04-13T16:49:39Z
kafkas.kafka.strimzi.io                 2020-04-13T16:49:40Z
kafkatopics.kafka.strimzi.io            2020-04-13T16:49:34Z
kafkausers.kafka.strimzi.io             2020-04-13T16:49:33Z


I want to call out kafkas.kafka.strimzi.io, which represents Kafka clusters in Kubernetes. We will focus on kafkaconnects.kafka.strimzi.io and kafkaconnectors.kafka.strimzi.io, which represent Kafka Connect clusters and connectors, respectively.

I am going to skip over the other components, but you can dig into them, e.g., for Cluster Roles: kubectl get clusterrole | grep strimzi

Now that we have the "brain" (the Strimzi Operator) wired up, let's use it!

Kafka Connect

We will need to create some helper Kubernetes components before we deploy Kafka Connect itself.

Before you proceed, clone the GitHub project:

Shell

git clone https://github.com/abhirockzz/strimzi-kafka-connect-eventhubs
cd strimzi-kafka-connect-eventhubs


Kafka Connect will need to reference an existing Kafka cluster (which in this case is Azure Event Hubs). We can store the authentication info for the cluster as a Kubernetes Secret, which can later be used in the Kafka Connect definition.

Update the eventhubs-secret.yaml file to include the credentials for Azure Event Hubs. Enter the connection string in the eventhubspassword attribute.

e.g.

YAML

apiVersion: v1
kind: Secret
metadata:
  name: eventhubssecret
type: Opaque
stringData:
  eventhubsuser: $ConnectionString
  eventhubspassword: Endpoint=sb://<eventhubs-namespace>.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<access-key>

Leave eventhubsuser: $ConnectionString unchanged.

To create the Secret:

Shell

kubectl apply -f eventhubs-secret.yaml


By default, Kafka Connect is configured to send logs to stdout. We will use a custom log4j configuration to ensure that logs are also stored to /tmp/connect-worker.log (in addition to stdout) - you will understand why this is done in a moment.

The configuration itself is stored in a log4j.properties file.

The log configuration can be stored in a ConfigMap which will later be referenced by the Kafka Connect definition. For details, check https://strimzi.io/docs/latest/#con-kafka-connect-logging-deployment-configuration-kafka-connect
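For reference, a minimal log4j.properties along these lines would write to both stdout and the file (the appender names and pattern are illustrative; the file path must match what our connector will read later):

Properties

# root logger writes to the console as well as a file
log4j.rootLogger=INFO, stdout, FILE

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=[%d] %p %m (%c)%n

# file appender pointing at the path our File source connector will tail
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=/tmp/connect-worker.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=[%d] %p %m (%c)%n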

Shell

kubectl create configmap connect-logging-configmap --from-file=log4j.properties

Before we deploy Kafka Connect, let's look into its definition. You can see it in its entirety here, but I will go through the important bits.

Notice that the resource kind is KafkaConnect - it is a Custom Resource (defined by the kafkaconnects.kafka.strimzi.io CRD we saw earlier). Another interesting part is the annotations (I will explain this in a bit):

YAML

apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"


bootstrapServers points to a Kafka broker. This could be a comma-separated list of nodes in an HA cluster. In this case, it's a single Kafka endpoint for Azure Event Hubs (yes, that's all you need!):

YAML

spec:
  version: 2.4.0
  replicas: 1
  bootstrapServers: <eventhubs-namespace>.servicebus.windows.net:9093


config is just good old Kafka Connect configuration, similar to what you would use in connect-distributed.properties:

YAML

  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status



The authentication section simply refers to a Kubernetes Secret. In this case, we created one earlier with the name eventhubssecret, which has the key eventhubspassword containing the connection string for Azure Event Hubs:

YAML

  authentication:
    type: plain
    username: $ConnectionString
    passwordSecret:
      secretName: eventhubssecret
      password: eventhubspassword


This is where the ConfigMap with the log4j config is referenced. This will automatically configure Kafka Connect to use this configuration:

YAML

  logging:
    type: external
    name: connect-logging-configmap


The tls section is used to configure TLS certificates (duh!). In the case of Event Hubs, although we use the SASL PLAIN mechanism, it requires you to use SSL (i.e., set security.protocol to SASL_SSL). I initially faced an issue with this, which was promptly clarified! Hence this piece of configuration was added:

YAML

  tls:
    trustedCertificates: []


Cool! We are ready to create a Kafka Connect instance. Before that, make sure that you update the bootstrapServers property with the Azure Event Hubs hostname, e.g.:

YAML

spec:
  version: 2.4.0
  replicas: 1
  bootstrapServers: <replace-with-eventhubs-namespace>.servicebus.windows.net:9093


To create the Kafka Connect instance:

Shell

kubectl apply -f kafka-connect.yaml


To confirm:

Shell

kubectl get kafkaconnects

NAME                 DESIRED REPLICAS
my-connect-cluster   1


This will create a Deployment and a corresponding Pod:

Shell

kubectl get pod -l=strimzi.io/cluster=my-connect-cluster

NAME                                          READY   STATUS    RESTARTS   AGE
my-connect-cluster-connect-5bf9db5d9f-9ttg4   1/1     Running   0          1h


You have a Kafka Connect cluster in Kubernetes! Check out the logs using kubectl logs <pod name>.

Install the File Stream Source Connector

Let's deploy a connector! To keep things simple, we will use the File Stream Source Connector, which comes bundled with Kafka Connect by default. A common way of installing and managing connectors is to use the Kafka Connect REST API, but there is another way that Strimzi offers. This is a Kubernetes-centric approach where a Kafka Connect connector is represented by a custom resource called KafkaConnector. All we need to do is create/update/delete KafkaConnector definitions with the details of our connectors, and Strimzi will take care of the rest!

Check out the details in the Strimzi docs: https://strimzi.io/docs/latest/#con-creating-managing-connectors-str
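For contrast, here is a sketch of what the REST API approach would look like (treat the exact URL as an assumption; the service name follows Strimzi's <cluster-name>-connect-api convention, and the Connect REST API listens on port 8083):

Shell

# sketch: POST the connector config to Kafka Connect's REST API
# the service URL is an assumption based on Strimzi's naming convention
curl -X POST -H "Content-Type: application/json" \
  http://my-connect-cluster-connect-api:8083/connectors \
  -d '{
        "name": "my-source-connector",
        "config": {
          "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
          "tasks.max": "2",
          "file": "/tmp/connect-worker.log",
          "topic": "strimzi"
        }
      }'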

Here is the definition of our connector:

YAML

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: my-source-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 2
  config:
    file: "/tmp/connect-worker.log"
    topic: strimzi


Just like we did before, let's understand what each component means:

YAML

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: my-source-connector


This is a KafkaConnector resource (specified by kind) whose name is my-source-connector.

YAML

  labels:
    strimzi.io/cluster: my-connect-cluster


This is where we refer to the Kafka Connect cluster - remember this annotation in the Kafka Connect definition shown above?

YAML

  annotations:
    strimzi.io/use-connector-resources: "true"


This simply "activates" the feature and ensures that we are able to deploy connectors using the KafkaConnector CRD and we simply refer to the name of our kafkaconnect resource using the strimzi.io/cluster label

YAML

spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 2
  config:
    file: "/tmp/connect-worker.log"
    topic: strimzi


Finally, in the connector spec, we define the attributes for our connector. Notice the config property, which points to the /tmp/connect-worker.log file? Recall that we configured our Kafka Connect instance to push logs to this file. We have now configured our File Stream Source connector to stream the contents of this (log) file and send them to a Kafka topic named strimzi. This makes for a nice demo, since the file will keep getting updated and we should be able to see each line as a separate message in the destination Kafka topic (in Azure Event Hubs).

I have used strimzi as the topic name. This needs to be the same as the Event Hub created in the previous section (while setting up Azure Event Hubs).

To see this in action, let's deploy the connector:

Shell

kubectl apply -f filestream-source-connector.yaml


To confirm, simply list the connectors:

Shell

kubectl get kafkaconnectors

NAME                  AGE
my-source-connector   70s

You can install other connectors as well. One of the ways (and the easiest, IMO) to do this is by extending the Strimzi base image and adding the required connector artifacts on top of it, as sketched below. Check out the documentation: https://strimzi.io/docs/latest/#using-kafka-connect-with-plug-ins-str
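For example, a Dockerfile along these lines (the base image tag is illustrative - pick the one matching your Strimzi/Kafka versions - and my-connector-plugins/ is a placeholder directory containing the connector JARs):

Dockerfile

# sketch: extend the Strimzi Kafka base image with additional connector JARs
FROM strimzi/kafka:0.17.0-kafka-2.4.0
USER root:root
COPY ./my-connector-plugins/ /opt/kafka/plugins/
USER 1001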

Kafka Connect in action...

First, let's confirm that the Kafka Connect logs are being piped to the intended location. This is important, since we're using the log file as the source for the File Stream Source connector. For this, we need to peek inside the Kafka Connect Pod, e.g.:

Shell

kubectl exec -it <kafka_connect_pod_name> -- tail -f /tmp/connect-worker.log


To make this easier, simply use the command below:

Shell

kubectl exec -it $(kubectl get pod -l=strimzi.io/cluster=my-connect-cluster -o jsonpath='{.items[0].metadata.name}') -- tail -f /tmp/connect-worker.log


Now, you should see the Kafka Connect logs...

In a different terminal window, start a consumer process connecting to your Azure Event Hubs topic. I used kafkacat, but there are other options, such as the console consumer in the Kafka CLI itself or a programmatic consumer using Java, .NET, Go, etc. (although that might be a bit of an overkill in this case).
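For reference, a kafkacat invocation along these lines should work against Event Hubs (a sketch; the broker and connection string are the same values used earlier, and the -X flags are standard librdkafka properties):

Shell

# sketch: consume from the strimzi topic on Event Hubs using kafkacat
kafkacat -b <eventhubs-namespace>.servicebus.windows.net:9093 \
  -X security.protocol=SASL_SSL -X sasl.mechanisms=PLAIN \
  -X sasl.username='$ConnectionString' \
  -X sasl.password='<eventhubs-connection-string>' \
  -t strimzi -C -o end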

You should see the same logs here as well! For example:

JSON

...
{"schema":{"type":"string","optional":false},"payload":"[2020-04-16 04:59:25,731] INFO WorkerSourceTask{id=my-source-connector-0} flushing 0 outstanding messages for offset commit (org.apache.kafka.connect.runtime.WorkerSourceTask)"}
{"schema":{"type":"string","optional":false},"payload":"[2020-04-16 04:59:25,785] INFO WorkerSourceTask{id=my-source-connector-0} Finished commitOffsets successfully in 55 ms (org.apache.kafka.connect.runtime.WorkerSourceTask)"}
...

The log line itself is captured as part of the payload, e.g., [2020-04-16 04:59:25,785] INFO WorkerSourceTask{id=my-source-connector-0} Finished commitOffsets successfully in 55 ms (org.apache.kafka.connect.runtime.WorkerSourceTask).

Managing Kafka Connect resources

To scale out Kafka Connect, simply update the number of replicas in the spec, e.g., from 1 to 2 in this case:

YAML

spec:
  version: 2.4.0
  replicas: 2


Apply the updated manifest:

Shell

kubectl apply -f kafka-connect.yaml


Please ensure that you increase the number of replicas by updating the manifest, not by updating the Deployment with kubectl scale. This is because the Strimzi operator's reconciliation loop will check the KafkaConnect resource, find that the replicas count is still 1, and scale the Deployment back down.

There should be two Pods now:

Shell

kubectl get pod -l=strimzi.io/cluster=my-connect-cluster

NAME                                          READY   STATUS    RESTARTS   AGE
my-connect-cluster-connect-5bf9db5d9f-9ttg4   1/1     Running   0          45m
my-connect-cluster-connect-5bf9db5d9f-pzn95   1/1     Running   0          1m5s

my-connect-cluster-connect-5bf9db5d9f-pzn95 is the new Pod.

You can also update the connector specification. For example, to allocate more tasks, update tasksMax from 2 to 5:

YAML

...
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 5
...

Note: this will restart the connector.

Clean up

To delete the connector and the Kafka Connect instance:

Shell

kubectl delete -f filestream-source-connector.yaml
kubectl delete -f kafka-connect.yaml


To clean up the AKS cluster and Azure Event Hubs, simply delete the resource group:

Shell

az group delete --name $AZURE_RESOURCE_GROUP --yes --no-wait


That concludes this blog post!

As I mentioned, the Strimzi documentation is detailed, yet very clear and easy to navigate. To wrap things up, I will leave you with additional references from the Strimzi docs that I found useful, in addition to the ones mentioned in the post:

Strimzi doc references

  • Configuring Kafka Connect - https://strimzi.io/docs/latest/#proc-configuring-kafka-connect-deployment-configuration-kafka-connect
  • KafkaConnect schema reference - https://strimzi.io/docs/latest/#type-KafkaConnect-reference
  • KafkaConnector schema reference - https://strimzi.io/docs/latest/#type-KafkaConnector-reference
  • Kafka SASL auth configuration - https://strimzi.io/docs/latest/#sasl_based_plain_authentication

I hope you find it useful for getting started with Kafka Connect on Kubernetes :)


Opinions expressed by DZone contributors are their own.
