
Migrate Data Across Kafka Clusters Using MirrorMaker 2 in Strimzi

In this article, we will discuss a use case where data from one Kafka cluster has to be migrated to another Kafka cluster using MirrorMaker 2.

By Chandra Shekhar Pandey · Updated May. 18, 21 · Tutorial

In this article, we will discuss a use case where data from one Kafka cluster has to be migrated to another Kafka cluster. Here the target is Strimzi and the source is a standalone Kafka cluster; the target is where data has to be copied to, and the source is where we want to copy/migrate data from. I have an earlier article on how to use MirrorMaker version 1 with Apache Kafka clusters. This article is about MirrorMaker 2, which has more features than MirrorMaker 1.

At the time of writing this article, the latest version of Strimzi is 0.22.1, which can be downloaded from the Strimzi GitHub releases page.

I have installed Strimzi on minikube version v1.19.0. The standalone Kafka cluster is installed on a different laptop running RHEL 8. I am using a simple Kafka producer to produce messages to the source Kafka cluster.

So let's begin the proof of concept.

1. Source Kafka (standalone) configuration: To connect from external clients and MirrorMaker, we have to set advertised.listeners in [KAFKA_HOME]/config/server.properties. Then start the ZooKeeper and Kafka nodes; here I have only one ZooKeeper and one Kafka node.

Shell

#[KAFKA_HOME]/config/server.properties
advertised.listeners=PLAINTEXT://192.168.1.25:9092
listeners=PLAINTEXT://0.0.0.0:9092

# create a topic myTestTopic
#[KAFKA_HOME]/bin
./kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 3 --topic myTestTopic

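A common pitfall with the listener configuration above is advertising 0.0.0.0, which external clients cannot connect back to: the bind address (listeners) may be 0.0.0.0, but advertised.listeners must be a routable address. A small pure-Python sanity check illustrating the rule (the helper name is mine, not part of Kafka):

```python
# Check a server.properties snippet: advertised.listeners must name a concrete
# host that clients can reach, never the wildcard bind address 0.0.0.0.
from urllib.parse import urlparse

def advertised_listener_ok(properties_text: str) -> bool:
    """Return True if advertised.listeners names a concrete host, not 0.0.0.0."""
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith("advertised.listeners="):
            value = line.split("=", 1)[1]
            for listener in value.split(","):
                # a listener looks like PLAINTEXT://192.168.1.25:9092
                host = urlparse(listener).hostname
                if host in (None, "0.0.0.0"):
                    return False
            return True
    return False  # not set at all

config = """\
advertised.listeners=PLAINTEXT://192.168.1.25:9092
listeners=PLAINTEXT://0.0.0.0:9092
"""
print(advertised_listener_ok(config))  # True for the config used in this article
```

With this config, MirrorMaker running inside minikube on another machine can reach the source broker at 192.168.1.25:9092.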
2. Produce messages to the source Kafka cluster: I used ProducerKafka, available in my GitHub repo. You can also use [KAFKA_HOME]/bin/kafka-console-producer.sh to produce messages to the source.

3. Strimzi setup: Follow the upstream Strimzi documentation for a detailed setup on minikube. I have summarized the steps below; these are all the commands I followed for the setup.

Shell

$ minikube start --cpus 3 --memory 10000 -p strimzi0221

$ kubectl create ns kafka

$ kubectl config set-context --current --namespace=kafka

$ sed -i 's/namespace: .*/namespace: kafka/' install/cluster-operator/*RoleBinding*.yaml

$ kubectl create -f install/cluster-operator

$ kubectl get deployments
NAME                       READY   UP-TO-DATE   AVAILABLE   AGE
strimzi-cluster-operator   1/1     1            1           6m36s

# strimzi-0.22.1/examples/kafka/kafka-ephemeral-single-replica.yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.7.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.7"
      inter.broker.protocol.version: "2.7"
    storage:
      type: ephemeral
    livenessProbe:
      initialDelaySeconds: 35
      timeoutSeconds: 35
    readinessProbe:
      initialDelaySeconds: 35
      timeoutSeconds: 35
  zookeeper:
    replicas: 1
    livenessProbe:
      initialDelaySeconds: 35
      timeoutSeconds: 35
    readinessProbe:
      initialDelaySeconds: 35
      timeoutSeconds: 35
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}

$ kubectl create -f examples/kafka/kafka-ephemeral-single.yaml

# setup is ready
$ kubectl get pods
NAME                                         READY   STATUS    RESTARTS   AGE
my-cluster-entity-operator-98c779b75-j84mt   3/3     Running   0          99s
my-cluster-kafka-0                           1/1     Running   0          2m37s
my-cluster-zookeeper-0                       1/1     Running   0          5m28s
strimzi-cluster-operator-957688b5c-dzbl7     1/1     Running   0          8m48s


4. MirrorMaker 2 configuration: Within the distribution itself, under strimzi-0.22.1/examples/mirror-maker, we can find example MirrorMaker YAML files. We will make a copy of kafka-mirror-maker-2-custom-replication-policy.yaml and modify that copy.

YAML

apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaMirrorMaker2
metadata:
  name: my-mirror-maker-2
spec:
  version: 2.7.0
  replicas: 1
  connectCluster: "my-target-cluster"
  clusters:
  - alias: "my-source-cluster"
    bootstrapServers: 192.168.1.25:9092
  - alias: "my-target-cluster"
    bootstrapServers: my-cluster-kafka-bootstrap:9092
    config:
      config.storage.replication.factor: 1
      offset.storage.replication.factor: 1
      status.storage.replication.factor: 1
  mirrors:
  - sourceCluster: "my-source-cluster"
    targetCluster: "my-target-cluster"
    sourceConnector:
      config:
        replication.factor: 1
        offset-syncs.topic.replication.factor: 1
        sync.topic.acls.enabled: "false"
        replication.policy.separator: ""
        replication.policy.class: "io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy"
    heartbeatConnector:
      config:
        heartbeats.topic.replication.factor: 1
    checkpointConnector:
      config:
        checkpoints.topic.replication.factor: 1
        replication.policy.separator: ""
        replication.policy.class: "io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy"
    topicsPattern: ".*"
    groupsPattern: ".*"
  logging:
    type: inline
    loggers:
      connect.root.logger.level: "INFO"
  readinessProbe:
    initialDelaySeconds: 25
    timeoutSeconds: 25
  livenessProbe:
    initialDelaySeconds: 25
    timeoutSeconds: 25



The important configurations are:

my-source-cluster: Here we have to provide the bootstrap server URL of the source Kafka cluster in the bootstrapServers property.

my-target-cluster: Here we have to provide the bootstrap server URL of the target Kafka cluster in the bootstrapServers property. The target Kafka node is installed using Strimzi in minikube, so I set my-cluster-kafka-bootstrap, listening on port 9092.

Shell

$ kubectl get svc
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
my-cluster-kafka-bootstrap           ClusterIP   10.99.60.80      <none>        9091/TCP,9092/TCP,9093/TCP   18h
my-cluster-kafka-brokers             ClusterIP   None             <none>        9091/TCP,9092/TCP,9093/TCP   18h
my-cluster-zookeeper-client          ClusterIP   10.103.165.80    <none>        2181/TCP                     19h
my-cluster-zookeeper-nodes           ClusterIP   None             <none>        2181/TCP,2888/TCP,3888/TCP   19h
my-mirror-maker-2-mirrormaker2-api   ClusterIP   10.109.246.111   <none>        8083/TCP                     16h


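Strimzi derives that service name from the Kafka resource's metadata.name (my-cluster in our example), so the in-cluster bootstrap address can be built mechanically. A tiny sketch of the convention (the helper is hypothetical, just to make the naming rule explicit; from another namespace you would use the fully qualified service DNS name):

```python
def strimzi_bootstrap(cluster_name: str, port: int = 9092, namespace: str = "") -> str:
    """Build the in-cluster bootstrap address for a Strimzi-managed Kafka cluster.

    Strimzi creates a ClusterIP service named <cluster>-kafka-bootstrap; across
    namespaces, Kubernetes DNS resolves <service>.<namespace>.svc.
    """
    service = f"{cluster_name}-kafka-bootstrap"
    if namespace:
        service = f"{service}.{namespace}.svc"
    return f"{service}:{port}"

print(strimzi_bootstrap("my-cluster"))                 # my-cluster-kafka-bootstrap:9092
print(strimzi_bootstrap("my-cluster", 9093, "kafka"))  # my-cluster-kafka-bootstrap.kafka.svc:9093
```

Since MirrorMaker 2 runs in the same kafka namespace as the cluster, the short form my-cluster-kafka-bootstrap:9092 is enough here.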

replication.policy.class: We set it to io.strimzi.kafka.connect.mirror.IdentityReplicationPolicy because we want the target Kafka node to have the same topic names as the source Kafka node.
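The difference is easy to see if you model the two policies: Kafka's DefaultReplicationPolicy prefixes the mirrored topic with the source cluster alias and the separator, while IdentityReplicationPolicy keeps the name unchanged. A pure-Python sketch of that renaming (not the actual Java classes, just their naming behavior):

```python
def remote_topic_name(topic: str, source_alias: str,
                      policy: str = "default", separator: str = ".") -> str:
    """Mimic how MirrorMaker 2 names a mirrored topic on the target cluster."""
    if policy == "identity":
        # IdentityReplicationPolicy: the topic keeps its original name
        return topic
    # DefaultReplicationPolicy: <source-alias><separator><topic>
    return f"{source_alias}{separator}{topic}"

# With the default policy, myTestTopic would arrive as my-source-cluster.myTestTopic
print(remote_topic_name("myTestTopic", "my-source-cluster"))
# With IdentityReplicationPolicy (as configured above), the name is preserved
print(remote_topic_name("myTestTopic", "my-source-cluster", policy="identity"))
```

This is why the topic listing later in step 5 shows myTestTopic on the target rather than a prefixed copy.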

We have also set the readiness and liveness probes with configurable timeouts so that we can tune them if needed.
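Note also that topicsPattern and groupsPattern are regular expressions, so before deploying you can sanity-check which topics a pattern would select. A quick stdlib sketch (the helper is mine; MirrorMaker 2 additionally applies its own exclusions for internal topics):

```python
import re

def matching_topics(pattern: str, topics: list) -> list:
    """Return the topics a MirrorMaker 2 topicsPattern-style regex would select."""
    compiled = re.compile(pattern)
    return [t for t in topics if compiled.fullmatch(t)]

topics = ["myTestTopic", "myTopic", "orders"]
print(matching_topics(".*", topics))    # everything, as configured in this article
print(matching_topics("my.*", topics))  # only the topics starting with "my"
```

With ".*" as in our YAML, every source topic and consumer group is mirrored.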

5. Apply the MirrorMaker 2 configuration: Apply the modified MirrorMaker 2 YAML file, then check the topics in the target Kafka node.

Shell

$ cd examples/mirror-maker/

$ kubectl apply -f kafka-mirror-maker-2-custom-replication-policy-modify.yaml

$ kubectl get KafkaMirrorMaker2
NAME                DESIRED REPLICAS   READY
my-mirror-maker-2   1                  1

$ kubectl get pods
NAME                                             READY   STATUS    RESTARTS   AGE
my-cluster-entity-operator-98c779b75-j84mt       3/3     Running   0          3m50s
my-cluster-kafka-0                               1/1     Running   0          4m48s
my-cluster-zookeeper-0                           1/1     Running   0          7m39s
my-mirror-maker-2-mirrormaker2-d5465d47d-k2dfz   1/1     Running   0          78s
strimzi-cluster-operator-957688b5c-dzbl7         1/1     Running   0          10m

$ kubectl get kt
NAME                                                                                               CLUSTER      PARTITIONS   REPLICATION FACTOR   READY
consumer-offsets---84e7a678d08f4bd226872e5cdd4eb527fadc1c6a                                        my-cluster   50           1                    True
heartbeats                                                                                         my-cluster   1            1                    True
mirrormaker2-cluster-configs                                                                       my-cluster   1            1                    True
mirrormaker2-cluster-offsets                                                                       my-cluster   25           1                    True
mirrormaker2-cluster-status                                                                        my-cluster   5            1                    True
my-source-cluster.checkpoints.internal                                                             my-cluster   1            1                    True
mytesttopic---ad8c4a4e03129cbd9ddc2900dfe8a763fb122ce7                                             my-cluster   3            1                    True
mytopic---c55e57fe2546a33f9e603caf57165db4072e827e                                                 my-cluster   1            1                    True
strimzi-store-topic---effb8e3e057afce1ecf67c3f5d8e4e3ff177fc55                                     my-cluster   1            1                    True
strimzi-topic-operator-kstreams-topic-store-changelog---b75e702040b99be8a9263134de3507fc0cc4017b   my-cluster   1            1                    True



6. Consume messages from the target Kafka node, i.e., Strimzi on Kubernetes.

Shell

$ kubectl exec my-cluster-kafka-0 -it -- /opt/kafka/bin/kafka-topics.sh --describe --topic myTestTopic --bootstrap-server 0.0.0.0:9092
Topic: myTestTopic  PartitionCount: 3   ReplicationFactor: 1    Configs: message.format.version=2.7-IV2
    Topic: myTestTopic  Partition: 0    Leader: 0   Replicas: 0 Isr: 0
    Topic: myTestTopic  Partition: 1    Leader: 0   Replicas: 0 Isr: 0
    Topic: myTestTopic  Partition: 2    Leader: 0   Replicas: 0 Isr: 0

$ kubectl exec my-cluster-kafka-0 -it -- /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server 0.0.0.0:9092 --topic myTestTopic --group Group1 --from-beginning
message: 0
message: 1
message: 2
message: 3
message: 4
message: 5
message: 6
message: 7
message: 8
message: 9



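Spot-checking ten messages by eye works for this demo, but for a larger migration you may want to verify the mirrored stream programmatically. A minimal sketch that checks the consumed payloads form an unbroken sequence (the "message: N" format matches the producer used in this article; the helper name is mine):

```python
def sequence_gaps(payloads):
    """Given payloads like 'message: 3', return the sequence numbers that are missing."""
    seen = set()
    for payload in payloads:
        prefix, _, number = payload.partition(": ")
        if prefix == "message" and number.isdigit():
            seen.add(int(number))
    if not seen:
        return []
    # any number below the highest one observed that never arrived is a gap
    return [n for n in range(max(seen) + 1) if n not in seen]

consumed = [f"message: {i}" for i in range(10)]
print(sequence_gaps(consumed))                      # [] -> nothing lost in the mirror
print(sequence_gaps(["message: 0", "message: 2"]))  # [1] -> message 1 missing
```

Feed it the output of the console consumer (or a real consumer) on the target cluster; an empty result means every produced message made it across.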
That's it, guys. I hope you find this article interesting and helpful!

kafka Kubernetes cluster Data (computing)

Opinions expressed by DZone contributors are their own.
