Setting up Kafka Cluster With Gluster-Block Storage
Red Hat AMQ Streams
Red Hat AMQ Streams is a massively-scalable, distributed, and high-performance data streaming platform based on the Apache ZooKeeper and Apache Kafka projects.
The AMQ Streams Kafka Bridge provides a RESTful interface that allows HTTP-based clients to interact with a Kafka cluster. The Kafka Bridge offers the advantages of a web API connection to AMQ Streams, without the need for client applications to interpret the Kafka protocol.
The API has two main resources, consumers and topics, which are exposed and made accessible through endpoints so that you can interact with consumers and producers in your Kafka cluster. The resources relate only to the Kafka Bridge, not to the consumers and producers connected directly to Kafka.
The bridge provides a REST API, described by an OpenAPI specification, which exposes multiple endpoints to allow typical Apache Kafka operations:
- Sending messages to topics (including to a specific partition)
- Subscribing to one or more topics (even using a pattern) as part of a consumer group, or asking for a specific partition assignment
- Receiving messages from the subscribed topics
- Committing offsets related to the received messages
- Seeking to a specific position (or at the beginning/end) in a topic partition
The client behavior and the interaction with the Apache Kafka cluster through the bridge are the same as with a native Kafka client, but with HTTP/1.1 protocol semantics.
Each endpoint allows specific HTTP methods (GET, POST, DELETE) to execute the above operations.
For this demonstration, you will need the following technologies set up in your development environment:
- An OpenShift 3.11+ environment with cluster-admin access
- Gluster-block installed on OpenShift
- The OpenShift CLI (oc)
- Apache Maven 3.6.3+
In this article, we demonstrate how to set up a persistent Kafka cluster with three Kafka brokers and three ZooKeeper nodes on gluster-block storage, and how to expose it through the Kafka Bridge.
Set Up Kafka Cluster
AMQ Streams requires block storage and is designed to work optimally with cloud-based block storage solutions, including Amazon Elastic Block Store (EBS). The use of file storage is not recommended.
Choose local storage (local persistent volumes) when possible. If local storage is not available, you can use a Storage Area Network (SAN) accessed by a protocol such as Fibre Channel or iSCSI.
There is no need to provision replicated storage because Kafka and ZooKeeper both have built-in data replication.
It is recommended that you configure your storage system to use the XFS file system. AMQ Streams is also compatible with the ext4 file system, but this might require additional configuration for the best results.
For this demonstration, gluster-block is installed on OpenShift.
Verify that the storage class is up and running. In this demo, we use PVCs provisioned from gluster-block to persist the data that each node writes to the Kafka commit log.
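A quick way to check, assuming the oc client is logged in to the cluster:

```
# The gluster-block storage class should be listed and marked as (default)
oc get storageclass
```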
So we have our gluster-block storage class set as the default, which means we can start consuming PVs. To deploy our Kafka cluster, we will use a Kafka custom resource:
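A minimal sketch of such a resource; the apiVersion shown (kafka.strimzi.io/v1beta1, used by AMQ Streams 1.x), the cluster name my-cluster, and the volume sizes are assumptions to adjust for your environment:

```
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3                      # three Kafka brokers
    listeners:
      plain: {}
      tls: {}
    config:
      default.replication.factor: 3  # replicate topic data three times
      min.insync.replicas: 2
      offsets.topic.replication.factor: 3
    storage:
      type: persistent-claim         # one PVC per broker
      size: 10Gi
      deleteClaim: false             # class omitted: the default (gluster-block) is used
  zookeeper:
    replicas: 3                      # three ZooKeeper nodes
    storage:
      type: persistent-claim
      size: 5Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```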
The above resource creates three replicas for both ZooKeeper and Kafka, and data is replicated three times across the Kafka cluster nodes, with each created PV backing the Kafka commit log of one node. Those PVs are created automatically in the gluster-block storage class, as it is the default one.
Now verify the PVs.
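For example (names and capacities will vary with your cluster):

```
# Each Kafka broker and ZooKeeper node should have a bound PV
oc get pv
oc get pvc
```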
Now create a Kafka topic:
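A sketch of a KafkaTopic resource handled by the Topic Operator; the topic name my-topic is an assumption, and the strimzi.io/cluster label must match the Kafka cluster name:

```
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster   # must match the Kafka cluster above
spec:
  partitions: 3
  replicas: 3
```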
Set Up Kafka Bridge
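A minimal sketch of that resource, assuming the kafka.strimzi.io/v1alpha1 API version used for KafkaBridge in AMQ Streams 1.x and a bridge named my-bridge (which yields the my-bridge-bridge-service service used below):

```
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaBridge
metadata:
  name: my-bridge
spec:
  replicas: 1
  # Bootstrap service created by the Kafka cluster above
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  http:
    port: 8080                       # port exposed by the bridge service
```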
The above custom resource will create the Kafka Bridge.
Clients for the Kafka Bridge
Internal clients are container-based HTTP clients that run in the same OpenShift cluster as the bridge and can access it on the host and port defined in the KafkaBridge custom resource.
Now create a route for service/my-bridge-bridge-service so that external clients can access the bridge:
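One way to do this with oc (the resulting hostname depends on your cluster's router; it is used as <bridge-host> in the examples below):

```
# Expose the bridge service through an OpenShift route
oc expose service my-bridge-bridge-service

# Read back the route's hostname
oc get route my-bridge-bridge-service
```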
The bridge exposes two main REST endpoints to send messages. The first one, /topics/{topicname}, is used to send a message to a topic, while the second one, /topics/{topicname}/partitions/{partitionid}, allows the user to specify the partition via partitionid. Even using the first endpoint, the user can specify the destination partition in the body of the message.
To send messages, a producer has to connect to the bridge using an HTTP POST request to the specific endpoint, with a JSON payload containing one or more records; each record carries a value and, optionally, a key and a partition. Consider a payload with three records, as shown in the example after this list:
- the first one has key and value, so the bridge will send it to the partition based on the hash of the key
- the second one has the specified destination partition and the value
- the third one just has the value, so the bridge will apply a round-robin mechanism to determine the partition
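A hedged example using curl, where <bridge-host> stands for the route hostname created earlier and my-topic is the topic we created:

```
# POST three records to my-topic through the bridge
curl -X POST http://<bridge-host>/topics/my-topic \
  -H 'Content-Type: application/vnd.kafka.json.v2+json' \
  -d '{
    "records": [
      { "key": "my-key", "value": "sent to a partition based on the key hash" },
      { "partition": 2, "value": "sent to the specified partition" },
      { "value": "partition chosen by the bridge" }
    ]
  }'
```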
From a consumer perspective, the bridge is much more complex due to the nature of how consuming messages from Apache Kafka works with regard to consumer groups and partition rebalancing. For this reason, before subscribing to topics and starting to receive messages, an HTTP client has to "create" a corresponding consumer on the bridge, which also means joining a consumer group. This happens through an HTTP POST on the following endpoint, providing a consumer configuration in the JSON payload.
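For example, creating a consumer named my-consumer in the consumer group my-group (both names are placeholders), receiving JSON messages and with auto-commit disabled:

```
curl -X POST http://<bridge-host>/consumers/my-group \
  -H 'Content-Type: application/vnd.kafka.v2+json' \
  -d '{
    "name": "my-consumer",
    "format": "json",
    "auto.offset.reset": "earliest",
    "enable.auto.commit": false
  }'
```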
The bridge creates a new consumer in the group and returns to the client the so-called base_uri, which is the URL the client has to use for the subsequent requests (i.e. subscribe, polling, ...).
The HTTP consumer will interact with the following endpoints for subscribing to topics, getting messages, committing offsets, and finally deleting the consumer.
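A sketch of that interaction, continuing with the placeholder names above:

```
# Subscribe the consumer to one or more topics
curl -X POST http://<bridge-host>/consumers/my-group/instances/my-consumer/subscription \
  -H 'Content-Type: application/vnd.kafka.v2+json' \
  -d '{ "topics": ["my-topic"] }'

# Poll for records (an HTTP GET maps to a Kafka poll)
curl -X GET http://<bridge-host>/consumers/my-group/instances/my-consumer/records \
  -H 'Accept: application/vnd.kafka.json.v2+json'

# Delete the consumer when done
curl -X DELETE http://<bridge-host>/consumers/my-group/instances/my-consumer \
  -H 'Content-Type: application/vnd.kafka.v2+json'
```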
As with a native Apache Kafka client, getting messages means doing a "poll" operation, which in terms of the HTTP protocol means doing HTTP GET requests on the relevant endpoint; the bridge returns an array of records with the topic, key, value, partition, and offset.
After consuming messages, if the auto-commit feature is not enabled on consumer creation, it is necessary to commit the offsets via an HTTP POST request specifying an offsets collection with topic, partition, and required offset to commit.
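For example, committing offset 10 for partition 0 of my-topic:

```
curl -X POST http://<bridge-host>/consumers/my-group/instances/my-consumer/offsets \
  -H 'Content-Type: application/vnd.kafka.v2+json' \
  -d '{ "offsets": [ { "topic": "my-topic", "partition": 0, "offset": 10 } ] }'
```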
The bridge also exposes endpoints for seeking in a topic partition to the beginning, to the end, or to a specific offset.
To seek to a specific position in the partition, the consumer must provide offset information through the JSON payload in the HTTP POST request. The format is the same as used to commit the offset.
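A sketch of both cases, reusing the placeholder names above:

```
# Seek to the beginning of a partition
curl -X POST http://<bridge-host>/consumers/my-group/instances/my-consumer/positions/beginning \
  -H 'Content-Type: application/vnd.kafka.v2+json' \
  -d '{ "partitions": [ { "topic": "my-topic", "partition": 0 } ] }'

# Seek to a specific offset (same payload format as the offset commit)
curl -X POST http://<bridge-host>/consumers/my-group/instances/my-consumer/positions \
  -H 'Content-Type: application/vnd.kafka.v2+json' \
  -d '{ "offsets": [ { "topic": "my-topic", "partition": 0, "offset": 5 } ] }'
```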
Exposing the Apache Kafka cluster to clients using HTTP enables scenarios where the use of native clients is not desirable. Such situations include resource-constrained devices, network availability, and security considerations. Interaction with the bridge is similar to the native Apache Kafka clients but using the semantics of an HTTP REST API. The inclusion of the HTTP Bridge in Strimzi enhances the options available to developers when building applications with Apache Kafka.