Kamelet for Streaming to Kafka! [Video]
Another fantastic alternative to Kafka Connectors when streaming data to your Kafka topics on Kubernetes.
You want Kafka to stream and process your data. But what comes after you have set up the platform, planned the partitioning strategy and storage options, and configured data durability? Yes: how to stream data in and out of the platform. That is exactly what I want to discuss today.
Before we go any further, let’s look at what Kafka does to make itself blazing fast. Kafka is optimized for writing stream data in binary format: it logs everything directly to the file system (sequential I/O) and makes minimal effort to process what's in the data (optimized for zero copy). Kafka is super-charged at storing data as quickly as possible and at quickly replicating it for a large number of consumers. But it is terrible at communication: any client that pushes content needs to SPEAK Kafka.
So here we are, with a super-fast logging and distribution platform that is dumb at connecting to other data sources. Who is going to validate the data sent in and out of a Kafka topic? What if I need to transform the data content? Can I filter the content partially? You guessed it: the clients. We now need smart clients that do most of the content processing and speak Kafka at the same time.
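To make the "log everything, look at nothing" idea concrete, here is a minimal sketch of an append-only log. It is a toy under my own assumptions (a 4-byte length prefix per record, the `AppendOnlyLog` name, the file name), not Kafka's actual on-disk format. Notice that the log never parses the payload, it only appends bytes and hands them back by offset.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log: each record is a 4-byte length prefix plus raw bytes."""

    def __init__(self, path):
        self.path = path
        # Create the file if it does not exist yet.
        open(self.path, "ab").close()

    def append(self, payload: bytes) -> int:
        """Write the record at the end of the file and return its byte offset."""
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)
            f.write(struct.pack(">I", len(payload)) + payload)
        return offset

    def read(self, offset: int) -> bytes:
        """Read one record back, starting at a known byte offset."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            (size,) = struct.unpack(">I", f.read(4))
            return f.read(size)


# Hypothetical usage: the payloads are opaque bytes as far as the log is concerned.
log = AppendOnlyLog(os.path.join(tempfile.mkdtemp(), "topic-0.log"))
first = log.append(b'{"id": 1}')
second = log.append(b'{"id": 2}')
```

Everything the log does not do here (validation, transformation, filtering) is exactly the work that falls to the clients.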
What Are the Most Used Connect Tools Today for Kafka Users?
Kafka Connect is what the majority of Kafka users are using today. It is broken down into several parts: connector, task, worker, converter, transformer, and error handler. You can think of tasks and workers as how the data flow is executed. Developers mostly configure the other four pieces.
- Connector: Describes the kind of source or sink of the data flow, translates between the client and Kafka protocols, and knows which libraries are needed.
- Converter: Converts the binary data to the data format accepted by the client, or vice versa, and validates the data format. (Currently there is limited support from Confluent; they only do data format.)
- Transformer: Reads into the data format and can make simple changes to individual data chunks. Normally you would use it for filtering, masking, or other minor changes. (This does not support simple calculations.)
- Error Handler: Defines a place to store problematic data. (Confluent: dead letter queues are only applicable to sink connectors.)
After configuration, Kafka Connect uses tasks and workers to determine how to scale and execute the pipelines that move data in and out of Kafka. For instance, you can run a cluster of workers to scale out and let tasks process streams in parallel.
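The division of labor between converter, transformer, and error handler can be sketched in a few lines. This is plain Python to illustrate the flow, not the Kafka Connect API; the function names, the JSON format, and the `email` field are assumptions made up for the example.

```python
import json


def convert(raw: bytes) -> dict:
    """Converter: decode bytes into a structured record, rejecting malformed input."""
    record = json.loads(raw.decode("utf-8"))
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")
    return record


def mask_email(record: dict) -> dict:
    """Transformer: a per-record change, here masking a hypothetical 'email' field."""
    masked = dict(record)
    if "email" in masked:
        masked["email"] = "***"
    return masked


def run_sink(messages, deliver, dead_letters):
    """Push each message through convert -> transform -> deliver.

    Messages that fail at any step are parked in dead_letters with the error,
    so one bad record does not stop the flow.
    """
    for raw in messages:
        try:
            deliver(mask_email(convert(raw)))
        except (ValueError, UnicodeDecodeError) as err:
            dead_letters.append((raw, str(err)))


delivered, dead_letters = [], []
run_sink(
    [b'{"id": 1, "email": "a@example.com"}', b"not json", b'{"id": 2}'],
    delivered.append,
    dead_letters,
)
```

After the run, `delivered` holds the two valid records (with the email masked) and `dead_letters` holds the malformed one together with its error message.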
Camel Is Another Great Option!
Apache Camel is a GREAT alternative for connecting to Kafka too. Here’s what Camel has to offer.
- Connector: Camel has 300+ connectors that you can configure as the source or the sink of the data flow, translating between the 100+ client/Kafka protocols.
- Converter: Validate and transform data formats with a simple configuration.
- Transformer: Goes beyond simple message modification. It can also apply integration patterns that suit stream processing, such as split and filter, and even custom processing steps.
- Error Handler: Dead letter queue, catching exceptions.
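For readers who have not met the split and filter patterns mentioned above, here is a minimal sketch of what they do to a stream. It is plain Python rather than the Camel DSL, and the sensor readings and threshold are invented for illustration.

```python
def split(batches):
    """Splitter: turn each incoming batch into individual messages."""
    for batch in batches:
        yield from batch


def keep(predicate, messages):
    """Filter: pass through only the messages that match the predicate."""
    return (m for m in messages if predicate(m))


# Hypothetical input: two batches of sensor readings arriving from a source.
batches = [
    [{"sensor": "a", "temp": 21.5}, {"sensor": "b", "temp": 48.0}],
    [{"sensor": "c", "temp": 51.2}],
]

# Split the batches into single readings, then keep only the hot ones.
hot = list(keep(lambda m: m["temp"] > 45.0, split(batches)))
```

Only the readings from sensors `b` and `c` survive, and each one would reach the Kafka topic as its own message.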
There are also many ways to run Camel. You can run it as a standalone single process that streams data directly in and out of Kafka. But Camel works EXCEPTIONALLY well on Kubernetes. It runs as a cluster of instances that execute in parallel to maximize performance. It can be deployed as a native image through Quarkus to increase density and efficiency. OpenShift (Kubernetes) lets users control the scaling of the instances. And since it’s on K8s, another advantage is that the operations team can manage these pipelines on one unified platform, along with all the other microservices.
Why Kamelet? (This Is the Way!)
One of the biggest hurdles for non-Camel developers is that they need to learn another framework, and maybe another language (for non-Java developers), to get Camel running. What if we could smooth the learning curve and make it simple for newcomers? We also see a great number of use cases where masking and filtering are implemented company-wide. Being able to build a repository of this logic and reuse it will make developers work more efficiently.
Plug and Play
You can look at Kamelets as templates in which you define where to consume data from and where to send it, along with filtering, masking, and simple calculation logic. Once a template is defined, it can be made available to the teams, who simply plug it into the platform, configure it for their needs (with either a Kamelet Binding or another Camel route), and boom. The underlying Camel K does the hard work for you: compile, build, package, and deploy. You have a smart data pipeline running and streaming into Kafka.
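The template idea can be sketched as follows. This is a conceptual Python model, not the Kamelet or Kamelet Binding format: `PipelineTemplate`, its `bind` method, and the `$url`/`$topic` placeholders are all hypothetical names chosen for the example. The point is that one team writes the template once, and every other team only supplies the properties.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PipelineTemplate:
    """A reusable pipeline definition with named placeholders ($name syntax)."""

    name: str
    required: tuple
    source: str
    sink: str

    def bind(self, **properties):
        """Fill in the placeholders, refusing to run with missing properties."""
        missing = [p for p in self.required if p not in properties]
        if missing:
            raise ValueError(f"missing properties: {missing}")
        return {
            "from": Template(self.source).substitute(properties),
            "to": Template(self.sink).substitute(properties),
        }


# Defined once and shared with every team.
http_to_kafka = PipelineTemplate(
    name="http-to-kafka",
    required=("url", "topic"),
    source="poll $url",
    sink="kafka topic $topic",
)

# Each team only provides its own configuration.
orders = http_to_kafka.bind(url="https://example.com/orders", topic="orders")
```

Validating the required properties at bind time mirrors the plug-and-play promise: a user who forgets a setting finds out immediately instead of after deployment.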
Assemble and Reuse
In a data pipeline, sometimes you need just a bit of extra work on the data. Instead of defining a single template for each case, you can break the work down into smaller tasks and assemble them in the pipeline for each use case.
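Here is a small sketch of that assemble-and-reuse approach, again in plain Python rather than Kamelet syntax. The step names (`drop_field`, `add_field`, `uppercase_field`) and the sample record are assumptions for illustration. Each step does one thing, and two different pipelines reuse the same masking step.

```python
from functools import reduce


def pipeline(*steps):
    """Chain small single-purpose steps into one function applied left to right."""
    return lambda message: reduce(lambda acc, step: step(acc), steps, message)


def drop_field(name):
    """Step factory: remove a field from the message."""
    return lambda m: {k: v for k, v in m.items() if k != name}


def add_field(name, value):
    """Step factory: add a constant field to the message."""
    return lambda m: {**m, name: value}


def uppercase_field(name):
    """Step factory: upper-case the value of an existing field."""
    return lambda m: {**m, name: str(m[name]).upper()}


# Two use cases assembled from the same small building blocks.
audit = pipeline(drop_field("password"), add_field("source", "crm"))
export = pipeline(drop_field("password"), uppercase_field("country"))

user = {"name": "Ana", "password": "s3cret", "country": "pt"}
audited = audit(user)
exported = export(user)
```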
Streams and Serverless
Kamelets allow you to stream data to or from either a Kafka store or a Knative event channel/broker. To support Knative, a Kamelet can translate messages to CloudEvents, the CNCF standard event format for serverless, and apply any pre- or post-processing of the content in the pipeline.
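To give a feel for what translating a message to CloudEvents involves, here is a minimal sketch that wraps a payload in a CloudEvents 1.0 envelope in structured JSON form. The `to_cloudevent` helper, the event type, and the source path are hypothetical; the attribute names (`specversion`, `id`, `source`, `type`, `time`, `datacontenttype`, `data`) come from the CloudEvents specification. This is not how Kamelets implement it internally.

```python
import json
import uuid
from datetime import datetime, timezone


def to_cloudevent(data: dict, event_type: str, source: str) -> dict:
    """Wrap a payload in a CloudEvents 1.0 envelope (structured JSON form)."""
    return {
        # Required context attributes in CloudEvents 1.0.
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        # Optional attributes describing when the event happened and its payload.
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


# Hypothetical event type and source, chosen only for this example.
event = to_cloudevent({"orderId": 42}, "com.example.order.created", "/pipelines/orders")
encoded = json.dumps(event)
```

Because the envelope is uniform, a Knative channel or broker can route on `type` and `source` without knowing anything about the payload itself.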
Scalable and Flexible
Kamelets live on Kubernetes (they can also run standalone), which gives you a comprehensive set of scaling tools: readiness and liveness checks and scaling configuration are all part of the package. A pipeline scales by adding more instances. The UI in the OpenShift Developer Console can help you fill in what’s needed, and it auto-discovers the available sources and sinks so you can choose where the data pipelines start or end.
Unify for DEV and OPS
In many cases, DevOps engineers are required to develop another set of automation tools for deploying connectors. Because Kamelets run like any other application on Kubernetes, the same tools can be used to build, deploy, and monitor these pipelines. The streamlined DevOps experience can help shorten the automation setup time.
There is a list of catalogs that are already available (not enough?). If you just want to stream data directly, simply pick the ones you need and start streaming. And we welcome your contributions too.
Want to know more about Kamelets? Take a look at this video, which explains why to use Kamelets for streaming data to Kafka and includes a demo.
00:00 — Introduction
00:30 — What is Kamelet?
01:06 — Why do you need connectors to Kafka, and what is required in each connector?
02:16 — Why Kamelet?
07:45 — Marketplace of Kamelet!
08:47 — Using Kamelet as a Kafka User
10:58 — Building a Kamelet
13:25 — Running Kamelet on Kubernetes
15:44 — Demo
17:30 — Red Hat OpenShift Streams in Action
19:26 — Kamelets in Action
Published at DZone with permission of Christina Lin, DZone MVB. See the original article here.