Kafka: A Beginner's Overview
This article is a beginner's guide to the Kafka event streaming platform, outlining the framework for storing, reading, and analyzing streaming data.
Apache Kafka is a distributed streaming platform whose core architectural components are Producers, Topics, and Consumers.
Producers - Publish streams of records, similar to a message queue or enterprise messaging system.
Topics - Store streams of records in a fault-tolerant, durable way.
Consumers - Read and process streams of records as they occur in a topic.
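The three roles above can be illustrated with a minimal in-memory sketch (all names here are hypothetical and only model the concepts, not the real Kafka API): a topic is an append-only log, a producer appends records to it, and each consumer reads from its own offset independently of other readers.

```python
class Topic:
    """A topic modeled as an append-only, ordered log of records."""
    def __init__(self, name):
        self.name = name
        self.log = []  # records stay in the order they were published

    def append(self, record):
        self.log.append(record)
        return len(self.log) - 1  # offset of the new record


class Producer:
    """Publishes records to a topic."""
    def send(self, topic, record):
        return topic.append(record)


class Consumer:
    """Reads records from a topic, tracking its own position (offset)."""
    def __init__(self):
        self.offset = 0

    def poll(self, topic):
        records = topic.log[self.offset:]
        self.offset = len(topic.log)
        return records


orders = Topic("orders")
producer = Producer()
producer.send(orders, {"id": 1, "amount": 9.99})
producer.send(orders, {"id": 2, "amount": 4.50})

consumer = Consumer()
print(consumer.poll(orders))  # both records, in publish order
```

Note that reading does not remove records from the topic: a second consumer created later would still see the full log from offset 0, which is the property that lets many downstream systems consume the same data.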
In the legacy organizational working model, data is exchanged between a source and a target system through point-to-point communication.
After years and several integration/interface implementations to run the business, an organization may have many source and target systems that all have to exchange data with one another, and things become really complicated.
The problem with point-to-point architecture: if we have x1 source systems and x2 target systems, we need x1*x2 integrations, and each integration comes with its own difficulties and rules:
- Data transportation protocols, e.g. TCP, HTTP, REST, FTP, JDBC.
- Data parsing formats, e.g. binary, CSV, JSON, etc.
- Transformations of data/messages to the required target format.
- Each integration between a source and a target system requires an active connection, which keeps both the source and the target busy.
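The x1*x2 growth is easy to check with a tiny calculation (the counts 4 and 5 are arbitrary example values):

```python
# With point-to-point integration, every source connects to every target:
sources, targets = 4, 5
point_to_point = sources * targets  # 4 * 5 = 20 integrations to build

# With a broker like Kafka in the middle, each system connects only once,
# either as a producer or as a consumer:
via_broker = sources + targets      # 4 + 5 = 9 connections

print(point_to_point, via_broker)   # 20 9
```

Adding one more target system costs x1 new integrations in the point-to-point model, but only a single new connection through the broker.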
Kafka allows us to decouple data/messages/streams between source and target systems.
Producer/source systems put their data into Apache Kafka, and the corresponding consumer/target systems read the data straight from Kafka; this is how Kafka enables decoupling.
Many kinds of data streams are supported, e.g. pricing data, change data capture, time series, financial transactions, user interactions, images, and video streams.
Additionally, once data is in Kafka, any system can consume it: databases, analytics systems, email systems, or audit systems.
This is why Kafka is such a strong choice: its architecture is distributed, resilient, and fault-tolerant. Importantly, it scales horizontally and delivers extremely high performance. The latency to exchange data from one system to another is usually well under a second, which enables near real-time processing.
How To Use Kafka?
To use Kafka, we first need to set up a Kafka environment. The components below are involved.
- Zookeeper: Used by Kafka to maintain state between the nodes of the cluster.
- Brokers: The "machines" (servers) that store and serve data.
- Producers: Send messages to a topic.
- Consumers: Listen for and read messages from a topic.
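A local environment with these components can be started with the scripts that ship in the Kafka distribution's `bin/` directory (run from the extracted Kafka folder; the topic name `demo-topic` is just an example). These are setup commands, so each long-running process belongs in its own terminal:

```shell
# Start Zookeeper (maintains cluster state), then a Kafka broker:
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

# Create a topic with one partition and one replica:
bin/kafka-topics.sh --create --topic demo-topic \
  --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

# Produce messages from the console (type a line, press Enter)...
bin/kafka-console-producer.sh --topic demo-topic \
  --bootstrap-server localhost:9092

# ...and consume them from the beginning in another terminal:
bin/kafka-console-consumer.sh --topic demo-topic \
  --bootstrap-server localhost:9092 --from-beginning
```

The console producer and consumer play the Producer and Consumer roles described above, which makes them a convenient way to verify the setup end to end.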
A Wide Array of Use Cases
- Messaging System.
- Activity Tracking.
- Gather Metrics from many different locations.
- Gathering application logs.
- Stream Processing.
- Decoupling of system dependencies, which reduces the load on underlying databases and systems.
- Big data integrations.
This makes it much easier to implement asynchronous processes in enterprise applications.
Opinions expressed by DZone contributors are their own.