Refcard #254

Apache Kafka Essentials

Download the Refcard today for a deep dive into Apache Kafka, including a review of its components, quick-start guides for Apache Kafka and Kafka Connect, and example code for setting up Kafka Streams.


Written By

Jun Rao
Co-founder, Confluent
William McLane
Messaging Evangelist, TIBCO Software Inc.
Section 1

About Apache Kafka

Two trends have emerged in the information technology space. First, the diversity and velocity of the data that an enterprise wants to collect for decision-making continue to grow. Such data include not only transactional records, but also business metrics, IoT data, operational metrics, application logs, etc.

Second, there is a growing need for an enterprise to make decisions in real-time based on that collected data. Financial institutions want not only to detect fraud immediately, but also to offer a better banking experience through features like real-time alerts, real-time recommendations, more effective customer service, and so on. Similarly, it's critical for retailers to make changes in catalog, inventory, and pricing available as quickly as possible. It is truly a real-time world.

Before Apache Kafka, there wasn't a system that perfectly met both of the above business needs. Traditional messaging systems are real-time, but weren't designed to handle data at scale. Newer systems such as Hadoop are much more scalable, but weren't designed for real-time delivery and so can't handle streaming use cases.

Apache Kafka is a streaming engine for collecting, caching, and processing high volumes of data in real-time. As illustrated in Figure 1, Kafka typically serves as a part of a central data hub in which all data within an enterprise are collected. The data can then be used for continuous processing or fed into other systems and applications in real-time. Kafka is in use by more than 40% of Fortune 500 companies across all industries.


This is a preview of the Apache Kafka Essentials Refcard. To read the entire Refcard, please download the PDF from the link above.

Section 2

Pub/Sub in Apache Kafka

The first component in Kafka deals with the production and consumption of data. The following table describes a few key concepts in Kafka:

Topic: Defines a logical name for producing and consuming records.

Partition: Defines a non-overlapping subset of records within a topic.

Offset: A unique sequential number assigned to each record within a topic partition.

Record: Contains a key, a value, a timestamp, and a list of headers.

Broker: A server where records are stored. Multiple brokers can be used to form a cluster.
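These concepts fit together mechanically: a record's key determines its partition, and each partition assigns its own sequential offsets. The toy model below illustrates that relationship only; it is not Kafka's actual implementation (the real default partitioner hashes keys with murmur2, and the class and method names here are invented for illustration).

```java
import java.util.ArrayList;
import java.util.List;

// A toy model of a topic: records with the same key land in the same
// partition, and each partition assigns its own sequential offsets.
// (Real Kafka hashes keys with murmur2; hashCode() stands in here.)
class TopicModel {
    private final List<List<String>> partitions = new ArrayList<>();

    TopicModel(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Appends a record and returns {partition, offset}.
    int[] append(String key, String value) {
        int partition = Math.abs(key.hashCode()) % partitions.size();
        List<String> log = partitions.get(partition);
        log.add(value); // offset is simply the record's position in the partition log
        return new int[] { partition, log.size() - 1 };
    }
}
```

Because partitioning is key-based, all records for a given key arrive in one partition in append order, which is how Kafka preserves per-key ordering.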



Section 3

Kafka Connect

The second component in Kafka is Kafka Connect, a framework that makes it easy to stream data between Kafka and other systems. As shown in Figure 3, one can deploy a Connect cluster and run various connectors to import data into Kafka from sources like MySQL, TIBCO Messaging, or Splunk (Source Connectors) and to export data from Kafka to systems such as HDFS, S3, and Elasticsearch (Sink Connectors).

The benefits of using Kafka Connect are: 

∙ Parallelism and fault tolerance 

∙ Avoiding ad-hoc code by reusing existing connectors 

∙ Built-in offset and configuration management 
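Concretely, a connector is defined by a small JSON configuration submitted to the Connect REST API (POST /connectors). The sketch below uses the FileStreamSource connector that ships with Kafka to stream lines of a file into a topic; the connector name, file path, and topic name are placeholder values:

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "file-lines"
  }
}
```

Connect tracks the connector's position in the source (its offsets) and its configuration in internal Kafka topics, which is what the "built-in offset and configuration management" benefit above refers to.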



Section 4

Kafka Streams

Kafka Streams is a client library for building real-time applications and microservices where the input and/or output data is stored in Kafka. The benefits of using Kafka Streams are: 

∙ Less code in the application 

∙ Built-in state management 

∙ Lightweight 

∙ Parallelism and fault tolerance 

The most common way of using Kafka Streams is through the Streams DSL, which includes operations such as filtering, joining, grouping, and aggregation. 
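To give a sense of the DSL's shape, below is a sketch of the canonical word-count topology, assuming the kafka-streams library is on the classpath; the topic names "text-lines" and "word-counts" are placeholders:

```java
import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> lines = builder.stream("text-lines");
KTable<String, Long> counts = lines
    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+"))) // split each line into words
    .groupBy((key, word) -> word) // re-key each record by the word itself
    .count();                     // aggregate per key, backed by a state store
counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));
// To run: new KafkaStreams(builder.build(), props).start() against a cluster.
```

The grouping and counting state is managed by the library (and backed by Kafka itself), which is what makes the application code this short and fault tolerant.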



Section 5

Conclusion

Apache Kafka has become the de facto standard for high-performance, distributed data streaming. It has a large and growing community of developers, corporations, and applications that support, maintain, and leverage it. If you are building an event-driven architecture or looking for a way to stream data in real-time, Apache Kafka is a clear leader, providing a proven, robust platform for stream processing and enterprise communications.


