Refcard #254

Apache Kafka

Download the Refcard today for a deep dive into Apache Kafka, including a review of its components, quick-start guides for Apache Kafka and Kafka Connect, and example code for setting up Kafka Streams.

Brought to you by

StreamSets

Written by

Jun Rao, Co-founder, Confluent
Tim Spann, DZone MVB, @PaaSDev
Table of Contents

Why Apache Kafka

About Apache Kafka

Section 1

Why Apache Kafka

Two trends have emerged in the information technology space. First, the diversity and velocity of the data that an enterprise wants to collect for decision-making continue to grow. Such data include not only transactional records, but also business metrics, IoT data, operational metrics, application logs, etc.

Second, there is a growing need for an enterprise to make decisions in real time based on that collected data. Financial institutions want to not only detect fraud immediately, but also offer a better banking experience through features like real-time alerting, real-time recommendations, more effective customer service, and so on. Similarly, it's critical for retailers to make changes in catalog, inventory, and pricing available as quickly as possible. It is truly a real-time world.

Before Apache Kafka, there wasn't a system that perfectly met both of the above business needs. Traditional messaging systems are real-time, but weren't designed to handle data at scale. Newer systems such as Hadoop are much more scalable, but aren't designed for real-time use cases.

Apache Kafka is a streaming engine for collecting, caching, and processing high volumes of data in real time. As illustrated in Figure 1, Kafka typically serves as part of a central data hub in which all data within an enterprise is collected. The data can then be used for continuous processing or fed into other systems and applications in real time. Kafka is in use by more than 40% of Fortune 500 companies across all industries.
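
To make the "collecting" side of that hub concrete, here is a minimal sketch of publishing a record with the Kafka Java producer client. The broker address (localhost:9092) and the "operational-metrics" topic name are assumptions chosen for illustration, not part of the Refcard's example setup.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MetricsProducer {
    public static void main(String[] args) {
        // Minimal producer configuration; the broker address is an assumption for this sketch.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Publish a single record to a hypothetical "operational-metrics" topic.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("operational-metrics", "host-1", "cpu=0.42"));
        }
    }
}

Any consumer subscribed to the same topic then receives the record, which is how the downstream systems in Figure 1 are fed in real time.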

Section 2

About Apache Kafka

Kafka was originally developed at LinkedIn in 2010, and it became a top-level Apache project in 2012. It has three main components: Pub/Sub, Kafka Connect, and Kafka Streams. The role of each component is summarized in the table below.
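
As a small taste of the Kafka Streams component ahead of the full setup covered in the downloadable Refcard, the sketch below reads records from one topic, transforms each value, and writes the results to another. The application ID, broker address, and topic names are assumptions for illustration.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class UppercaseStream {
    public static void main(String[] args) {
        // Basic Streams configuration; all values here are illustrative assumptions.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Build a topology that continuously copies records from "input-topic"
        // to "output-topic", upper-casing each value along the way.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> source = builder.stream("input-topic");
        source.mapValues(value -> value.toUpperCase())
              .to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }
}

In this picture, Pub/Sub provides the durable transport of records, Kafka Connect moves the same data in and out of external systems, and Kafka Streams (as above) performs the continuous processing.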

This is a preview of the Apache Kafka Refcard. To read the entire Refcard, please download the PDF from the link above.
