About Apache Kafka
Two trends have emerged in the information technology space. First, the diversity and velocity of the data that an enterprise wants to collect for decision-making continue to grow. Such data include not only transactional records, but also business metrics, IoT data, operational metrics, application logs, etc.
Second, there is a growing need for an enterprise to make decisions in real-time based on that collected data. Financial institutions want to not only detect fraud immediately, but also offer a better banking experience through features like real-time alerting, real-time recommendations, more effective customer service, and so on. Similarly, it's critical for retailers to make changes to catalog, inventory, and pricing available as quickly as possible. It is truly a real-time world.
Before Apache Kafka, there wasn't a system that perfectly met both of the above business needs. Traditional messaging systems are real-time, but weren't designed to handle data at scale. Newer systems such as Hadoop are much more scalable, but were designed for batch processing rather than real-time streaming use cases.
Apache Kafka is a streaming engine for collecting, caching, and processing high volumes of data in real-time. As illustrated in Figure 1, Kafka typically serves as part of a central data hub in which all data within an enterprise are collected. The data can then be used for continuous processing or fed into other systems and applications in real-time. Kafka is in use by more than 40% of Fortune 500 companies across all industries.
This is a preview of the Apache Kafka Essentials Refcard.