Why Apache Kafka
Two trends have emerged in the information technology space. First, the diversity and velocity of the data that an enterprise wants to collect for decision-making continue to grow. Such data include not only transactional records but also business metrics, IoT data, operational metrics, and application logs.
Second, there is a growing need for an enterprise to make decisions in real time based on that collected data. Financial institutions want not only to detect fraud immediately, but also to offer a better banking experience through features like real-time alerts, real-time recommendations, more responsive customer service, and so on. Similarly, it's critical for retailers to make changes in catalog, inventory, and pricing available as quickly as possible. It is truly a real-time world.
Before Apache Kafka, no system met both of these business needs well. Traditional messaging systems deliver data in real time but were not designed to handle it at scale. Newer systems such as Hadoop scale much better but were built mostly for batch processing.
Apache Kafka is a streaming platform for collecting, storing, and processing high volumes of data in real time. As illustrated in Figure 1, Kafka typically serves as a central data hub in which all data within an enterprise are collected. The data can then be used for continuous processing or fed into other systems and applications in real time. Kafka is in use by more than 30% of Fortune 500 companies across all industries.
Figure 1. Kafka as a central real-time hub
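To make the hub model concrete, the sketch below shows how an application might publish an event into Kafka using its Java producer API. The broker address (localhost:9092), topic name (transactions), and event payload are illustrative assumptions, not details from the text; a minimal sketch only.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PaymentEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker address for illustration.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish a transaction record to the central hub; downstream
            // consumers (fraud detection, alerting, analytics) can read it
            // independently and in real time.
            producer.send(new ProducerRecord<>(
                    "transactions",
                    "account-42",
                    "{\"amount\": 129.99, \"currency\": \"USD\"}"));
        }
    }
}
```

Because every producer writes to the hub once and any number of consumers can subscribe, new real-time applications can be added without changing the systems that generate the data.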