Why Apache Kafka
Two trends have emerged in the information technology space. First, the diversity and velocity of the data that an enterprise wants to collect for decision-making continue to grow. Such data include not only transactional records, but also business metrics, IoT data, operational metrics, application logs, etc.
Second, there is a growing need for an enterprise to make decisions in real time based on that collected data. Financial institutions want to not only detect fraud immediately, but also offer a better banking experience through features like real-time alerting, real-time recommendation, more effective customer service, and so on. Similarly, it's critical for retailers to make changes in catalog, inventory, and pricing available as quickly as possible. It is truly a real-time world.
Before Apache Kafka, there wasn't a system that perfectly met both of the above business needs. Traditional messaging systems are real-time, but weren't designed to handle data at scale. Newer systems such as Hadoop are much more scalable, but weren't designed for real-time processing.
Apache Kafka is a streaming engine for collecting, caching, and processing high volumes of data in real time. As illustrated in Figure 1, Kafka typically serves as a part of a central data hub in which all data within an enterprise are collected. The data can then be used for continuous processing or fed into other systems and applications in real time. Kafka is in use by more than 40% of Fortune 500 companies across all industries.