
Big Data to the Rescue: Database Monitoring Using Kafka




It is common for an application or business process to depend on another system. Equally common, that source system is a legacy system offering few options for integration. For the purposes of this article, assume the legacy system provides status information for an aspect of your business that is vital to your application.

The most common way to handle this requirement is to have a process query the underlying database on a schedule to determine whether any updates have occurred since the last run. The diagram below illustrates this scenario:

Image title

The biggest challenge with this approach is that updates are only as fresh as the batch schedule allows. If the job runs once an hour, an update made one second after the job finishes will not be available for almost an hour. Of course, running the batch process more frequently can raise other concerns, such as added load on the legacy database.
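To make the delay concrete, here is a minimal Java sketch of the scheduled-polling approach. It is an illustration under stated assumptions, not the legacy system's actual interface: `StatusRow`, `PollingJob`, and their fields are hypothetical names, and an in-memory list stands in for the legacy database table.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical row in the legacy status table; field names are illustrative. */
final class StatusRow {
    final String orderId;
    final String status;
    final Instant updatedAt;

    StatusRow(String orderId, String status, Instant updatedAt) {
        this.orderId = orderId;
        this.status = status;
        this.updatedAt = updatedAt;
    }
}

/** Sketch of a scheduled batch job that collects rows changed since its previous run. */
final class PollingJob {
    private Instant lastRun;

    PollingJob(Instant startedAt) {
        this.lastRun = startedAt;
    }

    /** Returns rows updated after the previous run and no later than now, then advances the watermark. */
    List<StatusRow> run(List<StatusRow> table, Instant now) {
        List<StatusRow> changed = new ArrayList<>();
        for (StatusRow row : table) {
            boolean afterLastRun = row.updatedAt.isAfter(lastRun);
            boolean notInFuture = !row.updatedAt.isAfter(now);
            if (afterLastRun && notInFuture) {
                changed.add(row);
            }
        }
        lastRun = now;
        return changed;
    }

    /** Time between a row being updated and the run that first reports it. */
    static Duration staleness(StatusRow row, Instant pickedUpAt) {
        return Duration.between(row.updatedAt, pickedUpAt);
    }
}
```

With an hourly schedule, a row updated one second after the 10:00 run is first returned by the 11:00 run, so `staleness` reports 59 minutes and 59 seconds for it.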

How Apache Kafka Can Help

Apache Kafka, a part of a larger stream data platform within Big Data, can be used to monitor data changes/events in order to feed the updates to other systems.

While often considered a messaging system, Kafka works at a different level of abstraction: a structured commit log of updates. As transactions are written to the database logs, an Apache Kafka Producer monitors the logs for changes that meet specific criteria:

Image title
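The "specific criteria" step can be sketched in plain Java as a filter over committed changes. This is only an illustration: `LogEntry`, `StatusChangeFilter`, and the `ORDERS`/`STATUS` names are assumptions made for the example, and how entries are actually read from a given database's transaction log is outside its scope.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified view of one committed change read from a transaction log. */
final class LogEntry {
    final String table;
    final String column;
    final String rowKey;
    final String newValue;

    LogEntry(String table, String column, String rowKey, String newValue) {
        this.table = table;
        this.column = column;
        this.rowKey = rowKey;
        this.newValue = newValue;
    }
}

final class StatusChangeFilter {
    /** Assumed criteria: only commits that change the STATUS column of the ORDERS table. */
    static final Predicate<LogEntry> STATUS_CHANGE =
            e -> "ORDERS".equals(e.table) && "STATUS".equals(e.column);

    /** Returns, in commit order, the entries that satisfy the given criteria. */
    static List<LogEntry> select(List<LogEntry> committed, Predicate<LogEntry> criteria) {
        List<LogEntry> matches = new ArrayList<>();
        for (LogEntry entry : committed) {
            if (criteria.test(entry)) {
                matches.add(entry);
            }
        }
        return matches;
    }
}
```

Keeping the criteria as a `Predicate` means the same selection code can serve other kinds of changes if more events need to be captured later.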

In our example, as database commits related to status changes are made to the legacy system, Kafka monitors and captures the updates. At that point, Kafka can be instructed on what to do with each event. Often, Kafka feeds a stream data platform. In our case, using Java with Kafka, it is possible to place a message on a queue for processing by our Enterprise Service Bus (ESB). Our flow could then be illustrated as shown below:

Image title

With the above flow in place, as changes are committed to the legacy system's relational database, the Kafka Producer identifies the updates via the transaction logs and places messages on a message queue within the ESB. From there, listeners on the queue take the appropriate action.

Unlike the batch process, the Kafka Producer listens continuously by monitoring the transaction logs. This provides near real-time notification when status changes occur.
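The shape of this flow can be sketched with the JDK alone. In the sketch below, an in-memory queue stands in for the ESB message queue, and `StatusFlowSketch`, `onStatusCommit`, and `dispatchPending` are hypothetical names chosen for illustration; it shows the hand-off from producer to listener, not a real Kafka or ESB integration.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Sketch of the producer-to-listener hand-off; the queue is a stand-in for the ESB. */
final class StatusFlowSketch {
    private final Queue<String> esbQueue = new ConcurrentLinkedQueue<>();

    /** Producer side: called as soon as a matching commit is seen in the transaction log. */
    void onStatusCommit(String orderId, String newStatus) {
        esbQueue.add(orderId + ":" + newStatus);
    }

    /** Listener side: handles every message currently queued and returns how many were handled. */
    int dispatchPending(Consumer<String> listener) {
        int handled = 0;
        String message;
        while ((message = esbQueue.poll()) != null) {
            listener.accept(message);
            handled++;
        }
        return handled;
    }
}
```

Because a message is queued at the moment a commit is observed, the listener's delay depends on how quickly it drains the queue rather than on a batch schedule. With the real Kafka Java client, the publishing step would typically go through `KafkaProducer.send` with a `ProducerRecord`; that wiring is left out here so the sketch runs without a broker.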


It is likely that Big Data is on the road map for your IT infrastructure. It is also likely that your product of choice includes an option for streaming data. When building your list of desired Big Data functionality, it is important to look beyond the analytical wins and consider how Big Data can assist your transactional applications.

For additional reading, I highly recommend a series that Jay Kreps has written on Putting Apache Kafka To Use.

Have a really great day!



Published at DZone with permission of

