Big Data to the Rescue: Database Monitoring Using Kafka

It is common for an application or business process to depend on another system. Equally common, the source system is a legacy system that offers few options for integration. For the purposes of this article, assume the legacy system provides status information about some aspect of your business, and that this information is vital to your application.

The most common way to handle this requirement is to have a process query the underlying database on a scheduled basis to determine if any updates have occurred since the last time the program was called. The diagram below illustrates this scenario:

[Figure: scheduled batch process querying the legacy database for updates]

The biggest challenge with this approach is that updates are only as current as the batch schedule. If the job runs once an hour, an update made one second after the job finishes will not be picked up until the next run, nearly an hour later. Running the batch process more frequently narrows that gap, but it raises other concerns, such as added query load on the legacy database.
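The cost of the polling approach can be sketched in Java. This is a minimal illustration under stated assumptions, not production code: the `StatusChange` record, the `BatchPoller` class, and the in-memory list standing in for the legacy table are all hypothetical, and a real job would run an equivalent SQL query (filtering on a commit timestamp column) on a schedule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a row in the legacy system's status table.
record StatusChange(String id, String status, long committedAt) {}

// Sketch of a scheduled poller: each run returns only the rows committed
// since the previous run. Times are in seconds for simplicity.
class BatchPoller {
    private final List<StatusChange> table; // stands in for the legacy database
    private long lastRun;

    BatchPoller(List<StatusChange> table, long start) {
        this.table = table;
        this.lastRun = start;
    }

    // Invoked by a scheduler (e.g. cron) with the current time.
    List<StatusChange> run(long now) {
        List<StatusChange> updates = new ArrayList<>();
        for (StatusChange row : table) {
            // Equivalent to: WHERE committed_at > :lastRun AND committed_at <= :now
            if (row.committedAt() > lastRun && row.committedAt() <= now) {
                updates.add(row);
            }
        }
        lastRun = now;
        return updates;
    }
}

public class PollingDemo {
    public static void main(String[] args) {
        final long hour = 3600;
        List<StatusChange> table = new ArrayList<>();
        BatchPoller poller = new BatchPoller(table, 0);

        poller.run(hour); // first hourly run finds nothing
        table.add(new StatusChange("order-42", "SHIPPED", hour + 1)); // one second later

        List<StatusChange> seen = poller.run(2 * hour); // next hourly run
        long delay = 2 * hour - seen.get(0).committedAt();
        System.out.println("Update seen after " + delay + " seconds"); // prints "Update seen after 3599 seconds"
    }
}
```

Because the poller only looks for changes between runs, the update committed one second after the first run waits for the next hourly run, 3,599 seconds later.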

How Apache Kafka Can Help

Apache Kafka, part of a larger stream data platform within the Big Data ecosystem, can be used to monitor data changes and events and feed those updates to other systems.

While Kafka is often considered a messaging system, it works at a different level of abstraction: the structured commit log of database updates. As transactions are written to the database's transaction logs, an Apache Kafka Producer monitors those logs for entries that meet specific criteria:

[Figure: Kafka Producer monitoring the database transaction log]
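The idea of a producer watching the log for matching entries can be illustrated with a small, self-contained Java sketch. It does not use the Kafka client library: the `LogEntry` record and `LogMonitor` class are hypothetical, the list stands in for the database's transaction log, and the `sink` callback stands in for a Kafka Producer's send call. The points it shows are that the monitor remembers its position in the log, so no entry is processed twice, and that a predicate expresses the "specific criteria".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical shape of one committed entry in the database's transaction log.
record LogEntry(long offset, String table, String operation, String payload) {}

// Sketch of a log monitor: it remembers how far it has read, looks only at
// new entries, and forwards those matching the criteria to a sink.
class LogMonitor {
    private final Predicate<LogEntry> criteria;
    private final Consumer<LogEntry> sink; // stand-in for a Kafka Producer send
    private long nextOffset = 0;

    LogMonitor(Predicate<LogEntry> criteria, Consumer<LogEntry> sink) {
        this.criteria = criteria;
        this.sink = sink;
    }

    // Called whenever the log grows; entries already read are skipped.
    void onAppend(List<LogEntry> log) {
        for (LogEntry entry : log) {
            if (entry.offset() < nextOffset) continue;
            if (criteria.test(entry)) sink.accept(entry);
            nextOffset = entry.offset() + 1;
        }
    }
}

public class LogMonitorDemo {
    public static void main(String[] args) {
        List<LogEntry> log = new ArrayList<>();
        List<LogEntry> published = new ArrayList<>();
        // Criteria (hypothetical): only UPDATEs to the STATUS table.
        LogMonitor monitor = new LogMonitor(
            e -> e.table().equals("STATUS") && e.operation().equals("UPDATE"),
            published::add);

        log.add(new LogEntry(0, "AUDIT", "INSERT", "login user=7"));
        log.add(new LogEntry(1, "STATUS", "UPDATE", "order-42 -> SHIPPED"));
        monitor.onAppend(log);
        log.add(new LogEntry(2, "STATUS", "UPDATE", "order-43 -> DELAYED"));
        monitor.onAppend(log);

        published.forEach(e -> System.out.println(e.payload()));
    }
}
```

Running `main` prints the two STATUS updates in commit order; the AUDIT entry is ignored because it fails the predicate, and the second `onAppend` call does not republish the first update.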

In our example, as status-change commits are made to the legacy system's database, Kafka monitors and captures the updates. At that point, Kafka can be instructed on what to do with each event. Often, Kafka feeds a stream data platform. In our case, using Java with Kafka, we can place a message on a queue for processing by our Enterprise Service Bus (ESB). The resulting flow is illustrated below:

[Figure: Kafka Producer placing status-change messages on an ESB queue for listeners]

With this flow in place, as changes are committed to the legacy system's relational database, the Kafka Producer identifies the updates via the transaction logs and places messages on a queue within the ESB. From there, listeners on the queue take the appropriate action.
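The last leg of the flow, where messages land on an ESB queue and listeners act on them, can be sketched with the JDK's `BlockingQueue`. This is a stand-in rather than a real ESB: `LinkedBlockingQueue` plays the role of the ESB message queue, the hypothetical `SHUTDOWN` marker exists only so the demo terminates, and "taking the appropriate action" is reduced to recording a notification string.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of the ESB leg of the flow. A BlockingQueue stands in for the ESB
// message queue that the Kafka Producer would write to.
public class EsbFlowDemo {
    // Hypothetical marker message used only to stop the listener in this demo.
    static final String SHUTDOWN = "__shutdown__";

    // Listener: takes each message off the queue and applies an action.
    static Thread startListener(BlockingQueue<String> queue, List<String> handled) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String msg = queue.take(); // blocks until a message arrives
                    if (msg.equals(SHUTDOWN)) return;
                    handled.add("notified: " + msg); // the "appropriate action"
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> esbQueue = new LinkedBlockingQueue<>();
        List<String> handled = Collections.synchronizedList(new ArrayList<>());
        Thread listener = startListener(esbQueue, handled);

        // Producer side: each captured status change is placed on the queue.
        esbQueue.put("order-42 -> SHIPPED");
        esbQueue.put("order-43 -> DELAYED");
        esbQueue.put(SHUTDOWN);

        listener.join(); // wait until the listener has drained the queue
        handled.forEach(System.out::println);
    }
}
```

Because `take()` blocks until a message arrives, the listener reacts as soon as a message is placed on the queue rather than waiting for a schedule, which is the behavior this design is after.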

Unlike the batch process, the Kafka Producer listens continuously by monitoring the transaction logs, which provides near real-time notification when status changes occur.


Big Data is likely on your IT infrastructure road map, and your product of choice likely includes an option for streaming data. When building your list of desired Big Data functionality for your organization, look beyond the analytical wins and consider how Big Data can also support your transactional applications.

For additional reading, I highly recommend a series that Jay Kreps has written on Putting Apache Kafka To Use.

Have a really great day!

Published at DZone with permission of John Vester, DZone MVB. See the original article here.
