Big Data to the Rescue: Database Monitoring Using Kafka

Apache Kafka, a part of a larger stream data platform within Big Data, can be used to monitor data changes/events in order to feed the updates to other systems.

It is common for an application or business process to depend on another system. Equally common is that the source system is a legacy system offering few options for integration. For the purposes of this article, assume the legacy system provides status information for an aspect of your business that is vital to your application.

The most common way to handle this requirement is to have a process query the underlying database on a scheduled basis to determine if any updates have occurred since the last time the program was called. The diagram below illustrates this scenario:

[Diagram: a scheduled batch process querying the legacy system's database]

The biggest challenge with this approach is that updates are only as timely as the batch schedule. If the job runs once an hour, an update made a second after the job finishes will not be available for almost an hour. Of course, running the batch process more frequently is likely to raise other concerns.
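To make the polling pattern and its delay concrete, here is a minimal Java sketch. It is not the legacy system's actual code: the StatusRow shape, the order IDs, and the in-memory list standing in for the database table are all assumptions made for illustration. In practice the filter would be a SQL query against the legacy database.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

final class BatchPoller {
    // Hypothetical row from the legacy system's status table.
    record StatusRow(String orderId, String status, Instant updatedAt) {}

    // Stands in for a query such as "SELECT ... WHERE updated_at > ?"
    // (the table and column names are assumptions, not from the article).
    static List<StatusRow> changedSince(List<StatusRow> table, Instant lastRun) {
        return table.stream()
                .filter(row -> row.updatedAt().isAfter(lastRun))
                .collect(Collectors.toList());
    }

    // How long an update sits unnoticed before the next scheduled run.
    static Duration waitUntilNextRun(Instant updatedAt, Instant lastRun, Duration interval) {
        return Duration.between(updatedAt, lastRun.plus(interval));
    }

    public static void main(String[] args) {
        Instant lastRun = Instant.parse("2024-01-01T10:00:00Z");
        List<StatusRow> table = List.of(
                new StatusRow("A-1", "SHIPPED", lastRun.minusSeconds(600)),
                new StatusRow("A-2", "DELIVERED", lastRun.plusSeconds(1)));

        // Only A-2 changed after the last run, so only A-2 is returned.
        System.out.println(changedSince(table, lastRun));

        // An update made one second after an hourly run waits 59 minutes 59 seconds.
        System.out.println(waitUntilNextRun(lastRun.plusSeconds(1), lastRun, Duration.ofHours(1)));
    }
}
```

Shortening the interval reduces the wait, but every run still issues a query whether or not anything changed.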

How Apache Kafka Can Help

Apache Kafka, part of a larger stream data platform within Big Data, can monitor data changes (events) and feed those updates to other systems.

While often considered a messaging system, Kafka works at a different level of abstraction: a structured commit log of updates. As transactions are written to the database's logs, an Apache Kafka Producer monitors those logs for changes that meet specific criteria:

[Diagram: a Kafka Producer monitoring the database transaction log]
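The idea of watching committed changes for specific criteria can be sketched in plain Java. This is an illustration under stated assumptions, not the article's implementation: the LogEntry shape, the ORDERS/STATUS names, and the way entries arrive are hypothetical, and the publisher is a simple callback so the sketch runs without a Kafka cluster. With the Kafka Java client, that callback would typically wrap KafkaProducer.send(new ProducerRecord<>(topic, key, value)).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

final class LogMonitor {
    // Hypothetical shape of one committed change read from the transaction log.
    record LogEntry(String table, String key, String column, String newValue) {}

    private final Predicate<LogEntry> criteria;
    private final BiConsumer<String, String> publisher; // receives (key, value)

    LogMonitor(Predicate<LogEntry> criteria, BiConsumer<String, String> publisher) {
        this.criteria = criteria;
        this.publisher = publisher;
    }

    // Called once per committed entry, in commit order.
    void onCommit(LogEntry entry) {
        if (criteria.test(entry)) {
            publisher.accept(entry.key(), entry.newValue());
        }
    }

    public static void main(String[] args) {
        List<String> published = new ArrayList<>();
        LogMonitor monitor = new LogMonitor(
                e -> e.table().equals("ORDERS") && e.column().equals("STATUS"),
                (key, value) -> published.add(key + "=" + value));

        monitor.onCommit(new LogEntry("ORDERS", "A-1", "STATUS", "SHIPPED"));
        monitor.onCommit(new LogEntry("ORDERS", "A-1", "NOTES", "gift wrap"));
        monitor.onCommit(new LogEntry("CUSTOMERS", "C-9", "STATUS", "ACTIVE"));

        // Only the ORDERS.STATUS change matches the criteria.
        System.out.println(published); // prints [A-1=SHIPPED]
    }
}
```

Keeping the criteria as a predicate separates the decision about which changes matter from the mechanics of reading the log and publishing.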

In our example, as database commits related to status changes are made to the legacy system, Kafka monitors and captures the updates. At that point, Kafka can be instructed on what to do with each event. Often, Kafka feeds a stream data platform. In our case, using Java with Kafka, it is possible to place a message on a queue for processing by our Enterprise Service Bus (ESB). Our flow could then be illustrated as shown below:

[Diagram: the Kafka Producer placing messages on an ESB queue, with listeners consuming them]

With the above flow in place, as the changes are committed to the legacy system's relational database, the Kafka Producer identifies the updates via the transaction logs and places the messages on a message queue within the ESB.  From there, listeners on the queue take the appropriate action.
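The hand-off from the producer to queue listeners might look roughly like the sketch below. It is a stand-in only: an in-memory java.util.concurrent queue plays the role of the ESB message queue, and the StatusChange type and the listener's action are hypothetical. A real ESB would supply its own queue and listener APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class EsbQueueSketch {
    // Hypothetical message describing one status change.
    record StatusChange(String orderId, String status) {}

    // In-memory stand-in for the ESB message queue.
    private final BlockingQueue<StatusChange> queue = new LinkedBlockingQueue<>();

    // Producer side: called when a matching change is found in the log.
    void publish(StatusChange change) {
        queue.add(change);
    }

    // Listener side: hands every waiting message to the handler, in arrival
    // order, and returns how many messages were handled.
    int drain(Consumer<StatusChange> handler) {
        int handled = 0;
        StatusChange next;
        while ((next = queue.poll()) != null) {
            handler.accept(next);
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        EsbQueueSketch esb = new EsbQueueSketch();
        esb.publish(new StatusChange("A-1", "SHIPPED"));
        esb.publish(new StatusChange("A-2", "DELIVERED"));

        List<String> actions = new ArrayList<>();
        int handled = esb.drain(c -> actions.add("notify " + c.orderId() + " is " + c.status()));

        System.out.println(handled + " handled: " + actions);
    }
}
```

Because the listener only reacts to messages that are actually on the queue, no work is done when nothing has changed, unlike the scheduled query.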

Unlike the batch process, the Kafka Producer listens continuously by monitoring the transaction logs. This provides near real-time notification when status changes occur.


It is likely that Big Data is on the roadmap for your IT infrastructure. It is also likely that your product of choice includes an option for streaming data. When building your list of desired Big Data functionality for your organization, it is important to look beyond the analytical wins and consider how Big Data can assist your transactional applications.

For additional reading, I highly recommend a series that Jay Kreps has written on Putting Apache Kafka To Use.

Have a really great day!

apache kafka, database, event, stream computing, enterprise service bus

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
