
Audit Database Changes With Debezium


By Siva Prasad Rao Janapati · Updated Apr. 16, 2021 · Tutorial

In this article, we will explore Debezium for capturing data changes. Debezium is a distributed, open-source platform for change data capture. Point the Debezium connector to the database, and it starts listening for the change data events (inserts, updates, and deletes) that other applications commit to your database, right from the database transaction logs.

Debezium is a collection of source connectors for Apache Kafka Connect. Debezium's log-based Change Data Capture (CDC) ingests changes directly from the database's transaction logs. Unlike other approaches, such as polling or dual writes, the log-based approach brings the following benefits:

  • Ensures that all data changes are captured. The changes may come from multiple applications, SQL editors, etc.; Debezium captures every change event.
  • Produces change events with very low latency while avoiding the increased CPU usage that frequent polling requires.
  • Because changes are captured at the database transaction log level, no changes to your data model, such as a 'Last Updated' column, are required.
  • It captures deletes.

Use Case

Let us discuss a use case: auditing database table changes for compliance purposes. There are different approaches to auditing a database.

  1. Using database triggers to monitor the DDL/DML changes. But triggers become painful if you don't use them wisely, and hence a lot of enterprise applications avoid them.
  2. Hibernate Envers. The Envers module aims to provide an easy auditing/versioning solution for entity classes. It does a good job, but it has the following issues:
    • The audit logging is synchronous.
    • The audit logging and the actual database changes for the business logic need to be wrapped in the same transaction; if the audit logging fails, the whole transaction has to be rolled back.
    • If we decide to push the changes to another database instance, we might end up using distributed transactions, which add performance overhead to the application.
    • Pushing the changes to other systems, such as analytics or search, is problematic.
    • Mixing audit logging with the actual business logic creates a codebase maintenance issue.
    • It cannot capture changes coming from other applications or a SQL shell.
  3. Writing our own audit framework to capture the data changes. This works, but it has the same issues highlighted in #2 above.

Change Data Capture with Debezium

Now, let us see how Debezium solves the database audit use case. The design below depicts the components involved in auditing the DB with Debezium.


Figure: Change Data Capture with Debezium


Follow the steps below to set up the Debezium connector.

Step 1

Download the connectors from https://debezium.io/releases/1.4/#installation. In this example, I am using MySQL, so I downloaded the Debezium MySQL connector. Debezium has connectors for a variety of databases.
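If you prefer the command line, the connector plugin archive can also be pulled from Maven Central. This is a sketch that assumes the 1.4.2.Final release (the version that appears in the events later in this article); adjust the version to match your download:

Shell

# Download and unpack the Debezium MySQL connector plugin
# (version assumed; pick the one that matches your setup):
curl -LO https://repo1.maven.org/maven2/io/debezium/debezium-connector-mysql/1.4.2.Final/debezium-connector-mysql-1.4.2.Final-plugin.tar.gz
tar -xzf debezium-connector-mysql-1.4.2.Final-plugin.tar.gz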

Step 2

Install a Kafka cluster. I used a simple Kafka cluster with one ZooKeeper node and one broker. Under the same Kafka installation, you will find the Kafka Connect-related properties. Put the Debezium jar files on the Kafka Connect classpath by updating plugin.path in the connect-distributed.properties file.
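For example, if the connector archive from Step 1 was extracted to /opt/connectors (a path chosen here purely for illustration), the update could look like this:

Shell

# Point Kafka Connect at the directory holding the extracted
# debezium-connector-mysql jars (/opt/connectors is an example path):
echo "plugin.path=/opt/connectors" >> config/connect-distributed.properties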

Step 3

Enable the binlog for the MySQL database.
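A minimal sketch of what that involves on a stock MySQL installation (the server-id value is arbitrary, and the config file location varies by platform); Debezium requires row-based binary logging:

Shell

# Add these lines to the MySQL server config (my.cnf / mysqld.cnf)
# and restart the server:
#
#   server-id        = 223344
#   log_bin          = mysql-bin
#   binlog_format    = ROW
#   binlog_row_image = FULL
#
# Then verify that the binlog is on:
mysql -u root -p -e "SHOW VARIABLES LIKE 'log_bin';"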

Step 4

Launch the Kafka cluster and Kafka Connect with the below commands.

Shell

# To start ZooKeeper
bin/zookeeper-server-start.sh config/zookeeper.properties

# To start the Kafka broker
bin/kafka-server-start.sh config/server.properties

# To start Kafka Connect
bin/connect-distributed.sh config/connect-distributed.properties
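Once Connect is up, a quick way to confirm that it picked up the Debezium plugin is its REST API (8083 is the default port):

Shell

# List the connector plugins Kafka Connect has loaded; the Debezium
# MySQL connector class should appear in the output:
curl -s http://localhost:8083/connector-plugins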


Step 5

Add the MySQL source connector configuration to Kafka Connect.

Shell

curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d '{
  "name": "mysql-connector-demo",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "1",
    "database.server.name": "dbserver1",
    "database.history.kafka.bootstrap.servers": "localhost:9092",
    "database.history.kafka.topic": "customers_audit",
    "table.include.list": "inventory.customers",
    "transforms": "Reroute",
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
    "transforms.Reroute.topic.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
    "transforms.Reroute.topic.replacement": "$3"
  }
}'



The details of the configuration are explained below:

  • connector.class: The Debezium connector implementation to run, here the MySQL connector.
  • database.hostname / database.port / database.user / database.password: Connection details for the MySQL instance being audited.
  • database.server.id: A numeric ID that must be unique across all servers and clients in the MySQL replication group.
  • database.server.name: A logical name for the database server; Debezium uses it as the prefix of the topics it produces to.
  • database.history.kafka.bootstrap.servers / database.history.kafka.topic: The Kafka cluster and topic where the connector records the history of database schema changes.
  • table.include.list: The tables whose changes should be captured, here inventory.customers.
  • transforms.Reroute.*: A ByLogicalTableRouter single message transform. The regex splits the default topic name (dbserver1.inventory.customers) into three groups, and the $3 replacement reroutes the events to a topic named after just the table, i.e., customers.
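Before moving on, you can confirm that the connector was registered and is running; the status endpoint is part of the standard Kafka Connect REST API:

Shell

# Check the state of the connector and its task:
curl -s http://localhost:8083/connectors/mysql-connector-demo/status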


Step 6

Now, run some inserts/updates/deletes on the table we configured for auditing to see the events on the topic, as the sketch below shows.
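A minimal way to exercise the pipeline, assuming the inventory.customers table used in the connector configuration above (the column names match the events shown next):

Shell

# Generate a change event:
mysql -u root -p inventory -e "INSERT INTO customers (first_name, last_name, email) VALUES ('Smart', 'Techie', 'smarttechie@gmail.com');"

# Consume the audit events; the Reroute transform maps
# dbserver1.inventory.customers to a topic named just "customers":
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic customers --from-beginning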

Below are some of the events we received on the topic for the insert/update/delete DML. The actual JSON has other properties, but I am showing a trimmed version for simplicity.

The insert event ("op": "c"):

JSON

{
  "before": null,
  "after": {
    "id": 1016,
    "first_name": "Smart",
    "last_name": "Techie",
    "email": "smarttechie@gmail.com"
  },
  "source": {
    "version": "1.4.2.Final",
    "connector": "mysql",
    "name": "dbserver1",
    "ts_ms": 1615928467000,
    "snapshot": "false",
    "db": "inventory",
    "table": "customers",
    "server_id": 223344,
    "gtid": null,
    "file": "mysql-bin.000003",
    "pos": 4015,
    "row": 0,
    "thread": 36,
    "query": null
  },
  "op": "c",
  "ts_ms": 1615928467236,
  "transaction": null
}


The update event ("op": "u"):

JSON

{
  "before": {
    "id": 1016,
    "first_name": "Smart",
    "last_name": "Techie",
    "email": "smarttechie@gmail.com"
  },
  "after": {
    "id": 1016,
    "first_name": "Smart",
    "last_name": "Techie",
    "email": "smarttechie_updated@gmail.com"
  },
  "source": {
    "version": "1.4.2.Final",
    "connector": "mysql",
    "name": "dbserver1",
    "ts_ms": 1615928667000,
    "snapshot": "false",
    "db": "inventory",
    "table": "customers",
    "server_id": 223344,
    "gtid": null,
    "file": "mysql-bin.000003",
    "pos": 4331,
    "row": 0,
    "thread": 36,
    "query": null
  },
  "op": "u",
  "ts_ms": 1615928667845,
  "transaction": null
}


The delete event ("op": "d"):

JSON

{
  "before": {
    "id": 1016,
    "first_name": "Smart",
    "last_name": "Techie",
    "email": "smarttechie_updated@gmail.com"
  },
  "after": null,
  "source": {
    "version": "1.4.2.Final",
    "connector": "mysql",
    "name": "dbserver1",
    "ts_ms": 1615928994000,
    "snapshot": "false",
    "db": "inventory",
    "table": "customers",
    "server_id": 223344,
    "gtid": null,
    "file": "mysql-bin.000003",
    "pos": 4696,
    "row": 0,
    "thread": 36,
    "query": null
  },
  "op": "d",
  "ts_ms": 1615928994823,
  "transaction": null
}



You can find a list of companies that use Debezium here. I hope you enjoyed this article. We will meet in another blog post. Till then, Happy Learning!!


Published at DZone with permission of Siva Prasad Rao Janapati, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
