DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Why Queues Don’t Fix Scaling Problems
  • The Phantom Write Problem: Why Your Idempotency Implementation Is Silently Losing Data
  • Building a 3D WebXR Game with WASI Cycles: Integrating WasmEdge, Wasmtime, and Wasmer to Invoke MongoDB, Kafka, and Oracle
  • Control Your Services With OTEL, Jaeger, and Prometheus

Trending

  • You Learned AI. So Why Are You Still Not Getting Hired?
  • The Developer's Guide to Context-Aware AI: When Your Code Documentation Becomes Intelligent
  • Context-Aware Authorization for AI Agents
  • How Rule Engines Transform Business Agility and Code Simplicity
  1. DZone
  2. Data Engineering
  3. Big Data
  4. Audit Database Changes With Debezium

Audit Database Changes With Debezium

In this article, we will explore Debezium to capture data changes. Debezium is a distributed open-source platform for change data capture. Point the Debezium...

By 
Siva Prasad Rao Janapati user avatar
Siva Prasad Rao Janapati
·
Updated Apr. 16, 21 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
16.4K Views

Join the DZone community and get the full member experience.

Join For Free


Debezium Logo

In this article, we will explore Debezium to capture data changes. Debezium is a distributed open-source platform for change data capture. Point the Debezium connector to the database and start listening to the change data events like inserts/updates/deletes right from the database transaction logs that other applications commit to your database.

Debezium is a collection of source connectors of Apache Kafka Connect. Debezium's log-based Change Data Capture() allows ingesting the changes directly from the database's transaction logs. Unlike other approaches, such as polling or dual writes, the log-based approach brings the below features.

  • Ensures that all data changes are captured. The data changes may come from multiple applications, SQL editors, etc. Debezium captures every change event.
  • Produces change events with a very low delay while avoiding increased CPU usage required for frequent polling.
  • As the changes are captured at the database transaction log level, no changes required to your data model, such as a 'Last Updated' column.
  • It captures deletes.

Use Case

Let us discuss a use case to audit the database table changes for compliance purposes. There are different approaches to audit the databases.

  1. Using database triggers to monitor the DDL/DML changes. But, database triggers come with pain if you don't use them wisely and hence lot of enterprise applications avoid them.
  2. Envers. The Envers module aims to provide an easy auditing/versioning solution for entity classes. It does a good job but, below are the issues we have.
    • The audit logging is synchronous.
    • The audit logging and the actual database changes for business logic need to be wrapped with the same transaction. If the audit logging fails, the whole transaction needs to be rolled back.
    • If we decide to push the changes to another database instance, we might end up using distributed transactions. This will add performance overhead to the application.
    • If we need to push the changes to other systems like analytics, search, etc. will be problematic.
    • Mixing audit logging with the actual business logic creates a codebase maintenance issue.
    • Not able to capture the changes coming from other applications/SQL shell.
  3. Writing our own audit framework to capture the data changes. This works but, has the same issues highlighted on #2 above.

Change Data Capture with Debezium

Now, let us see how Debezium solves the use case of database audit. The below design depicts the components involved to audit the DB with Debezium.


Change Data Capture with Debezium


Follow the below steps to set up the Debezium connector.

Step 1

Download the connectors from https://debezium.io/releases/1.4/#installation. In this example, I am using MySql. Hence, I downloaded the Debezium MySql connector. Debezium has connectors for a variety of databases.

Step 2

Install Kafka cluster. I used a simple Kafka cluster with one Zookeeper and one broker. Under the same Kafka installation, you will find Kafka connect related properties. Set the Debezium related jar files into the Kafka connect classpath by updating the plugin.path under connect-distributed.properties file.

Step 3

Enable the bin log for the MySQL database.

Step 4

Launch the Kafka cluster and the Kafka connect by launching the below commands.

XML
 




x


 
1
#To start the Zookeeper
2
bin/zookeeper-server-start.sh config/zookeeper.properties
3

           
4
#To start the Kafka broker
5
/bin/kafka-server-start.sh /config/server.properties
6

           
7
#To start the Kafka connect
8
/bin/connected-distributed.sh /config/connected-distributed.properties


Step 5

Add the MySQL source connector configuration to the Kafka connect.

JSON
 




xxxxxxxxxx
1
19


 
1
curl -k -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d '{
2
  "name": "mysql-connector-demo",
3
  "config": {
4
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
5
    "database.hostname": "localhost",
6
    "database.port": "3306",
7
    "database.user": "debezium",
8
    "database.password": "dbz",
9
    "database.server.id": "1",
10
    "database.server.name": "dbserver1",
11
    "database.history.kafka.bootstrap.servers": "localhost:9092",
12
    "database.history.kafka.topic": "customers_audit",
13
    "table.include.list": "inventory.customers",
14
    "transforms": "Reroute",
15
    "transforms.Reroute.type": "io.debezium.transforms.ByLogicalTableRouter",
16
    "transforms.Reroute.topic.regex": "([^.]+)\\.([^.]+)\\.([^.]+)",
17
    "transforms.Reroute.topic.replacement": "$3"
18
  }
19
}'



The details of the configuration are explained below:

Configuration Details


Step 6

Now, run some inserts/updates/deletes on the table which we configured to audit to see the events on the topic.

Below are some of the events we received on the topic for insert/update/delete DML. The actual JSON will have other properties. But, I am showing the trimmed version for simplicity.

JSON
 




xxxxxxxxxx
1
28


 
1
{
2
"before": null,
3
"after": {
4
"id": 1016,
5
"first_name": "Smart",
6
"last_name": "Techie",
7
"email": "[email protected]"
8
},
9
"source": {
10
"version": "1.4.2.Final",
11
"connector": "mysql",
12
"name": "dbserver1",
13
"ts_ms": 1615928467000,
14
"snapshot": "false",
15
"db": "inventory",
16
"table": "customers",
17
"server_id": 223344,
18
"gtid": null,
19
"file": "mysql-bin.000003",
20
"pos": 4015,
21
"row": 0,
22
"thread": 36,
23
"query": null
24
},
25
"op": "c",
26
"ts_ms": 1615928467236,
27
"transaction": null
28
}


JSON
 




xxxxxxxxxx
1
33


 
1
{
2
"before": {
3
"id": 1016,
4
"first_name": "Smart",
5
"last_name": "Techie",
6
"email": "[email protected]"
7
},
8
"after": {
9
"id": 1016,
10
"first_name": "Smart",
11
"last_name": "Techie",
12
"email": "[email protected]"
13
},
14
"source": {
15
"version": "1.4.2.Final",
16
"connector": "mysql",
17
"name": "dbserver1",
18
"ts_ms": 1615928667000,
19
"snapshot": "false",
20
"db": "inventory",
21
"table": "customers",
22
"server_id": 223344,
23
"gtid": null,
24
"file": "mysql-bin.000003",
25
"pos": 4331,
26
"row": 0,
27
"thread": 36,
28
"query": null
29
},
30
"op": "u",
31
"ts_ms": 1615928667845,
32
"transaction": null
33
}


JSON
 




xxxxxxxxxx
1
28


 
1
{
2
"before": {
3
"id": 1016,
4
"first_name": "Smart",
5
"last_name": "Techie",
6
"email": "[email protected]"
7
},
8
"after": null,
9
"source": {
10
"version": "1.4.2.Final",
11
"connector": "mysql",
12
"name": "dbserver1",
13
"ts_ms": 1615928994000,
14
"snapshot": "false",
15
"db": "inventory",
16
"table": "customers",
17
"server_id": 223344,
18
"gtid": null,
19
"file": "mysql-bin.000003",
20
"pos": 4696,
21
"row": 0,
22
"thread": 36,
23
"query": null
24
},
25
"op": "d",
26
"ts_ms": 1615928994823,
27
"transaction": null
28
}



You can find a list of clients who uses Debezium here. I hope you enjoyed this article. We will meet in another blog post. Till then, Happy Learning!!

Database kafka

Published at DZone with permission of Siva Prasad Rao Janapati. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Why Queues Don’t Fix Scaling Problems
  • The Phantom Write Problem: Why Your Idempotency Implementation Is Silently Losing Data
  • Building a 3D WebXR Game with WASI Cycles: Integrating WasmEdge, Wasmtime, and Wasmer to Invoke MongoDB, Kafka, and Oracle
  • Control Your Services With OTEL, Jaeger, and Prometheus

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook