
Change Data Capture Using Apache NiFi

Capturing all changes from a relational database with Apache NiFi is straightforward, as this tutorial shows. CDC is a common use case for extracting transactional data in a streaming manner to populate a data warehouse or data lake in Hadoop.

By Mark Herring · Sep. 20, 16 · Tutorial · 20.2K Views

Prerequisites

  1. Download HDP Sandbox
  2. MySQL database (should already be present in the sandbox)
  3. NiFi 0.6 or later (download and install a new version of NiFi, or use Ambari to install NiFi in the sandbox)

MySQL Setup (Source Database)

In this setup, we will create a table in MySQL and define a few triggers on it to emulate transactions.

  • The triggers determine whether the change introduced was an insert or an update.
  • They also update the timestamp on the updated/inserted row. (This is very important, as NiFi will poll on this column to extract changes based on the timestamp.)
unix> mysql -u root -p
Enter password:

mysql> create database test_cdc;
mysql> create user 'test_cdc'@'localhost' identified by 'test_cdc';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'test_cdc'@'%' IDENTIFIED BY 'test_cdc' WITH GRANT OPTION;
mysql> flush privileges;
mysql> exit;

unix> mysql -u test_cdc -p test_cdc

mysql> create table CDC_TEST (
         Column_A int,
         Column_B text,
         Created_date datetime,
         INFORMATION text
       );

Create Triggers in MySQL:

mysql> create trigger CDC_insert
       before insert on cdc_test
       for each row
       set NEW.created_date = NOW(),
           NEW.information = 'INSERT';

mysql> create trigger CDC_UPDATE
       before update on cdc_test
       for each row
       set NEW.created_date = NOW(),
           NEW.information = 'UPDATE';
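The effect of the two triggers can be sketched in plain Python (a stand-in simulation for illustration, not MySQL itself): every row that arrives gets a fresh timestamp plus a marker saying whether it came from an insert or an update.

```python
from datetime import datetime

def apply_insert(row):
    # Mirrors the CDC_insert trigger: stamp the row and mark it as an INSERT.
    row["created_date"] = datetime.now()
    row["information"] = "INSERT"
    return row

def apply_update(row):
    # Mirrors the CDC_UPDATE trigger: re-stamp the row and mark it as an UPDATE.
    row["created_date"] = datetime.now()
    row["information"] = "UPDATE"
    return row

row = apply_insert({"column_a": 3, "column_b": "cdc3"})
print(row["information"])  # INSERT
row = apply_update(row)
print(row["information"])  # UPDATE
```

Because both triggers refresh `created_date`, NiFi can treat that single column as a monotonic watermark for polling.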

Hive Setup (Destination Database)

In Hive, we create an external table with the exact same data structure as the MySQL table. NiFi will be used to capture changes from the source and insert them into this Hive table.

Using the Ambari Hive view or the Hive CLI, create the following table in the Hive default database.

I have used the Hive CLI to create the table:

unix> hive

hive> create external table HIVE_TEST_CDC (
        COLUMN_A int,
        COLUMN_B string,
        CREATED_DATE string,
        INFORMATION string)
      stored as avro
      location '/test-nifi/CDC/';

Note: I am not covering how to create a managed Hive table in ORC format; that will be covered in a different article.

NiFi Setup

This is a simple NiFi setup; the QueryDatabaseTable processor is available as part of the default processors from version 0.6 of NiFi onward.

QueryDatabaseTable Processor Configuration

It's very intuitive: the main things to configure are the DBCP Connection Pool and the Maximum-value Columns property.

For Maximum-value Columns, choose the date-time stamp column; it acts as a cumulative change-management column.

This is the only limitation of this processor: it is not true CDC, and it relies on that one column. If data is reloaded with older timestamps in the column, it will not be replicated to HDFS or any other destination.
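That limitation is easy to demonstrate with a small Python sketch (the `poll_changes` helper is a simplified stand-in for QueryDatabaseTable, not NiFi code): the processor only remembers the highest timestamp it has seen, so a row reloaded with an older timestamp is silently skipped.

```python
from datetime import datetime

def poll_changes(rows, last_max):
    # Mimics Maximum-value Columns: return rows newer than the stored
    # watermark, and advance the watermark to the newest timestamp seen.
    new_rows = [r for r in rows if r["created_date"] > last_max]
    newest = max((r["created_date"] for r in rows), default=last_max)
    return new_rows, max(last_max, newest)

table = [
    {"column_a": 1, "created_date": datetime(2016, 9, 1, 10, 0)},
    {"column_a": 2, "created_date": datetime(2016, 9, 1, 11, 0)},
]
captured, watermark = poll_changes(table, datetime.min)
print(len(captured))  # 2

# A row reloaded with an *older* timestamp is never picked up:
table.append({"column_a": 3, "created_date": datetime(2016, 8, 31, 9, 0)})
captured, watermark = poll_changes(table, watermark)
print(len(captured))  # 0
```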

This processor does not rely on transaction logs or redo logs the way Attunity or Oracle GoldenGate do. For a complete CDC solution, please use Attunity or Oracle GoldenGate.

DBCPConnectionPool Configuration:

PutHDFS Processor Configuration

Configure the Hadoop core-site.xml and hdfs-site.xml files and the destination HDFS directory; in this case it is:

/test-nifi/CDC

Make sure this directory is present in HDFS; otherwise, create it using the following command:

unix> hadoop fs -mkdir -p /test-nifi/CDC
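Conceptually, the whole flow is "poll the table by timestamp, then write each batch to a directory." A minimal local stand-in sketch (SQLite in place of MySQL, a temp directory in place of HDFS, JSON in place of Avro — all substitutions for illustration only) looks like this:

```python
import json
import os
import sqlite3
import tempfile

# SQLite stands in for MySQL; a temp directory stands in for /test-nifi/CDC.
conn = sqlite3.connect(":memory:")
conn.execute("""create table cdc_test
                (column_a int, column_b text, created_date text, information text)""")
conn.execute("insert into cdc_test values (3, 'cdc3', '2016-09-20 10:00:00', 'INSERT')")
conn.execute("insert into cdc_test values (4, 'cdc3', '2016-09-20 10:05:00', 'INSERT')")

out_dir = tempfile.mkdtemp()
last_max = ""  # watermark, like QueryDatabaseTable's stored maximum value

# One polling cycle: fetch rows newer than the watermark, write them as a file.
rows = conn.execute(
    "select * from cdc_test where created_date > ? order by created_date",
    (last_max,),
).fetchall()
if rows:
    last_max = rows[-1][2]  # advance the watermark to the newest created_date
    with open(os.path.join(out_dir, "batch_0.json"), "w") as f:
        json.dump(rows, f)

print(len(rows), last_max)  # 2 2016-09-20 10:05:00
```

On the next cycle, only rows with `created_date` greater than the saved watermark would be fetched, which is exactly how the NiFi flow avoids re-shipping old data.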

Make sure all the processors are running in NiFi.

Testing CDC

Run a bunch of insert statements on the MySQL database:

mysql -u test_cdc -p

At the MySQL CLI, run the following inserts:

insert into cdc_test values (3, 'cdc3', null, null);
insert into cdc_test values (4, 'cdc3', null, null);
insert into cdc_test values (5, 'cdc3', null, null);
insert into cdc_test values (6, 'cdc3', null, null);
insert into cdc_test values (7, 'cdc3', null, null);
insert into cdc_test values (8, 'cdc3', null, null);
insert into cdc_test values (9, 'cdc3', null, null);
insert into cdc_test values (10, 'cdc3', null, null);
insert into cdc_test values (11, 'cdc3', null, null);
insert into cdc_test values (12, 'cdc3', null, null);
insert into cdc_test values (13, 'cdc3', null, null);

select * from cdc_test;

Go to Hive using CLI and check if the records were transferred over using NiFi.

hive> select * from hive_test_cdc;

Voila…


Published at DZone with permission. See the original article here.

Opinions expressed by DZone contributors are their own.
