DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Data Store Options for Operational Analytics/Data Engineering
  • Which Tool Is Better for Code Completion — Azure Data Studio or dbForge SQL Complete?
  • Microsoft Azure Backup Service
  • Amazon RDS vs Azure SQL — Know the Key Differentiators and Choose the Best

Trending

  • A Developer's Guide to Mastering Agentic AI: From Theory to Practice
  • Unlocking the Benefits of a Private API in AWS API Gateway
  • Breaking Bottlenecks: Applying the Theory of Constraints to Software Development
  • Unlocking AI Coding Assistants Part 3: Generating Diagrams, Open API Specs, And Test Data
  1. DZone
  2. Data Engineering
  3. Databases
  4. A Deep Dive Into CDC With Azure Data Factory

A Deep Dive Into CDC With Azure Data Factory

Change Data Capture (CDC) in SQL Server is a powerful feature designed to track and capture changes made to data within a database.

By 
Oriol Sr user avatar
Oriol Sr
·
Mar. 22, 24 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
27.9K Views

Join the DZone community and get the full member experience.

Join For Free

Change Data Capture (CDC) in SQL Server is a powerful feature designed to track and capture changes made to data within a database. It provides a reliable and efficient way to identify alterations to tables, allowing for the extraction of valuable insights into data modifications over time. By enabling CDC with Azure Data Factory, SQL Server enables a systematic and automated approach to monitoring and capturing changes, facilitating better data management, auditing, and analysis within the database environment.

Most Common Use-Cases: CDC With Azure Data Factory

Common scenarios where the CDC with Azure Data Factory proves beneficial include:

  • Audit trail and analytics: Monitoring data alterations for audit trails and conducting analytical assessments on change data.
  • Downstream propagation: Efficiently propagating changes to downstream subscribers for synchronized data updates.
  • ETL operations: Facilitating Extract, Transform, Load (ETL) operations to seamlessly transfer data changes from the Online Transaction Processing (OLTP) system to a data lake or data warehouse. Tools like Azure Data Factory can be employed for this purpose.
  • Event-driven programming: Enabling event-based programming for instantaneous responses triggered by data changes, enhancing real-time system interactions.

Usage: Some Queries

Here are SQL queries and commands for managing Change Data Capture (CDC) in SQL Server:

  • Check if CDC is enabled for the database:

Select  name, is_cdc_enabled from sys.databases;

  • Check which tables have CDC enabled::

Select  name, is_tracked_by_cdc from sys.tables;

  • First, the database needs to be enabled:

EXEC sys.sp_cdc_enable_db

  • Then enable all the tables to be audited:
SQL
 
EXECUTE sys.sp_cdc_enable_table

        @source_schema = N’dbo’,

        @source_name = N’PslMaterials’,

        @role_name     = NULL;


  • To disable the database:
    • EXEC sys.sp_cdc_disable_db
  • To disable a table:
SQL
 
EXEC sys.sp_cdc_disable_table

    @source_schema = N’dbo’,

    @source_name   = N’MyTable’,

    @capture_instance = N’dbo_MyTable’


When CDC is enabled for a database, a dedicated schema named CDC is established. Within this schema, several essential tables are created to manage and store change data. It’s crucial to note that disabling CDC for a table or the entire database can lead to the removal of these tables, resulting in the loss of historical changes. To preserve this historical data, it is necessary to copy the changes to another table or file.

CDC Schema

The key tables within the CDC schema include:

  • cdc.change_tables: the list of tables with CDC enabled
  • cdc.captured_columns: the list of captured columns for each table
  • cdc.ddl_history: Documents Data Definition Language (DDL) statements that modify the source tables. These changes aren’t immediately applied to CDC tables; a restart of the CDC instance is required for the changes to take effect.
  • cdc.index_columns: Defines the primary key of CDC tables.
  • cdc.lsn_time_mapping: Manages long block sequence number time mapping.

Additionally, when a table is enabled for CDC, two more tables are created:

  • cdc.cdc_jobs: Handles CDC-related jobs.
  • cdc.SchemaName_TableName_CT: Represents the change table for a specific schema and table, for instance, dbo_PslVendors_CT.

Mirrors all fields from the original table with some extra columns needed for CDC: 

  • __$start_lsn: Binary code that keeps track of when changes were committed, helping maintain the order in which changes occurred.
  • __$seqval: Another binary code used to organize changes to a row within a transaction.
  • __$operation: A number indicating the type of change made to the data. 1 represents a deletion, 2 is for insertion, and 3 and 4 are for updates (capturing column values before and after the update).
  • __$update_mask: A series of bits indicating which columns were changed during an update.
  • <captured source table columns>: The remaining columns represent the specific data captured during the creation of the capture instance. If no columns were specified, all columns from the source table are included.

CDC Implementation Details

  • Every source table enabled for the CDC has its dedicated CDC table.
  • Ensure sufficient database space to accommodate the additional tables generated, preventing potential space shortages.
  • The SQL Server Agent capture job retrieves changes from the transaction log and incorporates them into the corresponding change tables.
  • Cleanup jobs manage the change tables, adhering to a retention policy to remove outdated data.
  • Query functions provide a means to access and utilize change data from the CDC change tables.
  • In Azure SQL databases, where SQL Server Agent is unavailable, the CDC scheduler assumes the role of capturing and cleaning up data.

Performance Considerations: Factors Impacting Performance

  • Number of CDC-enabled tables: The more tables enabled for CDC, the higher the processing overhead. Evaluate necessity against performance impact.
  • Frequency of changes in tracked tables: Tables undergoing frequent changes increase the volume of captured data. Regularly changing data may impact performance.
  • Space availability in the source database: CDC captures changes and stores them. Ensure adequate space in the source database to accommodate change tables without risking space shortages.

CDC With Azure Data Factory

In Azure cloud, Data Factory is a powerful tool for various needs, and now includes a preview for Change Data Capture (CDC), which simplifies the process, offering the seamless power of CDC. Let’s explore the steps to leverage this feature:

Steps To Create CDC in the Data Factory

1. Let’s Create a CDC

CDC can be executed as a standalone resource, eliminating the need for a pipeline as it is needed for example for running Data flows.
factory resources

2. Assign a Name to the Resource (It Must Be Alphanumeric)

Choose the source type, ranging from various types of databases to files. In the case of the Azure SQL database, select the tables. CDC-enabled tables are automatically detected; otherwise, specify a field-defining row modification (typically a modified date field).

choose your sources

3. Choose the Destination

In this case, the same as the origin types: databases and also some storage where to store the files with the changes.

choose your targets

4. Define the Destination

The destination table will be created automatically with the Auto map option selected. Choose a key for the destination table.

define the destination

5. Define a Latency Among the Given Options

Real-time, 15-minute, 30-minute, 1 hour, 2 hours. Initiate the process, and the agent will read data at defined intervals.

6. Monitor

The green dots signify the instances when CDC was executed, occurring every 15 minutes in this example. The blue dots represent the captured changes during each execution, providing a clear monitoring interface.

Monitor

Conclusion

CDC stands out as a robust and influential tool, offering valuable capabilities for tracking and managing changes in databases. With the advent of the CDC with Azure Data Factory, this power is seamlessly harnessed in a user-friendly and practical manner. The combination of CDC and Data Factory presents an efficient and accessible solution for implementing Change Data Capture with utmost satisfaction.

Change data capture Database azure sql

Published at DZone with permission of Oriol Sr. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Data Store Options for Operational Analytics/Data Engineering
  • Which Tool Is Better for Code Completion — Azure Data Studio or dbForge SQL Complete?
  • Microsoft Azure Backup Service
  • Amazon RDS vs Azure SQL — Know the Key Differentiators and Choose the Best

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!