Relational DB Migration to S3 Data Lake Via AWS DMS, Part I

This article discusses the challenges faced during relational database migration to AWS using DMS, including source data, logging, and network bandwidth issues.

By Vijay Bhosale · Feb. 07, 25 · Analysis


AWS Database Migration Service (DMS) is a cloud service that migrates relational databases, NoSQL databases, data warehouses, and other types of data stores into the AWS Cloud, or between cloud and on-premises setups, efficiently and securely. DMS supports many source and target databases, including Oracle, MS SQL Server, MySQL, PostgreSQL, Amazon Aurora, Amazon RDS, Amazon Redshift, and Amazon S3.

Observations During the Data Migration

We designed and built an AWS S3 data lake, along with a data warehouse in Amazon Redshift, sourcing data from on-premises Oracle, MS SQL Server, MySQL, PostgreSQL, and MongoDB databases. We used AWS DMS for the initial full load and the daily incremental data transfer from these sources into AWS S3.

In this series of posts, I explain the various challenges we faced during the actual data migration from the different relational databases.

1. Modified Date Not Populated Properly at the Source

AWS DMS handles the full load and change data capture (CDC) from the source databases. DMS captures changed records based on the transaction logs, but a properly maintained modified-date column helps apply deduplication logic and extract the latest version of a given row on the target in S3.

If a modified-date column is not available for a table, or is not updated reliably, AWS DMS provides transformation rules to add a new column while extracting data from the source database. Here, the AR_H_CHANGE_SEQ header helps add a new column whose value is a unique incrementing number from the source database, consisting of a timestamp and an auto-incrementing number.

The code example below adds a new column, DMS_CHANGE_SEQ, to the target, populated with this unique incrementing number from the source. It is a 35-digit unique number: the first 16 digits are the timestamp, and the next 19 digits are the record ID number incremented by the database.

JSON
 
{
    "rule-type": "transformation",
    "rule-id": "2",
    "rule-name": "2",
    "rule-target": "column",
    "object-locator": {
        "schema-name": "%",
        "table-name": "%"
    },
    "rule-action": "add-column",
    "value": "DMS_CHANGE_SEQ",
    "expression": "$AR_H_CHANGE_SEQ",
    "data-type": {
        "type": "string",
        "length": 100
    }
}


2. Enabling Supplemental Logging for Oracle as a Source

For Oracle as a source database, AWS DMS requires at least minimal supplemental logging to be enabled on the source database in order to capture ongoing changes. This adds extra information and columns to the redo logs so that changes can be identified at the source.

Supplemental logging can be enabled for primary keys, unique keys, specific sets of columns, or all columns. Supplemental logging for all columns captures every column of the source tables, which makes it possible to overwrite complete records in the target AWS S3 layer.

Supplemental logging of all columns increases the redo log size, since every column of the table is written to the logs. Redo and archive logs need to be sized accordingly to accommodate this additional information.
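Database-level supplemental logging itself is enabled by the DBA on the Oracle side; on the DMS side, the Oracle source endpoint can additionally request table-level supplemental logging. The snippet below is a minimal sketch of an endpoint definition, not a complete one: the endpoint identifier, server name, and database name are placeholders, and credentials are omitted.

JSON
 
{
    "EndpointIdentifier": "oracle-source-endpoint",
    "EndpointType": "source",
    "EngineName": "oracle",
    "ServerName": "onprem-oracle.example.com",
    "Port": 1521,
    "DatabaseName": "ORCL",
    "OracleSettings": {
        "AddSupplementalLogging": true
    }
}

With AddSupplementalLogging set, DMS sets up primary-key supplemental logging on the tables selected for the migration task; supplemental logging for all columns, when needed, still has to be configured on the database itself as described above.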

3. Network Bandwidth Between Source and Target Databases

The initial full load from the on-premises sources (Oracle, MS SQL Server, etc.) worked fine, and so did change data capture most of the time.

Transaction volume was moderate for most of the day in a given month, except during the daily end-of-business-day process, post-midnight activities, and month-end activities. We observed that DMS migration tasks fell out of sync or failed during these windows.

We reviewed the source, target, and replication instance metrics and logs and made the following observations:

  • CDCLatencySource – the gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. This climbed from near zero to several thousand seconds (10-12K) during the daily post-midnight reconciliation activities, and up to 40K seconds during month-end activities.
  • CDCIncomingChanges – the total number of change events at a point in time waiting to be applied to the target. This increased from zero to thousands during the early-morning reconciliation activities.
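These task-level metrics can also be pulled programmatically from CloudWatch. The request below is an illustrative sketch of a GetMetricData call against the AWS/DMS namespace; the replication instance and task identifiers are placeholders, and the time window and period should be adjusted to the reconciliation window being investigated.

JSON
 
{
    "StartTime": "2025-01-31T00:00:00Z",
    "EndTime": "2025-01-31T06:00:00Z",
    "MetricDataQueries": [
        {
            "Id": "cdcLatencySource",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DMS",
                    "MetricName": "CDCLatencySource",
                    "Dimensions": [
                        { "Name": "ReplicationInstanceIdentifier", "Value": "dms-repl-instance-1" },
                        { "Name": "ReplicationTaskIdentifier", "Value": "oracle-to-s3-cdc-task" }
                    ]
                },
                "Period": 300,
                "Stat": "Maximum"
            }
        },
        {
            "Id": "cdcIncomingChanges",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DMS",
                    "MetricName": "CDCIncomingChanges",
                    "Dimensions": [
                        { "Name": "ReplicationInstanceIdentifier", "Value": "dms-repl-instance-1" },
                        { "Name": "ReplicationTaskIdentifier", "Value": "oracle-to-s3-cdc-task" }
                    ]
                },
                "Period": 300,
                "Stat": "Maximum"
            }
        }
    ]
}

The same query pattern applies to other DMS metrics, such as the NetworkReceiveThroughput metric discussed below.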

Upon further log analysis and review of other metrics, we observed the following:

The AWS DMS metric NetworkReceiveThroughput indicates the incoming traffic on the DMS replication instance, covering both customer database and DMS traffic. It helps identify network-related issues, if any, between the source database and the DMS replication instance.

[Chart: AWS DMS NetworkReceiveThroughput metric for the replication instance]
We observed that network receive throughput peaked at around 30 MB/s (roughly 250 Mbps) because of the VPN connection between the source and AWS, which was also shared with other applications.

The conclusion from this issue is that connectivity between the source and target environments is critical for a successful data migration. Ensure that sufficient bandwidth between on-premises or other cloud source databases and the AWS environment is in place before the actual data migration.

  1. A VPN tunnel, such as AWS Site-to-Site VPN or Oracle Cloud Infrastructure (OCI) Site-to-Site VPN (Oracle to AWS), can provide throughput of up to 1.25 Gbps. This is sufficient for migrating small tables or tables with light DML traffic.
  2. For large data migrations with heavy transactions per second on the tables, consider AWS Direct Connect. It provides a dedicated private connection with bandwidth options such as 1 Gbps and 10 Gbps.

Conclusion

This is Part I of a multi-part series on the challenges of migrating relational databases with AWS DMS and the solutions we implemented. Most of the challenges described in this series can occur during any database migration, and these solutions can serve as a reference.

AWS · Data Lake · Database · Relational Database

Opinions expressed by DZone contributors are their own.
