
A Modern Approach to Data Migration

The ETL portion of most data projects takes up as much as 70% of the overall time.

by Garrett Alley · Dec. 05, 18 · Opinion

Many data migration projects seem easy at first. Just take the data from your database and put it into a data warehouse. Simple! But once you start, you realize that there's much more to it than that. You likely have more data than you thought, in both volume and number of source types. Getting all that data matched up and into a common schema so that it's useful is no easy task. Add in security concerns, resource constraints, and scalability, and the larger picture comes into view. Among the companies we've spoken with, the ETL portion of most data projects takes up as much as 70% of the overall time.

That means there's a lot of room for improvement, for a modern approach.

Old vs. New: A Fresh Look at Data Pipelines

We shouldn't settle for "at least it works most of the time" or even for "good enough." Today's goal should be to spend less time in the ETL part of the process so that you have better data, and more of it, available sooner.

Having access to better data can have a material impact on your business and even change its culture. Better data leads to:

  • Better decision making
  • Less arguing (if you know your data is correct and up to date, misinterpretation is less likely, for example)
  • More confidence in decisions and direction

The question is, how do we get there? What are some ways we can optimize the ETL portion of data migration?

Data Sources: Legacy and Modern

To help streamline the process, let's first take a look at the types of data sources involved in today's data migration projects.

On the one hand, we have legacy data sources. These are complex, often on-premises databases and systems with limited data accessibility. Often these systems are being retired, which can itself be part of the reason for the migration project.

Example legacy data sources:

  • Netezza
  • Teradata
  • Oracle Exadata

And then there are modern data sources. These include streaming data, applications, SaaS services, modern databases, files, etc. If you perform a data census as part of your data migration plan (which we recommend), you may be surprised at the number of these data sources your company has.

Example modern data sources:

  • Salesforce
  • MySQL
  • MongoDB
  • S3/Azure Blob Storage

Now that we have a good understanding of the data sources involved, we see that they don't have a common set of challenges, and there likely isn't a single approach that will work with all of them. The key is treating the two types of data sources differently. We can make substantial improvements to the ETL process based on some helpful strategies for dealing with each type of data source.

Strategies for Migrating Different Data Sources

Legacy and modern data sources have different inherent challenges, and knowing which approach to use for each is key.

Legacy Data Sources

| Challenge | Modern Approach | Traditional Approach |
| --- | --- | --- |
| Migrating to a cloud data warehouse | Phased migration | All at once migration (time consuming) |
| Data Modeling | Use your cloud data warehouse to denormalize the data | Normalize in stage, then migrate (time consuming) |
| Migrating schemas | Automatically map A to B | Manually map A to B |
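
To make the "automatically map A to B" row more concrete, here is a minimal sketch of a type-driven schema mapper. It is an illustration only: the `Column` class, the `TYPE_MAP` entries, and the `map_schema` function are hypothetical names, and the type names are placeholders rather than the actual catalog of any particular warehouse. The idea is that known types are translated automatically, and anything unrecognized is set aside for manual review instead of being guessed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    type: str


# Hypothetical mapping from source type names to target type names.
# A real migration would need the full type catalog of both systems.
TYPE_MAP = {
    "VARCHAR": "STRING",
    "INTEGER": "INT64",
    "DECIMAL": "NUMERIC",
    "TIMESTAMP": "TIMESTAMP",
}


def map_schema(source_columns, type_map=TYPE_MAP):
    """Translate source columns to target columns; collect unmapped ones."""
    mapped, unmapped = [], []
    for col in source_columns:
        target_type = type_map.get(col.type.upper())
        if target_type is None:
            unmapped.append(col)  # no known equivalent: leave for manual review
        else:
            mapped.append(Column(col.name.lower(), target_type))
    return mapped, unmapped


# Example source schema (made up for illustration).
source = [
    Column("CUSTOMER_ID", "INTEGER"),
    Column("FULL_NAME", "varchar"),
    Column("LOCATION", "GEOGRAPHY"),
]
mapped, unmapped = map_schema(source)
```

Here `mapped` holds the two columns that have a known target type, and `unmapped` holds the `LOCATION` column, which still needs a human decision. Keeping the two lists separate means only the exceptions require manual mapping.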

Modern Data Sources

| Challenge | Modern Approach | Traditional Approach |
| --- | --- | --- |
| Capture all the changes | Reduce overhead by only capturing changes | Copy over all the data, every time (time consuming) |
| Handling semi-structured data sources | Leverage semi-structured data types in your modern cloud data warehouse | Stringify and extract (very error prone) |
| Migrating schemas | Automatically map A to B | Manually map A to B |
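
One simple way to capture only the changes is a high-water mark: remember the latest modification timestamp you have seen and, on each run, pull only the rows modified after it. The sketch below assumes a hypothetical `customers` table with an `updated_at` column and uses an in-memory SQLite database as a stand-in for the real source. This is timestamp-based incremental extraction, not log-based change data capture, so it will not notice deleted rows.

```python
import sqlite3


def fetch_changes(conn, last_seen):
    """Return rows modified after last_seen, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    # If nothing changed, keep the previous mark.
    new_mark = rows[-1][2] if rows else last_seen
    return rows, new_mark


# Stand-in source database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "2018-12-01T10:00:00"), (2, "Grace", "2018-12-02T09:30:00")],
)

# First run: an empty mark means everything is new.
first_batch, mark = fetch_changes(conn, "")

# One row changes in the source between runs.
conn.execute(
    "UPDATE customers SET name = 'Ada L.', updated_at = '2018-12-03T08:00:00' "
    "WHERE id = 1"
)

# Second run: only the changed row comes back.
second_batch, mark = fetch_changes(conn, mark)
```

The first run copies both rows; the second run copies only the one row that changed, rather than the whole table again. The timestamps are stored as ISO 8601 strings so that text comparison matches chronological order.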

The winning strategy is to spend less time on the ETL phase of your data migration project. To do that, pick a tool designed to handle the different data sources properly.

Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
