Common Data Migration Mistakes

Data migration doesn't have to be complicated. Read on to get some great advice on the data migration process and leave the headaches for someone else.


Data migration can seem like a simple task: move data from one place to another. How hard can it be? We start a project with the best of intentions, thinking, "What could possibly go wrong?"

Unfortunately, migrating data can be more complex than it looks, and many of the challenges come from things we forgot to do or assumed we didn't need to do. Let's take a look at a few of the common issues that can trip you up when migrating data.

Waiting Until the Target Is Ready to Get Started

People often wait until the target system is ready before they start work on a migration. This is a mistake, because a large part of the work is the careful planning and scoping of the project. You'll need to gather requirements and agree on metrics for success. You'll also need to plan schema mapping, data mapping, backup and recovery, security, and go-live. Each of these steps takes considerable time. And once the planning is done, the source data still has to be cleansed and normalized before it can be moved. If you wait until the target is ready to go, you may be behind schedule before you even start.

Surprises in Your Data

Part of your planning should include an assessment of your sources and their dependencies. You need to perform an inventory of all your data assets, and the associated applications, to find dependencies. Pay close attention to the upstream and downstream applications affected by your data migration. A complex project may have between 60 and 80 different data objects coming in from a hundred or so different applications. When you discover new source data or dependencies late in the game, it can throw off your migration timeline and add complexity to your project.
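One way to make the inventory concrete is to record which applications read from each data asset and then walk that map to see what a change would touch. The following is a minimal sketch, not a prescribed method: the asset and application names, the `DOWNSTREAM` map, and the `affected_by` helper are all hypothetical.

```python
from collections import deque

# Hypothetical inventory: each data asset maps to the applications or
# assets that read from it directly. All names are illustrative.
DOWNSTREAM = {
    "customers_table": ["billing_app", "crm_export"],
    "crm_export": ["marketing_dashboard"],
    "billing_app": ["finance_reports"],
    "orders_table": ["billing_app"],
}


def affected_by(asset, downstream=DOWNSTREAM):
    """Return every asset or application reachable downstream of `asset`."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in downstream.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return sorted(seen)


print(affected_by("customers_table"))
# ['billing_app', 'crm_export', 'finance_reports', 'marketing_dashboard']
```

Running the same walk over a reversed map would list the upstream sources that feed a given application.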

Skipping Data Cleansing

Sometimes it seems easier to just move the data and clean it up once it is in the target. But the time to clean your data is before you move it. If you were moving to a new house, would you take the contents of your garbage can with you? Likely not. So why would you move bad data? If you move the data without cleansing it, you'll perpetuate the problems that existed in the source data.

Before you move your data, you should take the time to perform a data profile. A data profile is a thorough examination of your existing data. Profiling your data will help you understand whether there are blank or null values, whether the data is unique or duplicated, and whether the data patterns and values fall into the range you expect. After you perform a thorough data profile, you'll need to perform data mapping to plan how the source types will correlate to the types in the target.

Next, you'll cleanse and validate your data. This involves removing extraneous data, filling in missing data, normalizing data (making it conform to a pattern that is compatible with other data), and masking sensitive data. You may also need to transform and enrich the data. Data transformation is the process of converting data from one format or structure into another. Some of these processes must be done before you extract the data, while others can be done after extracting the data but before loading it to the target. A flexible ETL tool can help ease some of the work in this process.
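To illustrate the profiling and cleansing steps, here is a small sketch in plain Python. It assumes the source rows are already available as dictionaries. The sample rows, the `profile_column` and `normalize_and_mask_email` helpers, and the masking rule (keep the first character and the domain) are illustrative assumptions, not features of any particular ETL tool.

```python
from collections import Counter


def profile_column(rows, column, expected_range=None):
    """Summarize blanks, duplicates, and out-of-range values for one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    report = {
        "total": len(values),
        "blank_or_null": len(values) - len(present),
        "distinct": len(counts),
        "duplicated_values": sorted(v for v, n in counts.items() if n > 1),
    }
    if expected_range is not None:
        low, high = expected_range
        report["out_of_range"] = [v for v in present if not low <= v <= high]
    return report


def normalize_and_mask_email(raw):
    """Trim and lowercase an email, then mask the local part (assumed rule)."""
    if not raw:
        return None
    email = raw.strip().lower()
    local, sep, domain = email.partition("@")
    if not sep or not local:
        return None  # treated as invalid under this sketch's simple rule
    return local[0] + "***@" + domain


# Illustrative sample data, not taken from a real source system.
rows = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},
    {"id": 3, "email": "a@example.com", "age": 240},
    {"id": 4, "email": None, "age": 51},
]

print(profile_column(rows, "email"))
print(profile_column(rows, "age", expected_range=(0, 120)))
print(normalize_and_mask_email("  Alice@Example.com "))  # a***@example.com
```

In this sample, the profile flags two blank emails, one duplicated address, and an age of 240 that falls outside the expected range. These are the kinds of findings you would want to resolve before mapping and loading.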

Not Hiring Experts

A data migration project is often perceived as a simple "lift and shift" operation. This perception leads project leaders to skimp when hiring or assigning staff to the project. Migrating data takes an understanding of the complexities of data profiling, data cleansing, and security requirements, among other things. It is easy to underestimate just how complex and challenging data migrations can become, and spending less on experts can cost you in the long run. If you move bad data, or if you neglect security, you can end up with poor data quality or, worse, a security breach. At the very least, it can take a long time for a newcomer to ramp up, and your project can be severely delayed.

No Rollback Plan

When you are migrating data, there is often a lot of pressure to keep moving forward. It can be tempting to push your changes to the target and fix any issues after the data has been moved. A better approach is to have a rollback plan for each stage of the project. This involves performing checks at each stage and having backups in place in case you need to roll back changes. While this may seem more tedious, it will save you headaches down the road.
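As a rough sketch of what a check with a rollback can look like at the load stage, the example below uses Python's built-in `sqlite3` module as a stand-in for the target. The `target` table, the `load_with_rollback` helper, and the row-count check are assumptions made for illustration. A real migration would use whatever transaction and backup mechanisms its target system provides.

```python
import sqlite3


def load_with_rollback(conn, rows, expected_count):
    """Load rows in one transaction; undo the load if the check fails.

    `expected_count` is the total number of rows the target table should
    hold after the load (an assumed, deliberately simple validation rule).
    """
    try:
        conn.executemany("INSERT INTO target (id, name) VALUES (?, ?)", rows)
        (loaded,) = conn.execute("SELECT COUNT(*) FROM target").fetchone()
        if loaded != expected_count:
            raise ValueError(f"expected {expected_count} rows, found {loaded}")
        conn.commit()
        return True
    except (sqlite3.Error, ValueError):
        conn.rollback()  # leave the target as it was before this stage
        return False


# In-memory database standing in for the real target system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

# The check fails (2 rows loaded, 3 expected), so the load is rolled back.
print(load_with_rollback(conn, [(1, "alice"), (2, "bob")], expected_count=3))  # False
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 0

# The check passes, so the load is committed.
print(load_with_rollback(conn, [(1, "alice"), (2, "bob")], expected_count=2))  # True
```

A transaction only protects a single stage. For changes that span several stages or systems, the backups mentioned above are still what you would restore from.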

Topics:
big data, data migration, data migration best practices, data cleansing, data mapping

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
