Things to Consider Before a Data Migration
Things to Consider Before a Data Migration
Join the DZone community and get the full member experience.Join For Free
Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.
Data migration has never been easier, as long as you plan well, pick up the right migration strategy, and execute it in right way and at the right time. In this article I’ll share my experience with data migration, migration strategies, conventional methods for executing data migration task and ETL (Extract Load Transform) / ELT (Extract Load Transform) tools for data migration.
In simple terms, Data Migration is the process of transferring data from one storage system to another: your storage system could be any database like Oracle, DB2, MySQL, IBM AS400 iSeries, POSTGRESQL, etc.
Compare data models
Before you migrate data, analyze the data models first. If your data models are pretty much same and you are transforming the data from one database to other database , if your data base vendor are same like let’s say Oracle to Oracle, then Dumping total database schema would suffice your requirement. If it is from one DB to the other (let’s say Oracle to MySQL) then you need to take care data types issues, data base specific issues.
Analyze data sets
Don’t jump start directly on data migration task. Analyze your data sets first. Now, let me explain personal experience first, We were doing a data migration from legacy AS400 system to Oracle data base. We classified entire data migration task into two, one is Master functions data migration. and the other is transactional data migration. Before you proceed with transactional data migration, you should be done with Master functions data migration. So, we carried out Master functions data migration first. Our Functional consultant given mapping sheet to AS400 team to pull the data from legacy AS400 IBM iSeries database by seeing the fields in our application. We had migrated data and later we found some issues with because they are some internal DB fields which are not filled with the data. Actually these fields will be fed up with data based on the application logic. This causes a rework, data to be pulled again from AS400 system and again we need to carry out the same task.
Here, Your application may be storing some information in hidden DB fields for some calculation purposes, and those may not be visible in your application. So don’t map your data fields from application to application,(AS400 application to eProduct (Oralce DB App)), rather you map with one Data base(AS400) table to the other Data base table.(Oracle)
Doing rework task is a bit tedious one, but of course, we need to. So analyze well right before to avoid rework in data migration.
Here is a tip to avoid rework, As we are migrating data from AS400 to Oracle, and the applications are maintained by different teams. So, each team will be knowing their respective data models. Sit together, have a scheduled meeting to discuss and map fields from AS400 database table to Oracle database table.
Selecting right data migration strategy.
Well, we are done with data analysis, but how to go about data migration? Selecting one from different data migration strategies so important because, it reflects in your time factor. Your datasets may be having millions of records, as Time is very precious than any other in the world, you should have a clear idea to select one from different data migration strategies.
There are two approaches basically, first one is the conventional method of writing PL/SQL packages (Procedures, Functions) to carry out data migration and second one the latest ETL/ ELT approach Data migration using ETL tools like Informatica Power Center, Data Stage, Oracle Data Integrator, Talend etc.,. Using tools will gracefully increase the productivity. Writing the conventional PL/SQL procedures is bit tedious task when compared to using ETL tools.There are tools out there in the market which could do a very complex data migrations in an easier way with better logging when ever the system fails.
Conventional PL/SQL Packages VS ETL /ELT Tools
To carry out data migration with PL/SQL packages. You need to be pretty much good at writing procedures and there is no better logging available for checking right data migration, you need to write your conventional procedures or functions to log them for checking right data migration.
Advantages of ETL tools
Retention of metadata. This is a major point because analytical applications are highly dependent on proper understanding of metadata.
Ease of use. Because most ETL tools are GUI based and have repositories, they have increased ease of use and ease of modification.
Built-ins. They have built in objects to handle recurring tasks such as Aggregation,Composition so these do not need to be coded and recoded.
Skill. Because of the above factors, the skill level requirements for ETL tools are less than with SQL.
Support. There is a large experience base to fall back on. This includes customer experiences and vendor support.
There is no such thing as free lunch, so ETL /ELT tools have some limitations.
Cost ETL tools are at a huge cost so, small and mid size companies can’t afford to buy.
Complex Transformations If you data migration involves complex transformations and complex calculations are involved then again you need to write code for that manually.
Performance as most of the operations are generic in nature some time performance may be slower when compared to PL/SQL.
If your data migration involves very complex transformations and complex calculation logic to be executed, and your company doesn’t afford to buy powerful ETL tool then go with PL/SQL. but, there are some Open source tools like Talend, Pentaho, Clover ETL tools will be worth considering.
Open Source ETL tools.
Some companies are as tough as they can be. Make huge amounts of money but, they think over when it comes to buying tools which would really improve the productivity of the company. Well, for those companies, Open Source ETL tools are a good option. ETL tools like Talend, Pentaho, Jasper and Clover ETL are with pretty good features are worth trying.
Talend an Open Source ETL comes with Sales Force (Cloud), ERPs and legacy AS400 connectivity as well. They are supporting around 20 different databases.
Real Time Scenario
When we are doing data migration for Master functions, as I said earlier, our company is also as tough as they can be. there is no tool available for data migration, I suggested this Open Source ETL, but they wanted so many peoples approval for single software installation that could improve the process, performance and productivity to company. It took around 2-3 weeks to get it approved, by that time our Master data migration is completed. If some of the operations are carried out by ETL tools I’m sure that we could have saved at least two weeks of time in the data migration.
The sequence of operations we did for doing migration for one Master function.
Getting Master functions data AS400 IBM iSeries data base in the form of Excel sheets,
with these excel sheets we have prepared CSV flat files,
created temporary tables in the oracle with similar structure
and loading CSV flat files data into oracle DB through PL/SQL developer (a front end tool for Oracle)
Writing procedure for loading data from Temporary table to actual Master function table.
Executing procedure and Unit Testing for loaded data.
If we could have used Talend ETL tool here, as it is capable of connecting to both AS400 and Oracle systems, the intermediate flat files wouldn’t have been appeared here. as an ETL tool comes with graphical interface, just we need to map the fields to get the migration done. In the above case, at least we should have used a tool to directly pull the data from the AS400 and place it in Oracle database temporary tables.
Think over when you give Estimations for Carrying out Data Migration task. (For Software Firms)
As I said earlier Data Migration has never been an easier task unless you plan well, make better use of tools available and execute it in a right way. If you are in a software company carrying out data migration tasks, think twice before giving estimations, analyze and account the complexity of data migration. For the transactional data migration that we carried out they had given an estimate of 20 hrs each for two resources. but, it took 20days even more. Things can be horrible if it is a billable task and having given estimate of 20hrs and took 20 days, even if you go again to customer with revised estimates he may not listen. An estimate cannot differ this way. So, be cautious when you are giving an estimate for this type of task. If you are carrying out this task for the first time, consult those people who carried out this task in the past.
Well, no body is perfect in the world, and the same applies to me. If you have any queries,amends and suggestions to this article feel free to comment
Published at DZone with permission of Umashankar Ankuri . See the original article here.
Opinions expressed by DZone contributors are their own.