What Is Data Transformation?

A look at data transformation and how it helps developers convert data from one format or structure into another.

by Garrett Alley · Nov. 30, 18 · Opinion

Data Transformation Defined

Data transformation is the process of converting data from one format or structure into another. It is critical to activities such as data integration and data management. Depending on the needs of your project, it can include a range of activities: you might convert data types, cleanse data by removing nulls or duplicates, enrich the data, or perform aggregations.

Typically, the process involves two stages.

In the first stage, you:

  • Perform data discovery to identify the sources and data types.
  • Determine the structure and data transformations that need to occur.
  • Perform data mapping to define how individual fields are mapped, modified, joined, filtered, and aggregated.
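As a minimal sketch of what a data mapping might look like in code, the example below expresses the mapping as a dictionary from target field to source field plus a conversion function. The field names (`EmpID`, `HireDate`, and so on), the `FIELD_MAP` structure, and the `apply_mapping` helper are all hypothetical, chosen only for illustration.

```python
from datetime import datetime

# Hypothetical mapping spec: target field -> (source field, conversion function).
FIELD_MAP = {
    "employee_id": ("EmpID", int),
    "full_name": ("Name", str.strip),
    "hire_date": (
        "HireDate",
        # Assumes the source stores dates as MM/DD/YYYY text.
        lambda s: datetime.strptime(s, "%m/%d/%Y").date().isoformat(),
    ),
}


def apply_mapping(source_row, field_map=FIELD_MAP):
    """Build a target record by renaming and converting each mapped field."""
    return {
        target: convert(source_row[source])
        for target, (source, convert) in field_map.items()
    }


row = {"EmpID": "1042", "Name": "  Ada Lovelace ", "HireDate": "03/15/2017"}
mapped = apply_mapping(row)
print(mapped)
```

Keeping the mapping in a data structure rather than scattered through the code makes it easier to review against the results of the discovery step.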

In the second stage, you:

  • Extract data from the original source. Sources vary and can include structured sources, such as databases, and streaming sources, such as telemetry from connected devices or log files from customers using your web applications.
  • Perform transformations, such as aggregating sales data, converting date formats, editing text strings, or joining rows and columns.
  • Send the data to the target store. The target might be a database or a data warehouse that handles structured and unstructured data.
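The three steps of the second stage can be sketched as three small functions. This is an illustrative example only: the CSV text stands in for a real source, an in-memory SQLite database stands in for the target store, and the `orders` table, its columns, and the function names are assumptions rather than part of any particular tool.

```python
import csv
import io
import sqlite3

# Hypothetical source: CSV text standing in for a file or database export.
SOURCE_CSV = """order_id,region,amount,order_date
1,east,100.50,11/30/2018
2,west,75.00,12/01/2018
3,east,20.25,12/01/2018
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert types and rewrite MM/DD/YYYY dates as YYYY-MM-DD."""
    records = []
    for r in rows:
        month, day, year = r["order_date"].split("/")
        records.append(
            (int(r["order_id"]), r["region"], float(r["amount"]), f"{year}-{month}-{day}")
        )
    return records


def load(records, conn):
    """Load: write the transformed records to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database as a stand-in target
load(transform(extract(SOURCE_CSV)), conn)
total_east = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'east'"
).fetchone()[0]
print(total_east)
```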

Why Transform Data?

You might want to transform your data for a number of reasons. Generally, businesses want to transform data to make it compatible with other data, move it to another system, join it with other data, or aggregate information in the data.

For example, consider the following scenario: your company has purchased a smaller company, and you need to combine information for the Human Resources departments. The purchased company uses a different database than the parent company, so you'll need to do some work to ensure that the records match. Each of the new employees has been issued an employee ID, so this can serve as a key. However, you'll need to change the date formatting, remove any duplicate rows, and ensure that there are no null values in the Employee ID field so that all employees are accounted for. All of these steps are performed in a staging area before you load the data to the final target.
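The staging step in this scenario might look something like the sketch below. The record layout, the DD/MM/YYYY source date format, and the `stage` function are hypothetical. Rows with a missing employee ID are set aside for follow-up rather than silently dropped, so that every employee can still be accounted for.

```python
from datetime import datetime

# Hypothetical records exported from the acquired company's HR database.
acquired = [
    {"employee_id": "A100", "name": "Kim Lee", "start_date": "30/11/2018"},
    {"employee_id": "A100", "name": "Kim Lee", "start_date": "30/11/2018"},  # duplicate row
    {"employee_id": None, "name": "Sam Roy", "start_date": "01/02/2017"},  # missing key
    {"employee_id": "A101", "name": "Ana Diaz", "start_date": "15/06/2016"},
]


def stage(records):
    """Return (clean, rejected): deduplicated rows with ISO dates, and rows lacking an ID."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        if not rec["employee_id"]:
            rejected.append(rec)  # needs manual follow-up before loading
            continue
        if rec["employee_id"] in seen:
            continue  # skip rows whose key was already staged
        seen.add(rec["employee_id"])
        # Assumes the source uses DD/MM/YYYY; the target uses YYYY-MM-DD.
        iso = datetime.strptime(rec["start_date"], "%d/%m/%Y").date().isoformat()
        clean.append({**rec, "start_date": iso})
    return clean, rejected


clean, rejected = stage(acquired)
print(clean)
print(rejected)
```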

Other common reasons to transform data include:

  • You are moving your data to a new data store; for example, you are moving to a cloud data warehouse and you need to change the data types.
  • You want to join unstructured data or streaming data with structured data so you can analyze the data together.
  • You want to add information to your data to enrich it, such as performing lookups, adding geolocation data, or adding timestamps.
  • You want to perform aggregations, such as totaling sales for each region or comparing sales across regions.
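Enrichment and aggregation, the last two reasons above, can be illustrated with a short sketch. The store-to-region lookup table, the sample sales rows, and the helper names are made up for this example.

```python
from collections import defaultdict

# Hypothetical lookup table used to enrich each sale with its region.
REGION_BY_STORE = {"S1": "east", "S2": "west", "S3": "east"}

sales = [
    {"store": "S1", "amount": 120.0},
    {"store": "S2", "amount": 80.0},
    {"store": "S3", "amount": 50.0},
]


def enrich(rows, lookup):
    """Add a 'region' field to each row via a lookup; unknown stores are labeled as such."""
    return [{**r, "region": lookup.get(r["store"], "unknown")} for r in rows]


def total_by_region(rows):
    """Aggregate: sum the sale amounts for each region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


totals = total_by_region(enrich(sales, REGION_BY_STORE))
print(totals)
```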

How Is Data Transformed?

There are a few different ways to transform data:

  • Scripting. Some companies write scripts in SQL or Python to extract and transform the data.
  • On-premises ETL tools. ETL (Extract, Transform, Load) tools can take much of the pain out of scripting the transformations by automating the process. These tools are typically hosted at your company's site, and may require extensive expertise and incur significant infrastructure costs.
  • Cloud-based ETL tools. These ETL tools are hosted in the cloud, where you can leverage the expertise and infrastructure of the vendor.
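To give a rough idea of the scripting approach, the sketch below runs a transformation written in SQL from a Python script, using the standard library's `sqlite3` module. The `staging` and `employees` tables and their columns are hypothetical; a real script would target your own source and destination databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.executescript(
    """
    -- Raw data lands in a staging table as text.
    CREATE TABLE staging (emp_id TEXT, salary TEXT);
    INSERT INTO staging VALUES
        ('1', '50000'), ('1', '50000'), ('2', '62000.50'), (NULL, '10');

    -- The transformation: convert types, drop duplicates, skip null keys.
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, salary REAL);
    INSERT INTO employees (emp_id, salary)
    SELECT DISTINCT CAST(emp_id AS INTEGER), CAST(salary AS REAL)
    FROM staging
    WHERE emp_id IS NOT NULL;
    """
)
rows = conn.execute("SELECT emp_id, salary FROM employees ORDER BY emp_id").fetchall()
print(rows)
```

Scripts like this are quick to write, but scheduling, error handling, and monitoring are left to you, which is part of what ETL tools aim to automate.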

Data Transformation Challenges

Data transformation can be difficult for a number of reasons:

  • Time-consuming. You may need to extensively cleanse the data so you can transform or migrate it. This can be extremely time-consuming, and is a common complaint amongst data scientists working with unstructured data.

  • Costly. Depending on your infrastructure, transforming your data may require a team of experts and substantial infrastructure costs.

  • Slow. Because extracting and transforming data can be a burden on your system, it is often done in batches, which means you may have to wait up to 24 hours for the next batch to be processed. That delay can slow down business decisions.

Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
