What Is ETL and ETL Testing?

The volume, velocity, and variety of data are making the integration process with ETL and ETL testing more tedious and complex.

By Walter Lindsay · Mar. 23, 2017 · Opinion

Extract-Transform-Load (ETL) is a data integration concept that describes how data is moved from its sources into a data warehouse. While early iterations of the ETL process are sometimes considered a thing of the past with the rise of Big Data analytics, ETL, together with the data warehouses and Business Intelligence (BI) it delivers, has evolved and remains widely used by businesses today. For businesses with stringent financial reporting and audit-intensive requirements, data warehouses and ETL still provide a better-modeled, more structured solution than emerging alternatives such as Hadoop.

To better understand why ETL is still widely used today, let us take a closer look at both ETL and ETL testing.

What Is ETL?

Obtaining Business Intelligence (BI) — the meaningful insights that help enterprises make important decisions — is the primary objective of enterprise systems. BI is derived from the data that enterprises possess such as daily transactions and correspondences with customers, suppliers, and other stakeholders. However, for this data to be useful, it must first be transformed into information that is easily accessible and consumable by BI applications and tools, as well as end-users. Only then can the information be analyzed to create true Business Intelligence.

The usual process of generating Business Intelligence involves a series of steps:

  1. Daily transactions and correspondences are recorded.
  2. Records are collected in databases.
  3. Data are processed and transformed into usable information.
  4. Information is analyzed to generate insight.

The fourth step alone (the generation of Business Intelligence) consumes a significant part of a system’s capacity. That’s why enterprises have found it beneficial to separate the transaction workload from the analysis workload. This is where ETL comes in.

ETL is a process that simplifies the first three steps. As its name indicates, it “extracts” data from multiple, disparate source systems such as records databases, “transforms” this data into usable information for decision makers, and “loads” it into data warehouses, from which end-users can readily query and analyze it. It is also important to note that the transform step standardizes the data collected from the databases into formats or schemas readable by the data warehouse.
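
To make the three steps concrete, here is a minimal sketch in Python using in-memory SQLite databases; the table names, column names, and transformation rule are all hypothetical:

    import sqlite3

    # Hypothetical source system and data warehouse, both in memory.
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE transactions (customer_name TEXT, amount REAL)")
    source.execute("INSERT INTO transactions VALUES ('  alice ', 19.99)")

    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount_cents INTEGER)")

    # Extract: read raw rows from the source system.
    rows = source.execute("SELECT customer_name, amount FROM transactions").fetchall()

    # Transform: standardize the data into the warehouse schema
    # (trimmed, upper-cased names; amounts stored as integer cents).
    clean = [(name.strip().upper(), round(amount * 100)) for name, amount in rows]

    # Load: write the standardized rows into the warehouse.
    warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", clean)
    warehouse.commit()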

ETL enables enterprises to effectively separate the transaction workload from the analysis workload by utilizing data warehouses. Instead of going directly to source databases, adding to the transaction workload, and processing the extracted data themselves, end-users can gather already-processed information from data warehouses. ETL thus also helps end-users avoid disrupting the transaction processes themselves.

How Does ETL Work?

To illustrate how ETL works, imagine an enterprise with several departments, such as marketing and finance. Each department uses data such as customer information differently. For example, finance may be more concerned with the transactions made by customers, while marketing is more concerned with the customers’ demographics. Thus, each department stores different attributes in its systems. Finance may also have recorded each transaction by customer name, while marketing may have stored customer information by customer ID.

Suppose the marketing end-users want to check the transaction history of a customer. In that case, they cannot easily access this information, as it is stored in a separate database, the one used by finance. In such a system, gathering information for analysis is tedious and may disrupt other processes.

ETL solves these problems by extracting data from all the departments, processing or transforming the data into a standard and usable format, and storing it into a single data warehouse from which all end-users have access. By utilizing ETL and data warehouses, marketing in our example can easily extract necessary data for analysis without disrupting the finance department.
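
The reconciliation step might look like the following toy sketch, where the department records and field names are hypothetical. The transform resolves finance’s name-keyed records to marketing’s customer IDs so the warehouse stores one standard key:

    # Hypothetical department extracts: finance keys records by customer name,
    # marketing keys them by customer ID.
    finance = [{"customer_name": "Alice Smith", "amount": 120.0}]
    marketing = [{"customer_id": "C001", "customer_name": "Alice Smith", "segment": "retail"}]

    # Transform: map each finance record onto the standard customer ID.
    id_by_name = {m["customer_name"]: m["customer_id"] for m in marketing}
    warehouse_rows = [{"customer_id": id_by_name[f["customer_name"]], "amount": f["amount"]}
                      for f in finance]
    print(warehouse_rows)  # [{'customer_id': 'C001', 'amount': 120.0}]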

What Is ETL Testing?

ETL testing is the process of verifying whether or not the ETL process is working smoothly. That is, does the data maintain its integrity and accuracy after being extracted, transformed, and loaded from the source to the data warehouse? The objective of ETL testing is to maintain a high level of end-user confidence in the data stored in the data warehouse.

Because the ETL process involves a number of steps, it also needs to be tested in several ways. These correspond to the different types of ETL testing.

Accuracy Testing

This is the most common form of ETL testing. It checks whether the data is accurately transformed and loaded from the source to the data warehouse. It gives an overview of the integrity of the entire ETL process.
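
One simple way to approach this, sketched below with hypothetical values and a hypothetical dollars-to-cents rule, is to compute the same aggregate independently on both sides and compare:

    # Accuracy check (sketch): an aggregate computed from the source should
    # match the same aggregate computed from the warehouse.
    source_total = sum([19.99, 4.50, 100.00])        # dollars, read from the source
    warehouse_total_cents = sum([1999, 450, 10000])  # cents, read from the warehouse
    assert round(source_total * 100) == warehouse_total_cents, "totals diverge"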

Completeness Testing

This verifies whether all the data from the source are loaded into the data warehouse.
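
A minimal completeness check, assuming a straight one-to-one load and hypothetical key sets, just compares what was extracted against what arrived:

    # Completeness check (sketch): every key extracted from the source
    # should appear in the warehouse.
    source_keys = {"C001", "C002", "C003"}   # hypothetical source extract
    warehouse_keys = {"C001", "C003"}        # hypothetical warehouse read
    missing = source_keys - warehouse_keys
    print(missing)  # {'C002'}: these rows never made it to the warehouse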

Data Validation Testing

This assesses whether post-transformation data values match the values expected given the corresponding source values.
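
In code, this amounts to replaying the documented transformation rule against each source value, as in this sketch (the rule and values are hypothetical):

    # Data validation check (sketch): apply the documented rule to each source
    # value and compare with the value actually loaded.
    def rule(name):
        # The documented transformation rule: trim and upper-case names.
        return name.strip().upper()

    source_values = ["  alice ", "Bob"]
    loaded_values = ["ALICE", "BOB"]
    for src, loaded in zip(source_values, loaded_values):
        assert rule(src) == loaded, f"unexpected value for {src!r}"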

Software Testing

This verifies whether data values extracted from a new application or repository are the same as those of old applications and repositories.

Metadata Testing

This checks whether data retains its integrity up to the metadata level — that is, its length, indexes, constraints, and type.
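
A sketch of such a check against a SQLite warehouse, with a hypothetical table, compares the declared column types with the documented schema:

    import sqlite3

    # Metadata check (sketch): the warehouse table's column names and declared
    # types should match the documented schema.
    whs = sqlite3.connect(":memory:")
    whs.execute("CREATE TABLE fact_sales (customer TEXT NOT NULL, amount_cents INTEGER NOT NULL)")

    expected = {"customer": "TEXT", "amount_cents": "INTEGER"}
    actual = {row[1]: row[2] for row in whs.execute("PRAGMA table_info(fact_sales)")}
    assert actual == expected, f"schema drift: {actual}"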

Syntax Testing

This checks for poor data due to invalid characters, erroneous character patterns, and incorrect character cases. Together with reference testing, it forms part of an overall data quality test.
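
A syntax check often reduces to pattern matching, as in this sketch with a hypothetical allowed pattern for customer names:

    import re

    # Syntax check (sketch): flag values containing characters outside the
    # allowed pattern (letters, spaces, hyphens, and apostrophes here).
    valid_name = re.compile(r"^[A-Za-z][A-Za-z' -]*$")
    names = ["Alice Smith", "O'Brien", "B0b#", ""]
    bad = [n for n in names if not valid_name.match(n)]
    print(bad)  # ['B0b#', '']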

Reference Testing

This checks the correctness of the inputs in relation to the required attributes. For example, it checks whether a Customer ID contains only numeric values and does not have any null values.
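
The Customer ID example from the text can be sketched directly, with hypothetical input values:

    # Reference check (sketch): a Customer ID must be present and numeric.
    customer_ids = ["1001", "1002", None, "10A3"]
    violations = [cid for cid in customer_ids if cid is None or not cid.isdigit()]
    print(violations)  # [None, '10A3']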

Interface Testing

This reviews the integrity of data in the end-user interface. It also checks the quality and accuracy of data in front-end navigation and reports.

Performance Testing

This assesses the load capacity of the ETL process. Often designed as a stress test, it measures the performance of the ETL process when handling multiple users and transactions. It is almost always immediately followed by performance tuning, which minimizes bottlenecks for optimal performance. Bottlenecks can be found throughout the entire ETL process, from the source to the data warehouse.
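
A bare-bones sketch, with a stand-in job and hypothetical batch sizes, times the pipeline under several concurrent invocations:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def etl_job(batch):
        # Stand-in for real extract/transform/load work on one batch.
        time.sleep(0.01)
        return len(batch)

    batches = [list(range(1000))] * 20
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=5) as pool:
        rows = sum(pool.map(etl_job, batches))
    elapsed = time.perf_counter() - start
    print(f"{rows} rows in {elapsed:.2f}s ({rows / elapsed:,.0f} rows/s)")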

Benefits of Utilizing ETL and Data Warehouses

The ETL process and data warehouses offer benefits that go beyond those of other data integration tools and technologies.

Aside from significantly improving integration, the greatest benefit of using ETL and data warehouses is faster response time. ETL and, more specifically, the use of data warehouses allow the transaction and analysis processes to work independently. This enables enterprises to achieve greater efficiency at both the source and the data warehouse, leading to faster transaction processing as well as faster, better querying and analysis.

The second big benefit of utilizing ETL is improved overall data quality. The three-step process of extracting, transforming, and loading enables ETL testers to review the correctness of data at each step. As a result, ETL testers can identify and resolve data errors where they occur: in the source, in the data warehouse, or during the transformation process.

Finally, utilizing ETL also promotes greater organizational and operational efficiency. The ETL process ensures that changes made to source data, regardless of where the changes are initiated, will be reflected in the data warehouse. This allows different departments of enterprises to implement their own ad hoc software or systems while being assured that the data they use reflect the changes made by other departments. This empowers them to take initiatives that will benefit their departments while moving the entire organization forward.

Go Beyond Integration

The emergence of cloud services and Big Data is creating a challenge for data integration tools, including the ETL process. The volume, velocity, and variety of data, coupled with the different integration requirements brought upon by cloud services, are making the integration process more tedious and complex.


Published at DZone with permission of Walter Lindsay, DZone MVB.
