ETL vs. ELT: Differences Explained
ETL vs. ELT: Differences Explained
ETL and ELT look very similar, but work in fairly different ways. Ever wondered what the difference is? Read on to learn more.
Join the DZone community and get the full member experience.Join For Free
Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.
If your company has a data warehouse, you are likely using ETL (Extract, Transform, Load) or ELT ( Extract, Load, Transform) to get your data from different sources into your data warehouse. These are common methods for moving volumes of data and integrating the data so that you can correlate information from different sources, store the data securely in a single place, and enable members of your company from different departments to view the data.
The difference between the two terms has to do with the order in which these processes take place. Each of these methods is well-suited to different situations. Let's look a little closer.
What Is ETL?
ETL is a process that involves extracting data from disparate sources (sometimes using a staging table) and transforming it, performing such actions as changing the data types or applying calculations. Then, after the data is transformed, it is loaded into the target database, which is typically a data warehouse. When you perform ETL, the ETL software does the bulk of the heavy lifting — the transforming.
ETL is often used in the following cases:
- The source and target database are different and use different data types.
- The volumes of data are moderate or small.
- The transformations are compute-intensive.
- The data is structured.
What Is ELT?
ELT is a process that involves extracting the data, loading it into the target warehouse, and then transforming it after it is loaded. In this case, the work of transforming the data is completed by the target database. ELT typically occurs with NoSQL databases like Hadoop clusters or cloud installations.
ELT is often used in the following cases:
- The source and target database are the same type (i.e. Oracle source and target).
- There are large volumes of data.
- The target database engine is well-adapted to handling large volumes of data.
- The data is unstructured.
Which One Is Better?
These descriptions might leave you wondering which approach is better. The truth is each of these methods has advantages over the other in different circumstances, and the best solution depends on your situation.
The following table contrasts some of the key differences.
|Maturity||ETL has been around for 20 years and is specifically designed to work with relational databases, structured and unstructured data, and very large data volume. There are numerous experts and best practices to guide you in using ETL. And there are many ETL tools to choose from.||ELT is not as well-adapted as ETL because it is not specifically designed to work with relational databases, which have dominated the market for the last 20 years.|
|Flexibility||Older ETL tools are well-suited to relational databases, but often less geared for unstructured data. In addition, using ETL tools, you must map out the data that will be moved to the target database, and any changes to that plan require the mapping to be restructured, and all of the data loaded again.||ELT tools can often handle a combination of structured and unstructured data. In addition, ELT tools usually move all of your data to the target so the resulting data set is more flexible.|
|Hardware requirements||Many traditional ETL tools require specific hardware and have their own engines to perform transformations on the data. Though modern ETL platforms run in the cloud.||ELT tools take advantage of the compute power of existing hardware to perform transformations on the data.|
|Better for…||• Structured data
• Smaller volumes of data and complex computations
• On-premise relational databases
|• Unstructured data
• Large volumes of data and less complex computations
• Cloud environments
• Data lake
An ETLT Approach
Looking at this list of pros and cons, you likely see features in both camps that are appealing to you and relevant to the problems you want to solve. But, what if you have both structured and unstructured data? What if you want to perform complex computations and you want to load all your source data efficiently to your target warehouse? What if you want to use both ETL and ELT.
For example, you may want to enrich the data using timestamps or geolocations in the pipeline, but then, after the load, enable the target database to perform complex transforms that may require database-heavy tasks such as joins, which it does efficiently.
Consider a scenario where you want to join source data to existing target data in the data warehouse. With an ETLT (Extract, Transform, Load, Transform) approach, you don't need to pull the data out of the existing target database just so you can join it to the data you will load. The resulting pipeline is one that just makes sense: allow each of your tools to do the thing that they do best for an efficient, flexible solution.
Published at DZone with permission of Garrett Alley , DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.