What Is ELT?
Though it might sound like a sandwich, ELT is actually a useful data integration approach that can help with data cleansing, data enrichment, and more.
ELT stands for Extract, Load, Transform. ELT is an evolution of the traditional approach, in which you extract the data, transform it, and then load it (ETL). Historically, ETL has been the standard, reliable way to move data from one place to another. But as modern data storage systems have gained compute power, it is sometimes more efficient to load the data before transforming it.
It's not a one-size-fits-all situation either: some transformations are better performed in the data store, and some are better performed in the data pipeline. We'll come back to this later.
For a detailed comparison between the two methods for moving data, see ETL vs ELT: Differences Explained.
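To make the load-then-transform order concrete, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for the target data store; the sample rows, the table names (`raw_events`, `hits_per_path`), and the function `run_elt` are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical extracted records, e.g. parsed lines from a web server log.
raw_events = [
    ("2024-01-01T10:00:00", "/home", "200"),
    ("2024-01-01T10:00:05", "/login", "500"),
    ("2024-01-01T10:00:09", "/home", "200"),
]


def run_elt(rows):
    """Load rows untouched, then transform them inside the data store."""
    conn = sqlite3.connect(":memory:")  # stand-in for the target store

    # Load: copy the extracted rows as-is into a raw table.
    conn.execute("CREATE TABLE raw_events (ts TEXT, path TEXT, status TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)

    # Transform: the store's own SQL engine does the work after loading.
    conn.execute(
        """
        CREATE TABLE hits_per_path AS
        SELECT path,
               COUNT(*) AS hits,
               SUM(CAST(status AS INTEGER) >= 500) AS errors
        FROM raw_events
        GROUP BY path
        """
    )
    result = conn.execute(
        "SELECT path, hits, errors FROM hits_per_path ORDER BY path"
    ).fetchall()
    conn.close()
    return result


elt_result = run_elt(raw_events)
print(elt_result)
```

In an ETL version of the same job, the aggregation would run in the pipeline code and only the summarized rows would be written to the target.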
Benefits of ELT
Here are a few of the benefits of ELT:
- Efficient. ELT can take advantage of the compute power of the target data store's existing hardware to perform transformations.
- Flexible resulting data set. When you use ELT, you move the entire data set to the target. This can be useful if you don't want to transform the data before you move it, or you want flexibility in the schema for the target data.
And here are some common use cases that benefit from ELT:
- The data is relatively simple yet massive, such as log files and sensor data. In this case, the transformations that take place in the target might be relatively simple, and the benefit comes from the ability of the target data store to load massive volumes of data quickly.
- The data is unstructured, and it doesn't require extensive initial transformations because you plan to analyze it with machine learning or data mining tools rather than standard structured queries such as SQL. When you perform ELT for this use case, data analysts define their schemas using "schema on read," meaning the schema is developed after the data is written to the target store. Traditional ETL uses "schema on write," in which the schema is defined as a part of the ETL process before data is written to the target data store. The benefit in this use case is that you don't need to plan the schema ahead of time, and you can leverage the target data store's ability to handle large volumes of unstructured data.
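The "schema on read" idea from the last use case can be sketched as follows. This is an illustration only: the raw JSON records, the `read_with_schema` helper, and the `temperature_schema` mapping are all hypothetical, and a real data lake would apply the schema through its query engine rather than a Python loop.

```python
import json

# Hypothetical raw records, stored exactly as they arrived (no schema enforced).
raw_store = [
    '{"sensor": "a1", "temp_c": "21.5", "site": "north"}',
    '{"sensor": "b2", "temp_c": "19.0"}',
    '{"sensor": "c3", "humidity": "40"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema at read time: select fields and cast their types.

    `schema` maps a field name to a type constructor. Fields missing from a
    record are returned as None instead of causing the load to fail.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# The schema is decided by the analyst after the data is already stored.
temperature_schema = {"sensor": str, "temp_c": float}
temperature_view = read_with_schema(raw_store, temperature_schema)
print(temperature_view)
```

A different analyst could read the same `raw_store` with a different schema (for example, one that selects `humidity`) without reloading anything, which is the flexibility the use case describes.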
When You Might Prefer to Use ETL
While ELT can be great for certain situations, there are still cases in which an ETL tool is your best bet. Modern ETL tools might be the best choice for the following situations:
- When you want to do extensive data cleansing before loading to the target store. ETL is a better solution for this because you don't move the unwanted data to the target.
- When you want to perform complex computations. Traditionally, ETL tools are more efficient at this than data warehouses or data lakes.
- When you are working with only structured data or traditionally structured data warehouses. ETL tools are generally the most efficient at moving structured data from one environment to another.
- When you want to enrich data. If you want to enrich data as it is moved to the target store, you'll want to use an ETL tool. For example, you may want to add geolocation information or timestamps.
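As a rough sketch of the cleansing and enrichment cases above, the following Python example transforms records in the pipeline before loading them. Everything specific here is an assumption made for illustration: the `REGION_BY_IP_PREFIX` lookup stands in for a real geolocation service, the `visits` table and the `transform` and `load` functions are hypothetical, and an in-memory SQLite database plays the role of the target store.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical lookup table standing in for a real geolocation service.
REGION_BY_IP_PREFIX = {"10.": "internal", "203.": "apac"}


def transform(records, loaded_at):
    """Cleanse and enrich records in the pipeline, before loading."""
    cleaned = []
    for rec in records:
        ip = (rec.get("ip") or "").strip()
        if not ip:
            continue  # cleansing: unwanted rows never reach the target
        # Enrichment: attach a region and a load timestamp to each row.
        region = next(
            (name for prefix, name in REGION_BY_IP_PREFIX.items()
             if ip.startswith(prefix)),
            "unknown",
        )
        cleaned.append((ip, region, loaded_at))
    return cleaned


def load(rows):
    """Write already-transformed rows to the target store."""
    conn = sqlite3.connect(":memory:")  # stand-in for the target store
    conn.execute("CREATE TABLE visits (ip TEXT, region TEXT, loaded_at TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn


extracted = [
    {"ip": "10.0.0.7"},
    {"ip": ""},
    {"ip": " 203.0.113.9 "},
    {"ip": None},
    {"ip": "198.51.100.2"},
]
# A fixed timestamp keeps the example reproducible.
stamp = datetime(2024, 1, 1, tzinfo=timezone.utc).isoformat()
conn = load(transform(extracted, stamp))
loaded = conn.execute("SELECT ip, region FROM visits ORDER BY rowid").fetchall()
print(loaded)
```

Because the empty and missing IP addresses are dropped in `transform`, only three of the five extracted records are ever written to the target.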
Published at DZone with permission of Garrett Alley, DZone MVB. See the original article here.