ETL (Extract-Transform-Load) and ELT (Extract-Load-Transform) are two terms commonly used in the field of data engineering and, more specifically, in the context of data ingestion and transformation.
While these terms are often used interchangeably, they refer to slightly different concepts and have different implications for designing a data pipeline.
In this post, we will clarify the definitions of the ETL and ELT processes, describe the differences between the two, and discuss the advantages and disadvantages each offers to data engineers and data teams more broadly.
And most importantly, I'm going to describe how recent changes in the way modern data teams are structured have reshaped the ETL-versus-ELT debate.
What is really at stake when comparing ETL and ELT is, obviously, the sequence in which the extract, transform, and load steps are executed within a data pipeline.
For now, let's set this execution order aside, focus on the terminology itself, and discuss what each individual step is supposed to do.
Extract: This step refers to the process of pulling data out of a persistent source. That source could be a database, an API endpoint, a file, or anything else that holds data of any kind, structured or unstructured.
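To make this concrete, here is a minimal sketch of an extraction step in Python. The endpoint URL, the file path, and the record shapes are hypothetical placeholders, not references to any real system:

```python
import json
import urllib.request

import pandas as pd

# Hypothetical source locations -- replace with your own endpoint and file.
API_URL = "https://api.example.com/v1/orders"
CSV_PATH = "customers.csv"


def extract_from_api(url: str) -> list[dict]:
    """Pull raw JSON records from an HTTP endpoint."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())


def extract_from_file(path: str) -> pd.DataFrame:
    """Read raw records from a file-based source."""
    return pd.read_csv(path)
```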
Transform: In this step, the pipeline is expected to apply some changes to the data's structure or format in order to achieve a certain goal. A transformation could be a selection of attributes, a modification of records (for example, turning 'United Kingdom' into 'UK'), a data validation, a join with another source, or really anything that changes the format of the raw input data.
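Continuing the sketch, a transform step covering each of the cases just listed might look like the following. The column names (order_id, country, amount, and so on) are assumptions made purely for illustration:

```python
import pandas as pd


def transform(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    # Selection of attributes: keep only the columns downstream consumers need.
    orders = orders[["order_id", "customer_id", "country", "amount"]]

    # Modification of records: normalise country names, e.g. 'United Kingdom' -> 'UK'.
    orders["country"] = orders["country"].replace({"United Kingdom": "UK"})

    # Data validation: drop records with a missing or non-positive amount.
    orders = orders[orders["amount"] > 0]

    # Join with another source: enrich each order with customer attributes.
    return orders.merge(customers, on="customer_id", how="left")
```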
Load: The loading step refers to the process of copying the data (either the raw or the transformed version) into the target system…
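And to round out the sketch, a load step might copy those records into a table in the target system. The connection string below is a placeholder that assumes a Postgres-compatible warehouse reachable via SQLAlchemy:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string -- point this at your own warehouse or database.
engine = create_engine("postgresql://user:password@warehouse-host:5432/analytics")


def load(df: pd.DataFrame, table: str) -> None:
    """Copy the records (raw or transformed) into the target table."""
    df.to_sql(table, engine, if_exists="append", index=False)
```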