In this discussion, I explore evolving trends in data orchestration and modeling, highlighting advancements in tooling and what they offer data engineers. Airflow has been the dominant player since 2014, but the data engineering landscape has changed significantly since then. Today's tools address more sophisticated use cases and requirements, including support for multiple programming languages, a wider range of integrations, and improved scalability. I will examine contemporary and perhaps unconventional tools that streamline my data engineering processes and help me create, manage, and orchestrate robust, durable, and scalable data pipelines.
Over the past decade we have witnessed a “Cambrian explosion” of ETL frameworks for data extraction, transformation, and orchestration. Unsurprisingly, many of them are open source and Python-based.
The most popular:
- Airflow, 2014
- Luigi, 2014
- Prefect, 2018
- Temporal, 2019
- Flyte, 2020
- Dagster, 2020
- Mage, 2021
- Orchestra, 2023