In this story, I would like to raise a discussion about how we transform data. Whether it's a database, data warehouse, or reporting solution, we execute data transformations based on data models, but how do we organize them? I'd like to talk about the modern data transformation tools you use. We will touch on some nuances of the modular approach, programming and data transformation testing. At the end of this article, I will provide an example application to run data modeling tasks with data lineage and self-documenting features. I really want to know what you think about it.
I witnessed dozens of different ways to execute data transformations. Throughout my fifteen plus years of career in big data and analytics, I have built data pipelines with different design patterns and I am sure there are more. That's why I like the world of technology so much. The multitude of possibilities it offers is simply amazing.
What operating system do you use for your data warehouse?
Modern data transformation tools
Modern data transformation tools, also known as data modeling tools or data warehouse operating systems (DWH), are designed to simplify SQL data manipulation tasks to create data sets, views, and tables. They often use a SQL-like dialect to execute possible data definitions (DDL) and manipulations (DML) that we may need, including data transformation testing and creating custom data sets in development mode.
The abundance of ANSI-SQL data storage solutions on the market makes these tools extremely useful. For example, consider this list of dbt adapters below. All the market leaders are present there.
dbt stands for database creation tool and is essentially a programming application that can be run locally or on the server to execute data transformation tasks. For example, consider this simple model below. It creates a view in our database and we can materialize it, say every 5 minutes, to preserve the data for analysis. At the top of the file we have…