Mastering Airflow Variables | Towards Data Science

How you retrieve variables from Airflow can affect the performance of your DAGs

What if multiple data pipelines need to interact with the same API endpoint? Would you really have to declare this endpoint in every pipeline? If the endpoint changed in the near future, you would need to update its value in every file.

Airflow variables are simple yet valuable constructs that help you avoid redundant declarations across multiple DAGs. They are key-value pairs, with values that can be plain strings or JSON-serializable objects, stored in the Airflow metadata database.
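To make the key-value idea concrete, here is a minimal sketch that uses a plain dictionary as a stand-in for the metadata database. The names `FAKE_METASTORE`, `set_variable`, and `get_variable`, as well as the example URL and settings, are hypothetical and are not part of Airflow; in a real DAG you would use Airflow's own `Variable` class instead.

```python
import json

# Hypothetical stand-in for the Airflow metadata database (not Airflow's API).
FAKE_METASTORE: dict[str, str] = {}


def set_variable(key: str, value) -> None:
    """Store a value under a key, serializing non-string values to JSON."""
    FAKE_METASTORE[key] = value if isinstance(value, str) else json.dumps(value)


def get_variable(key: str, deserialize_json: bool = False):
    """Read a value back, optionally parsing it as JSON."""
    raw = FAKE_METASTORE[key]
    return json.loads(raw) if deserialize_json else raw


# Declare the shared endpoint once...
set_variable("api_endpoint", "https://example.com/api/v1")
set_variable("api_settings", {"timeout": 30, "retries": 3})

# ...and read it from as many pipelines as needed.
endpoint_for_dag_a = get_variable("api_endpoint")
endpoint_for_dag_b = get_variable("api_endpoint")
settings = get_variable("api_settings", deserialize_json=True)
```

If the endpoint changes, only the single stored value needs updating; every pipeline that reads the key picks up the new value.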

What if your code uses tokens or other kinds of secrets? Hard-coding them in plain text is not a safe approach. Beyond reducing repetition, Airflow variables also help manage sensitive information. With six different ways to define variables in Airflow, choosing the appropriate method is crucial for security and portability.

An often overlooked aspect is the impact that variable retrieval has on Airflow performance. Careless retrieval can flood the metadata database with requests every time the Scheduler parses the DAG files (every thirty seconds by default).

It's pretty easy to fall into this trap unless you understand how DAGs are parsed by the Scheduler and how variables are retrieved from the database.
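The trap can be illustrated with a small simulation, assuming that parsing a DAG file means executing its top-level code. `FakeVariable` is a hypothetical stand-in that only counts lookups; it is not Airflow's `Variable` class, and the two "DAG files" are just strings run with `exec`. Treat it as a sketch of the mechanism rather than a measurement of a real Scheduler.

```python
# Hypothetical stand-in for Airflow's Variable class; it only counts lookups.
class FakeVariable:
    queries = 0
    _store = {"api_endpoint": "https://example.com/api/v1"}

    @classmethod
    def get(cls, key: str) -> str:
        cls.queries += 1  # each call represents one metadata database query
        return cls._store[key]


# Source of an imaginary DAG file that reads the variable at module level.
TOP_LEVEL_DAG = """
endpoint = Variable.get("api_endpoint")   # runs on every parse

def task():
    return endpoint
"""

# Same DAG file, but the lookup is deferred until the task actually runs.
DEFERRED_DAG = """
def task():
    return Variable.get("api_endpoint")   # runs only when the task executes
"""


def count_queries(dag_source: str, parses: int, task_runs: int) -> int:
    """Parse a DAG file `parses` times, run its task `task_runs` times,
    and return the number of simulated database queries."""
    FakeVariable.queries = 0
    namespace = {}
    for _ in range(parses):
        namespace = {"Variable": FakeVariable}
        exec(dag_source, namespace)  # parsing executes all top-level code
    for _ in range(task_runs):
        namespace["task"]()
    return FakeVariable.queries


# One day of parsing every 30 seconds is 2,880 parses; the task runs once.
PARSES_PER_DAY = 24 * 60 * 60 // 30
top_level_queries = count_queries(TOP_LEVEL_DAG, PARSES_PER_DAY, task_runs=1)
deferred_queries = count_queries(DEFERRED_DAG, PARSES_PER_DAY, task_runs=1)
print(top_level_queries, deferred_queries)  # prints "2880 1"
```

Under these assumptions, a single top-level lookup in one file costs 2,880 queries per day even though the task runs once, while the deferred version costs one query per task run.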

Before discussing how variables are retrieved from the metastore and which best practices help optimize DAGs, it is important to cover the basics. For now, let's focus on how we can declare variables in Airflow.

As already mentioned, there are several different ways to declare variables in Airflow. Some of them happen to be more secure and portable than others, so let's examine them all and try to understand their advantages and disadvantages.

1. Create a variable from the user interface

In this first approach, we will create a variable through the user interface. In the top menu, select Admin → Variables → +.


Technical Terrence Team
