Airflow Variables & Connections
So far, you've created DAGs, set task dependencies, and scheduled your workflows.
But what if your DAG needs credentials, file paths, or API keys?
Hardcoding them is risky and inflexible. Airflow solves this problem with:
- Variables: dynamic parameters
- Connections: secure credentials
Real-World Story: Configuration as a Service
Imagine running multiple environments:
- Dev, Test, Production
- Same DAG code, different database credentials
- Hardcoding these would be messy
Airflow lets you store configuration centrally and reuse it safely.
1️⃣ Airflow Variables
What are Variables?
Variables are key-value pairs stored in Airflow's metadata database.
Use them for:
- File paths
- API endpoints
- Threshold values
- Feature toggles
Example: Create a Variable in the UI
| Key | Value |
|---|---|
| s3_bucket | my-data-bucket |
| api_key | 12345-ABCDE |
Accessing Variables in a DAG
from airflow.models import Variable

# Raises KeyError if "s3_bucket" is not defined
s3_bucket = Variable.get("s3_bucket")
# Falls back to "default-key" if "api_key" is not defined
api_key = Variable.get("api_key", default_var="default-key")

print(s3_bucket)  # Output: my-data-bucket
print(api_key)  # Output: 12345-ABCDE
Input & Output Example
Input
- Variable key: s3_bucket
- Stored value: my-data-bucket
Output
my-data-bucket
✅ Variables can also store JSON:
config = Variable.get("my_config", deserialize_json=True)
print(config["threshold"])
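As a rough illustration of what `deserialize_json=True` does: a Variable's value is stored as a string, and the flag parses that string as JSON before returning it. The sketch below reproduces that step with the standard library only, so it runs without Airflow. The stored string and its keys (`threshold`, `input_paths`) are hypothetical.

```python
import json

# Hypothetical raw value, as it might be typed into the UI for the key "my_config".
stored_value = '{"threshold": 0.8, "input_paths": ["/data/in/a.csv", "/data/in/b.csv"]}'

# Roughly what Variable.get("my_config", deserialize_json=True) hands back:
# the stored string parsed into Python objects.
config = json.loads(stored_value)

print(config["threshold"])
print(len(config["input_paths"]))
```

Keeping related settings in one JSON Variable means a single lookup per task instead of one lookup per setting.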
2️⃣ Airflow Connections
What are Connections?
Connections store credentials and endpoints for:
- Databases (Postgres, MySQL, Redshift)
- Cloud providers (AWS, GCP, Azure)
- Remote services (HTTP APIs, FTP servers)
This avoids hardcoding sensitive information in DAGs.
Example: Define a Connection in the UI
| Connection ID | Type | Host | Login | Password |
|---|---|---|---|---|
| my_postgres | Postgres | localhost | user | pass123 |
Accessing a Connection in a DAG
from airflow.hooks.base import BaseHook

# Look up the Connection by its ID
conn = BaseHook.get_connection("my_postgres")

print(conn.host)  # Output: localhost
print(conn.login)  # Output: user
print(conn.password)  # Output: pass123 (demo only; avoid printing real passwords)
Input & Output Example
Input
- Connection ID: my_postgres
- Host: localhost
- Login: user
- Password: pass123
Output
Host: localhost
Login: user
Password: pass123
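Printing a password is fine for a throwaway demo, but in real tasks it is safer to log only non-sensitive fields. Here is a minimal sketch, assuming the object passed in exposes `host`, `login`, and `password` attributes as the Connection above does. `describe_connection` is a hypothetical helper, and `SimpleNamespace` stands in for the Connection so the snippet runs without Airflow.

```python
from types import SimpleNamespace


def describe_connection(conn):
    """Return a log-safe summary of a connection-like object."""
    # Report whether a password is set without revealing it.
    password_state = "set" if conn.password else "not set"
    return f"host={conn.host} login={conn.login} password=<{password_state}>"


# Stand-in for BaseHook.get_connection("my_postgres"), using the values from the table.
conn = SimpleNamespace(host="localhost", login="user", password="pass123")

summary = describe_connection(conn)
print(summary)
```

Inside a task, the same helper could be called with the object returned by `BaseHook.get_connection`.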
Using Variables and Connections in Tasks
from datetime import datetime

from airflow import DAG
from airflow.hooks.base import BaseHook
from airflow.models import Variable
from airflow.operators.python import PythonOperator


def print_config():
    # Look up configuration at run time, inside the task
    bucket = Variable.get("s3_bucket")
    conn = BaseHook.get_connection("my_postgres")
    print(f"S3 Bucket: {bucket}, DB Host: {conn.host}")


with DAG(
    dag_id="variables_connections_dag",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",  # Airflow 2.4+ prefers the name `schedule`
    catchup=False,
    tags=["config", "airflow"],
) as dag:
    task = PythonOperator(
        task_id="print_config_task",
        python_callable=print_config,
    )
✅ Output in logs:
S3 Bucket: my-data-bucket, DB Host: localhost
Best Practices (Professional)
✅ Store secrets in Connections, not Variables
✅ Use JSON Variables for structured configs
✅ Avoid hardcoding credentials in DAGs
✅ Leverage environment variables (AIRFLOW_VAR_<KEY>, AIRFLOW_CONN_<CONN_ID>) to keep secrets out of DAG code and the metadata database
✅ Name Variables and Connections consistently
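To make the environment-variable practice concrete: Airflow looks up Variables under `AIRFLOW_VAR_<KEY>` and Connections under `AIRFLOW_CONN_<CONN_ID>` (names upper-cased), and a Connection can be written as a URI. The sketch below builds those names and uses the standard library to show which fields such a URI carries. The two helper functions, the port `5432`, and the database name `analytics` are illustrative assumptions, not values from the examples above.

```python
import os
from urllib.parse import urlsplit


def variable_env_name(key):
    """Environment variable name Airflow checks for a Variable key."""
    return f"AIRFLOW_VAR_{key.upper()}"


def connection_env_name(conn_id):
    """Environment variable name Airflow checks for a Connection ID."""
    return f"AIRFLOW_CONN_{conn_id.upper()}"


# In practice these would be exported by the deployment, not set in Python.
os.environ[variable_env_name("s3_bucket")] = "my-data-bucket"
os.environ[connection_env_name("my_postgres")] = (
    "postgres://user:pass123@localhost:5432/analytics"
)

# Inspect the URI to see how it maps onto Connection fields.
parts = urlsplit(os.environ["AIRFLOW_CONN_MY_POSTGRES"])
print(parts.scheme, parts.hostname, parts.port, parts.username, parts.path.lstrip("/"))
```

With these set, `Variable.get("s3_bucket")` and `BaseHook.get_connection("my_postgres")` should resolve from the environment without any entry in the UI.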
Common Mistakes
❌ Hardcoding passwords in DAGs
❌ Forgetting to handle missing Variables (pass default_var to Variable.get)
❌ Using Variables for sensitive credentials
❌ Changing Connection IDs without updating the DAGs that reference them
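The missing-Variable mistake is easy to reproduce. In Airflow 2.x, `Variable.get` raises `KeyError` when the key does not exist and no `default_var` is given. The sketch below imitates that behavior with a plain dictionary so it runs without Airflow. `get_variable`, `_store`, and the `aws_region` key are hypothetical stand-ins, not Airflow's implementation.

```python
_MISSING = object()  # sentinel: distinguishes "no default given" from default_var=None

# Stand-in for the Variables stored in Airflow's metadata database.
_store = {"s3_bucket": "my-data-bucket"}


def get_variable(key, default_var=_MISSING):
    """Simplified imitation of Variable.get(key, default_var=...)."""
    if key in _store:
        return _store[key]
    if default_var is _MISSING:
        raise KeyError(f"Variable {key} does not exist")
    return default_var


# With a default, a missing key is harmless.
region = get_variable("aws_region", default_var="us-east-1")

# Without one, the lookup fails and the task would error out.
try:
    get_variable("aws_region")
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(region, lookup_failed)
```

Whether to supply a default or let the task fail is a design choice: a default suits optional tuning values, while failing fast suits settings the task cannot run without.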
Key Takeaways
- Variables store dynamic parameters
- Connections store secure credentials
- Access them in Python tasks via Variable.get and BaseHook.get_connection
- Proper use improves security and maintainability
Summary
In this chapter, you learned:
- Difference between Variables and Connections
- How to create, retrieve, and use Variables
- How to create, retrieve, and use Connections
- Best practices for secure and maintainable configuration
🎯 Your DAGs are now configurable, secure, and production-ready.
What's Next?
👉 Templating & Jinja Expressions in Airflow
Learn how to make DAGs dynamic using:
- Jinja templating
- Macros
- Runtime parameters