Scaling Workers & Horizontal Autoscaling in Apache Airflow

Boosting Performance with Worker Scaling 🚀

The Story: The Growing Workflow Problem

Imagine you have a DAG with hundreds of tasks, but only a single worker is running. As your DAG runs, tasks start to queue up, slowing down your entire pipeline. You need to speed up execution to handle the increasing load, but how?

Airflow gives you the ability to scale workers horizontally and implement autoscaling to automatically add or remove worker nodes based on demand.

This is key to keeping your workflows running efficiently, even as they grow in complexity.

What is Horizontal Autoscaling in Airflow?

Horizontal scaling refers to adding more worker nodes to handle more tasks. Autoscaling takes this a step further by automatically adjusting the number of worker nodes based on the workload.

By enabling horizontal autoscaling, you can:

Ensure that Airflow dynamically responds to task demand.
Add workers when tasks are queuing up or remove them when demand drops.
Improve overall system performance and resource utilization.

Why is Horizontal Autoscaling Important?

Without proper scaling:

Tasks queue up whenever demand exceeds the available worker capacity.
Workers sit underutilized during low-traffic periods.
Airflow may struggle with large workflows or many concurrent tasks.

With horizontal autoscaling:

Tasks run faster as more workers are available.
Airflow can adapt to fluctuating workloads.
You can keep your environment cost-efficient, using fewer resources during off-peak times.

Setting Up Horizontal Scaling in Airflow

1. Using CeleryExecutor with Autoscaling

When you use the CeleryExecutor, you scale horizontally by starting more worker nodes that consume from the same message broker. Celery's autoscaling feature complements this by growing and shrinking the pool of task processes inside each worker.

First, ensure you’ve set up Celery with a message broker (e.g., Redis or RabbitMQ).

Then start each worker through the Airflow CLI (Airflow 2 and later) with autoscaling enabled:

 airflow celery worker --autoscale=10,3

Here:

10 is the maximum number of worker processes.
3 is the minimum number of worker processes.

This means each worker keeps between 3 and 10 task processes alive, depending on the workload. The flag does not add or remove worker nodes; for that, start more workers or use an external autoscaler.
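The effect of the two numbers can be pictured with a small model. The sketch below is a simplification, not Celery's actual implementation: it assumes the pool size simply follows the number of pending tasks, clamped between the minimum and maximum. The function name `target_pool_size` is hypothetical.

```python
def target_pool_size(pending_tasks: int, max_size: int = 10, min_size: int = 3) -> int:
    """Clamp demand into [min_size, max_size], mirroring --autoscale=10,3."""
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    return max(min_size, min(pending_tasks, max_size))


if __name__ == "__main__":
    # Idle, moderate, and heavy load (illustrative numbers only).
    for pending in (0, 5, 25):
        print(pending, "pending ->", target_pool_size(pending), "processes")
```

With no pending work the pool stays at the minimum of 3, and under heavy load it stops growing at 10, so anything beyond that waits in the queue.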

2. Using KubernetesExecutor with Autoscaling

With the KubernetesExecutor, scaling is more automated: Airflow launches a separate pod for every task instance and removes it when the task finishes, so there is no fixed worker pool to size. You mainly configure the pod template and the resources each pod requests.

In your airflow.cfg (newer Airflow releases name this section [kubernetes_executor]):

 [kubernetes]
 pod_template_file = /path/to/worker-template.yaml

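The pod template itself is a plain Pod manifest, like the worker-template.yaml shown in the practical example later in this article. A Deployment with a replica count, by contrast, describes a pool of long-running workers, which is how Celery workers are typically run on Kubernetes. Example Kubernetes YAML configuration:

 apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: airflow-worker
 spec:
   replicas: 3 # Start with 3 replicas
   selector:
     matchLabels:
       component: worker
   template:
     metadata:
       labels:
         component: worker # Must match the selector above
     spec:
       containers:
         - name: airflow-worker
           image: apache/airflow:latest
           args: ["celery", "worker"] # Run a Celery worker in each replica
           resources:
             requests:
               cpu: "500m"
               memory: "1Gi"
             limits:
               cpu: "1000m"
               memory: "2Gi"

You can then adjust the number of replicas as the workload changes, either manually with kubectl scale or automatically with a HorizontalPodAutoscaler.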
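When replicas are adjusted automatically, Kubernetes' HorizontalPodAutoscaler documents its core rule as desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric). The sketch below applies that rule and then clamps the result; the bounds of 3 and 10 and the function name `desired_replicas` are assumptions for illustration, and the real controller also applies a tolerance band and stabilization windows that are ignored here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 3, max_replicas: int = 10) -> int:
    """Apply the documented HPA ratio rule, then clamp to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(raw, max_replicas))


if __name__ == "__main__":
    # 3 replicas averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(3, current_metric=90, target_metric=60))
    # 3 replicas averaging 10% CPU: the rule asks for 1, the floor keeps 3.
    print(desired_replicas(3, current_metric=10, target_metric=60))
```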

3. Leveraging Autoscaling with Amazon ECS or Google Cloud

In cloud environments, you can use managed services like Amazon ECS or Google Kubernetes Engine (GKE) to take care of autoscaling for you. These services let you configure autoscaling groups, node pools, or pod autoscalers that scale automatically with demand.

Example: Scaling with CeleryExecutor

Let’s say you have a DAG with many tasks and you want to horizontally scale using the CeleryExecutor.

In the airflow.cfg, configure CeleryExecutor:

 [core]
 executor = CeleryExecutor

 [celery]
 broker_url = redis://localhost:6379/0
 result_backend = db+postgresql://user:password@localhost/airflow

Then, run your worker with autoscaling enabled:

 airflow celery worker --autoscale=10,3

Each worker will then scale its pool of task processes between 3 and 10 based on the workload. To add capacity beyond that, run the same command on additional machines.
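The same bounds can also live in airflow.cfg through the `worker_autoscale` option of the `[celery]` section, written as max,min. As a rough sanity check before deploying, a configuration like this could be validated with Python's standard `configparser`; the sample file contents and the helper name `read_autoscale` below are hypothetical.

```python
import configparser
from typing import Tuple

# Hypothetical airflow.cfg fragment used only for this sketch.
SAMPLE_CFG = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
worker_autoscale = 10,3
"""


def read_autoscale(cfg_text: str) -> Tuple[int, int]:
    """Return (max_processes, min_processes) from a CeleryExecutor config."""
    # Interpolation is disabled so '%' characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if parser.get("core", "executor") != "CeleryExecutor":
        raise ValueError("worker_autoscale only applies to the CeleryExecutor")
    max_procs, min_procs = (int(part) for part in parser.get("celery", "worker_autoscale").split(","))
    if min_procs > max_procs:
        raise ValueError("worker_autoscale must be written as max,min")
    return max_procs, min_procs


if __name__ == "__main__":
    print(read_autoscale(SAMPLE_CFG))
```

The order check catches the common mistake of writing the pair as min,max.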

Performance Impact of Autoscaling

Scenario	Without Autoscaling	With Autoscaling
Task Execution Speed	Slower, tasks queued up	Faster, tasks processed in parallel
Worker Utilization	Low utilization during off-peak	Dynamic resource allocation
Cost Efficiency	Fixed cost regardless of demand	Cost-effective, scaling only as needed
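To put rough numbers on the first row of the table, here is a back-of-the-envelope estimate. It assumes independent tasks of equal duration and ignores scheduling overhead and startup time; the task count, duration, and slot counts are made-up figures, and `estimated_runtime_seconds` is a hypothetical helper.

```python
import math


def estimated_runtime_seconds(num_tasks: int, task_seconds: float, slots: int) -> float:
    """Tasks run in 'waves' of at most `slots` tasks at a time."""
    if slots < 1:
        raise ValueError("slots must be at least 1")
    waves = math.ceil(num_tasks / slots)
    return waves * task_seconds


if __name__ == "__main__":
    # 300 one-minute tasks with 3 slots versus 10 slots.
    for slots in (3, 10):
        minutes = estimated_runtime_seconds(300, 60, slots) / 60
        print(f"{slots} slots: about {minutes:.0f} minutes")
```

Under these assumptions, going from 3 to 10 concurrent slots cuts the estimate from about 100 minutes to about 30.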

Practical Example: Horizontal Scaling with KubernetesExecutor

If you're using Kubernetes, you can leverage the KubernetesExecutor, which starts one pod per task instance, so the number of pods follows the number of tasks that are ready to run.

In your airflow.cfg, set the executor to KubernetesExecutor:

 [core]
 executor = KubernetesExecutor

Define resource requests and limits in the pod template (e.g., worker-template.yaml):

 apiVersion: v1
 kind: Pod
 metadata:
   labels:
     component: worker
 spec:
   containers:
     - name: base # Airflow expects the task container to be named "base"
       image: apache/airflow:latest
       resources:
         requests:
           cpu: 500m
           memory: 1Gi
         limits:
           cpu: 1000m
           memory: 2Gi

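With this setup, Airflow creates a pod from the template for each task and removes it when the task finishes, so the pod count rises and falls with the workload. If the cluster also runs a node autoscaler, nodes are added or removed to fit those pods.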

Key Considerations for Autoscaling

1. Resource Requests and Limits

Always define appropriate resource requests and limits for tasks to ensure efficient scaling. For Kubernetes, this means specifying CPU and memory for each pod to avoid overloading your infrastructure.
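Requests also determine how many task pods fit on a node. The sketch below estimates that number for the 500m CPU and 1Gi memory requests used in this article. It is a simplification: it ignores resources reserved for the system and for other pods, and its parsers (hypothetical helpers) handle only the quantity formats that appear here.

```python
import math


def parse_cpu(quantity: str) -> float:
    """Convert '500m' or '4' into CPU cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory_gib(quantity: str) -> float:
    """Convert '1Gi' or '512Mi' into GiB."""
    for suffix, factor in (("Gi", 1.0), ("Mi", 1 / 1024)):
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported memory quantity: {quantity}")


def pods_per_node(node_cpu: str, node_memory: str,
                  pod_cpu: str = "500m", pod_memory: str = "1Gi") -> int:
    """The scarcer resource decides how many pods fit."""
    by_cpu = parse_cpu(node_cpu) / parse_cpu(pod_cpu)
    by_memory = parse_memory_gib(node_memory) / parse_memory_gib(pod_memory)
    return math.floor(min(by_cpu, by_memory))


if __name__ == "__main__":
    # A hypothetical node with 4 cores and 16Gi of allocatable memory.
    print(pods_per_node("4", "16Gi"))
```

On such a node CPU is the limiting resource, so raising the memory request up to 2Gi would not reduce the pod count, while raising the CPU request would.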

2. Message Broker (For CeleryExecutor)

Ensure your message broker (Redis or RabbitMQ) is highly available and can handle the traffic generated by autoscaling.

3. Monitoring

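Monitor your autoscaling setup to ensure that scaling events happen smoothly. Airflow can emit metrics such as executor.queued_tasks, executor.running_tasks, and executor.open_slots, which help you track queue depth and worker utilization; scaling decisions themselves are best read from the autoscaler's own events and logs.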
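As one way to act on such metrics, the sketch below parses StatsD gauge lines and flags the moment when tasks are queued while no executor slots are open. It assumes Airflow's StatsD output with the default `airflow` prefix; the sample lines are invented, the helper names are hypothetical, and the check is a rough signal rather than a complete scaling policy.

```python
from typing import Dict, Iterable, Tuple


def parse_gauge(line: str) -> Tuple[str, float]:
    """Parse a StatsD gauge line such as 'airflow.executor.queued_tasks:42|g'."""
    name, rest = line.strip().split(":", 1)
    value, kind = rest.split("|", 1)
    if kind != "g":
        raise ValueError(f"not a gauge: {line}")
    return name, float(value)


def latest_gauges(lines: Iterable[str]) -> Dict[str, float]:
    """Keep the most recent value seen for each gauge."""
    gauges: Dict[str, float] = {}
    for line in lines:
        name, value = parse_gauge(line)
        gauges[name] = value
    return gauges


def workers_saturated(gauges: Dict[str, float], prefix: str = "airflow") -> bool:
    """True when tasks are waiting and the executor has no free slots."""
    queued = gauges.get(f"{prefix}.executor.queued_tasks", 0)
    open_slots = gauges.get(f"{prefix}.executor.open_slots", 0)
    return queued > 0 and open_slots == 0


if __name__ == "__main__":
    sample = [
        "airflow.executor.queued_tasks:42|g",
        "airflow.executor.open_slots:0|g",
    ]
    print(workers_saturated(latest_gauges(sample)))
```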

Best Practices for Scaling Workers

✅ Use horizontal scaling when the number of concurrent tasks exceeds what a single machine can run.
✅ Leverage KubernetesExecutor for automatic pod scaling in cloud-native environments.
✅ Configure autoscaling for Celery workers using the --autoscale flag, which bounds the task processes in each worker.
✅ Monitor worker utilization to prevent over-provisioning or under-provisioning.
✅ Adjust resource limits based on the workload to avoid resource bottlenecks.

Summary 🧠

Horizontal scaling allows you to distribute tasks across multiple workers.
Autoscaling helps automatically adjust the number of workers based on workload demand.
Both CeleryExecutor and KubernetesExecutor provide powerful scaling capabilities, but Kubernetes offers more dynamic, cloud-native scaling options.
Monitoring and resource limits are key to successful scaling.

Key Takeaways

Horizontal scaling ensures that Airflow can handle large workflows.
Autoscaling helps dynamically scale workers based on real-time task load.
Use CeleryExecutor or KubernetesExecutor to horizontally scale, depending on your environment.
Proper configuration of resource requests and monitoring is essential for efficient scaling.

What’s Next?

➡️ Performance Tuning – Pools, Priority Weights, Parallelism
Learn how to optimize Airflow’s performance with effective task scheduling and resource management.
