Scaling Workers & Horizontal Autoscaling in Apache Airflow
Boosting Performance with Worker Scaling 🚀
The Story: The Growing Workflow Problem
Imagine you have a DAG with hundreds of tasks, but only a single worker is running. As your DAG runs, tasks start to queue up, slowing down your entire pipeline. You need to speed up execution to handle the increasing load, but how?
Airflow gives you the ability to scale workers horizontally and implement autoscaling to automatically add or remove worker nodes based on demand.
This is key to keeping your workflows running efficiently, even as they grow in complexity.
What is Horizontal Autoscaling in Airflow?
Horizontal scaling refers to adding more worker nodes to handle more tasks. Autoscaling takes this a step further by automatically adjusting the number of worker nodes based on the workload.
By enabling horizontal autoscaling, you can:
- Ensure that Airflow dynamically responds to task demand.
- Add workers when tasks are queuing up or remove them when demand drops.
- Improve overall system performance and resource utilization.
Why is Horizontal Autoscaling Important?
Without proper scaling:
- Tasks queue up whenever demand exceeds the available worker slots.
- A fixed pool of workers sits idle during low-traffic periods.
- Airflow may struggle with large workflows or many concurrent tasks.
With horizontal autoscaling:
- Tasks run faster as more workers are available.
- Airflow can adapt to fluctuating workloads.
- You can keep your environment cost-efficient, using fewer resources during off-peak times.
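To put a rough number on "tasks run faster", here is a back-of-the-envelope sketch. It assumes independent tasks of equal duration and ignores scheduler and queueing overhead, so treat the result as an optimistic estimate. The function name `estimated_wall_time` and the sample figures are made up for illustration.

```python
import math


def estimated_wall_time(num_tasks: int, task_seconds: float, worker_slots: int) -> float:
    """Rough wall-clock time when identical tasks run in waves of `worker_slots`."""
    if worker_slots < 1:
        raise ValueError("worker_slots must be at least 1")
    # Each wave runs up to `worker_slots` tasks in parallel.
    waves = math.ceil(num_tasks / worker_slots)
    return waves * task_seconds


if __name__ == "__main__":
    # 300 one-minute tasks on 1, 4, and 16 worker slots.
    for slots in (1, 4, 16):
        print(slots, estimated_wall_time(300, 60, slots))
```

Real DAGs have dependencies and uneven task durations, so the actual speedup is usually smaller than this estimate suggests.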
Setting Up Horizontal Scaling in Airflow
1. Using CeleryExecutor with Autoscaling
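When you use the CeleryExecutor, you scale horizontally by starting more Celery worker nodes that consume from the same broker queue. Within each worker, Celery's autoscaling feature grows and shrinks the pool of task processes. Note that it does not add or remove worker nodes by itself.
First, ensure you’ve set up Celery with a message broker (e.g., Redis or RabbitMQ).
Then start each worker through the Airflow CLI with autoscaling enabled (Airflow 2.x syntax):
airflow celery worker --autoscale 10,3
Here:
- 10 is the maximum number of task processes the worker runs concurrently.
- 3 is the minimum number of task processes the worker keeps alive.
This means that each worker automatically scales between 3 and 10 concurrent task processes based on the workload. The same bounds can be set with worker_autoscale under [celery] in airflow.cfg. To add or remove whole worker nodes, pair this with your infrastructure's own autoscaler.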
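To make the two bounds concrete, here is a minimal sketch of clamping a target between a minimum and a maximum. It is an illustration only: `target_concurrency` is a hypothetical helper, and Celery's real autoscaler uses its own internal logic and timing, which this does not reproduce.

```python
def target_concurrency(queued: int, running: int,
                       min_procs: int = 3, max_procs: int = 10) -> int:
    """Clamp current demand (queued + running tasks) to [min_procs, max_procs]."""
    demand = queued + running
    return max(min_procs, min(max_procs, demand))


if __name__ == "__main__":
    print(target_concurrency(queued=0, running=0))    # idle: stays at the minimum
    print(target_concurrency(queued=2, running=3))    # moderate load: follows demand
    print(target_concurrency(queued=50, running=10))  # heavy load: capped at the maximum
```

The point of the lower bound is to keep some capacity warm for new tasks, while the upper bound protects the machine from being oversubscribed.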
2. Using KubernetesExecutor with Autoscaling
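For the KubernetesExecutor, scaling is more automated because Airflow launches a separate Kubernetes pod for each task instance and, by default, deletes it when the task finishes. The number of worker pods therefore follows the number of running tasks, and you mainly configure the resources each pod needs.
In your airflow.cfg, point the executor at a pod template (the section is named [kubernetes] in older Airflow 2.x releases and [kubernetes_executor] in newer ones):
[kubernetes]
pod_template_file = /path/to/worker-template.yaml
The pod template itself is a plain Pod manifest, like the one in the practical example later in this article. A Deployment with a replica count, by contrast, suits long-lived workers such as Celery workers running on Kubernetes. Example Kubernetes YAML configuration for such a Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: airflow-worker
spec:
  replicas: 3  # Start with 3 replicas
  selector:
    matchLabels:
      component: worker
  template:
    metadata:
      labels:
        component: worker
    spec:
      containers:
        - name: airflow-worker
          image: apache/airflow:latest
          args: ["celery", "worker"]  # Assumption: Celery workers on the official image
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "1000m"
              memory: "2Gi"
You can adjust the number of replicas dynamically based on the workload, for example with kubectl scale or a HorizontalPodAutoscaler that targets this Deployment.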
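One common way to adjust replicas automatically is the Kubernetes Horizontal Pod Autoscaler. Its documented core rule is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies that rule in Python so you can see how a replica count would move. The function name and the min/max defaults are hypothetical, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Apply the HPA scaling rule, then clamp to the configured replica bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 3 replicas averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(3, 90, 60))
    # 3 replicas averaging 20% CPU against a 60% target: scale in.
    print(desired_replicas(3, 20, 60))
```

CPU utilization is only one possible signal. For Airflow workers, queue length is often a better indicator of demand, but that requires exposing it as a custom or external metric.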
3. Leveraging Autoscaling with Amazon ECS or Google Cloud
In cloud environments, you can use managed services like Amazon ECS or Google Kubernetes Engine (GKE) to take care of autoscaling for you. These services let you configure autoscaling for services, node pools, or pods so that capacity follows demand.
Example: Scaling with CeleryExecutor
Let’s say you have a DAG with many tasks and you want to horizontally scale using the CeleryExecutor.
In the airflow.cfg, configure CeleryExecutor:
[core]
executor = CeleryExecutor
[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://user:password@localhost/airflow
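Then, run your worker with autoscaling enabled (Airflow 2.x syntax):
airflow celery worker --autoscale 10,3
Each worker then adjusts its number of concurrent task processes between 3 and 10 based on the workload. To scale out further, start additional workers on other machines that point at the same broker.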
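Before starting workers, it can help to sanity-check the settings above. The following is a minimal sketch using Python's standard configparser. `check_celery_config` is a hypothetical helper, not part of Airflow, and it only inspects the file contents: values that Airflow picks up from environment variables are not covered.

```python
import configparser

SAMPLE_CFG = """
[core]
executor = CeleryExecutor

[celery]
broker_url = redis://localhost:6379/0
result_backend = db+postgresql://user:password@localhost/airflow
"""


def check_celery_config(cfg_text: str) -> list[str]:
    """Return a list of problems found in an airflow.cfg-style string."""
    # Disable interpolation so '%' characters in URLs are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    problems = []
    if parser.get("core", "executor", fallback="") != "CeleryExecutor":
        problems.append("core.executor is not CeleryExecutor")
    for key in ("broker_url", "result_backend"):
        if not parser.get("celery", key, fallback="").strip():
            problems.append(f"celery.{key} is missing")
    return problems


if __name__ == "__main__":
    print(check_celery_config(SAMPLE_CFG))  # an empty list means no problems found
```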
Performance Impact of Autoscaling
| Scenario | Without Autoscaling | With Autoscaling |
|---|---|---|
| Task Execution Speed | Slower, tasks queued up | Faster, tasks processed in parallel |
| Worker Utilization | Low utilization during off-peak | Dynamic resource allocation |
| Cost Efficiency | Fixed cost regardless of demand | Cost-effective, scaling only as needed |
Practical Example: Horizontal Scaling with KubernetesExecutor
If you're using Kubernetes, you can leverage KubernetesExecutor, which runs each task instance in its own pod, so the number of worker pods tracks the number of tasks running at any moment.
In your airflow.cfg, set the executor to KubernetesExecutor:
[core]
executor = KubernetesExecutor
Define resource requests and limits in the pod template (e.g., worker-template.yaml):
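apiVersion: v1
kind: Pod
metadata:
  name: placeholder-name  # Airflow overrides the name for each task pod
  labels:
    component: worker
spec:
  restartPolicy: Never
  containers:
    - name: base  # The KubernetesExecutor expects the task container to be named "base"
      image: apache/airflow:latest
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 1000m
          memory: 2Gi
With this setup, Airflow creates one pod per task instance using these requests and limits and removes it when the task completes, so the number of pods rises and falls with the workload. If the cluster also runs a node autoscaler, nodes can be added or removed to fit those pods.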
Key Considerations for Autoscaling
1. Resource Requests and Limits
Always define appropriate resource requests and limits for tasks to ensure efficient scaling. For Kubernetes, this means specifying CPU and memory for each pod to avoid overloading your infrastructure.
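Requests also determine how many task pods fit on a node, which is worth estimating before you rely on autoscaling. The sketch below does that arithmetic for the 500m CPU / 1Gi memory requests used in this article. The helper names and node sizes are hypothetical, only the `m`, `Mi`, and `Gi` suffixes are handled, and resources reserved for the system are ignored, so real capacity will be somewhat lower.

```python
def parse_cpu_millis(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def parse_memory_mib(quantity: str) -> int:
    """Convert a memory quantity such as '1Gi' or '512Mi' to MiB."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(float(quantity[:-2]))
    raise ValueError(f"unsupported memory quantity: {quantity}")


def pods_per_node(node_cpu: str, node_mem: str, req_cpu: str, req_mem: str) -> int:
    """Number of pods that fit on one node, limited by the scarcer resource."""
    by_cpu = parse_cpu_millis(node_cpu) // parse_cpu_millis(req_cpu)
    by_mem = parse_memory_mib(node_mem) // parse_memory_mib(req_mem)
    return min(by_cpu, by_mem)


if __name__ == "__main__":
    print(pods_per_node("4", "16Gi", "500m", "1Gi"))  # CPU is the limiting resource
    print(pods_per_node("4", "6Gi", "500m", "1Gi"))   # memory is the limiting resource
```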
2. Message Broker (For CeleryExecutor)
Ensure your message broker (Redis or RabbitMQ) is highly available and can handle the traffic generated by autoscaling.
3. Monitoring
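Monitor your autoscaling setup to ensure that scaling events happen smoothly. Airflow’s metrics, such as the executor's queued-task, running-task, and open-slot gauges, can help you track worker utilization and task performance. Scaling decisions themselves are recorded by whichever autoscaler you use, so watch its logs and events as well.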
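As a small illustration of what such monitoring might look for, the sketch below parses StatsD-style gauge lines and flags a backlog when tasks are queued while no executor slots are open. It assumes Airflow's StatsD metrics are enabled with the default `airflow` prefix. The sample values are made up, and `parse_gauges` and `backlog_detected` are hypothetical helpers.

```python
def parse_gauges(lines: list[str]) -> dict[str, float]:
    """Parse StatsD lines of the form 'name:value|g' into a name -> value mapping."""
    gauges = {}
    for line in lines:
        name, _, rest = line.strip().partition(":")
        value, _, metric_type = rest.partition("|")
        if metric_type == "g":  # keep gauges only, skip counters and timers
            gauges[name] = float(value)
    return gauges


def backlog_detected(gauges: dict[str, float], prefix: str = "airflow") -> bool:
    """True when tasks are waiting and the executor has no free slots."""
    queued = gauges.get(f"{prefix}.executor.queued_tasks", 0)
    open_slots = gauges.get(f"{prefix}.executor.open_slots", 0)
    return queued > 0 and open_slots == 0


if __name__ == "__main__":
    sample = [
        "airflow.executor.queued_tasks:12|g",
        "airflow.executor.open_slots:0|g",
        "airflow.executor.running_tasks:32|g",
    ]
    print(backlog_detected(parse_gauges(sample)))
```

In practice you would send these metrics to a monitoring system and alert on a sustained backlog rather than a single sample.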
Best Practices for Scaling Workers
✅ Use horizontal scaling when your total workload outgrows what a single machine can run concurrently.
✅ Leverage KubernetesExecutor for automatic pod scaling in cloud-native environments.
✅ Configure autoscaling for Celery workers using the --autoscale flag to bound each worker's concurrency.
✅ Monitor worker utilization to prevent over-provisioning or under-provisioning.
✅ Adjust resource limits based on the workload to avoid resource bottlenecks.
Summary 🧠
- Horizontal scaling allows you to distribute tasks across multiple workers.
- Autoscaling helps automatically adjust the number of workers based on workload demand.
- Both CeleryExecutor and KubernetesExecutor provide powerful scaling capabilities, but Kubernetes offers more dynamic, cloud-native scaling options.
- Monitoring and resource limits are key to successful scaling.
Key Takeaways
- Horizontal scaling ensures that Airflow can handle large workflows.
- Autoscaling helps dynamically scale workers based on real-time task load.
- Use CeleryExecutor or KubernetesExecutor to horizontally scale, depending on your environment.
- Proper configuration of resource requests and monitoring is essential for efficient scaling.
What’s Next?
➡️ Performance Tuning – Pools, Priority Weights, Parallelism
Learn how to optimize Airflow’s performance with effective task scheduling and resource management.