Sensors – Poking vs Rescheduling, ExternalTaskSensor

Imagine this scenario:

Your pipeline depends on another system: maybe a file upload, a database update, or another DAG finishing.

You don't want your DAG to fail or run prematurely.
You just want it to wait patiently and efficiently.

This is where Sensors come in.


What Are Sensors in Airflow?

Sensors are specialized operators that:

  • Wait for a condition to become true
  • Hold back downstream tasks until an event occurs
  • Integrate with Airflow scheduling and retries
  • Use few resources while waiting when configured correctly

Examples of common conditions:

  • File exists in S3 or local filesystem
  • Database row or table is available
  • Another DAG has completed successfully
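
As a rough mental model, a sensor is a loop around a yes/no check. The sketch below is plain Python, not Airflow's implementation; `wait_until`, `SensorTimeout`, and the injectable `sleep`/`clock` arguments are hypothetical names chosen for illustration.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the condition is still false once `timeout` seconds have passed."""


def wait_until(
    condition: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call `condition` until it returns True; return how many checks were made."""
    started = clock()
    checks = 0
    while True:
        checks += 1
        if condition():
            return checks
        if clock() - started >= timeout:
            raise SensorTimeout(f"condition still false after {checks} checks")
        sleep(poke_interval)


# Usage with a fake clock so the example finishes instantly.
now = [0.0]


def fake_sleep(seconds: float) -> None:
    now[0] += seconds


answers = iter([False, False, True])  # the "file" shows up on the third check
checks = wait_until(
    lambda: next(answers),
    poke_interval=60,
    timeout=3600,
    sleep=fake_sleep,
    clock=lambda: now[0],
)
print(checks, now[0])
```

The condition can be anything that answers yes or no: a file lookup, a database query, or the state of another DAG.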

Poking vs Rescheduling Modes

Poking Mode

  • Default mode for sensors
  • Checks the condition repeatedly at a fixed interval (poke_interval)
  • Keeps the task instance running, sleeping between checks
  • Occupies a worker slot for the entire wait, which gets expensive when the wait is long
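
from datetime import datetime

from airflow import DAG
from airflow.sensors.filesystem import FileSensor

# schedule_interval is the older spelling; newer releases (2.4+) use schedule=
with DAG(
    dag_id="poke_sensor_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
) as dag:
    # Check for the file every 60 seconds while holding the worker slot
    wait_for_file = FileSensor(
        task_id="wait_for_file",
        filepath="/data/input/sales_{{ ds }}.csv",  # rendered per run
        poke_interval=60,  # seconds between checks
        timeout=3600,  # max wait 1 hour
        mode="poke",
    )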

Rescheduling Mode

  • More efficient for long waits
  • Releases the worker slot between checks
  • Leaves that slot free for other tasks while the sensor waits
  • Well suited to deployments where many tasks compete for a limited number of worker slots

wait_for_file_reschedule = FileSensor(
    task_id="wait_for_file_reschedule",
    filepath="/data/input/sales_{{ ds }}.csv",
    poke_interval=60,
    timeout=3600,
    mode="reschedule",  # give up the worker slot between checks
)
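
To make the difference between the two modes concrete, here is a back-of-the-envelope estimate of how long a worker slot stays occupied in each. It is a simplified model, not a measurement: `slot_seconds` is a hypothetical helper, and `check_cost` (the duration of one check) is an assumed value.

```python
import math


def slot_seconds(mode: str, wait: float, poke_interval: float, check_cost: float = 1.0) -> float:
    """Approximate seconds a worker slot is held while a sensor waits `wait` seconds.

    `check_cost` is the assumed duration of a single condition check.
    """
    if mode == "poke":
        # The task keeps running (sleeping between checks), so the slot is held throughout.
        return float(wait)
    if mode == "reschedule":
        # The slot is held only while each individual check runs.
        checks = math.floor(wait / poke_interval) + 1
        return checks * check_cost
    raise ValueError(f"unknown mode: {mode!r}")


poke = slot_seconds("poke", wait=3600, poke_interval=60)
reschedule = slot_seconds("reschedule", wait=3600, poke_interval=60)
print(poke, reschedule)
```

Under these assumptions a one-hour wait holds a slot for the full hour in poke mode and for about a minute in reschedule mode. The model ignores the scheduler and task start-up overhead that each rescheduled check adds, which is why poke mode can still be the better choice for short waits.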

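ExternalTaskSensor

Sometimes your DAG must wait for another DAG, or a specific task in it, to complete. By default the sensor looks for the external run with the same logical date, so the two DAGs should share a schedule; when they don't, execution_delta or execution_date_fn can point the sensor at the right run.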

from airflow.sensors.external_task import ExternalTaskSensor

wait_for_dag = ExternalTaskSensor(
    task_id="wait_for_daily_sales_dag",
    external_dag_id="daily_sales_pipeline",
    external_task_id="load_sales_table",
    allowed_states=["success"],
    failed_states=["failed", "skipped"],  # fail fast instead of waiting for the timeout
    poke_interval=300,  # check every 5 minutes
    timeout=7200,  # give up after 2 hours
)

Input

Parameter          Value
external_dag_id    daily_sales_pipeline
external_task_id   load_sales_table
poke_interval      300 sec

Output

External DAG daily_sales_pipeline/load_sales_table completed successfully
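
The way allowed_states and failed_states work together can be sketched in plain Python. This is a simplified illustration of the decision made on each check, not the sensor's actual code; `external_task_ready` and `ExternalTaskFailed` are hypothetical names, and the real sensor reads the state from Airflow's metadata database.

```python
from typing import Optional, Sequence


class ExternalTaskFailed(Exception):
    """Raised when the upstream task ended in a state listed in `failed_states`."""


def external_task_ready(
    state: Optional[str],
    allowed_states: Sequence[str] = ("success",),
    failed_states: Sequence[str] = ("failed", "skipped"),
) -> bool:
    """Decide what one check should do given the upstream task's current state.

    `state` is None when the upstream run does not exist yet.
    """
    if state in failed_states:
        # Stop immediately: waiting longer cannot turn a failure into a success.
        raise ExternalTaskFailed(f"upstream task is in state {state!r}")
    # True ends the wait; False means "check again after poke_interval".
    return state in allowed_states


print(external_task_ready("success"))  # upstream finished
print(external_task_ready("running"))  # keep waiting
print(external_task_ready(None))       # upstream run not created yet
```

Without failed_states, a failed upstream task looks the same as one that is still running, and the sensor keeps checking until it hits its timeout.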

Sensor Best Practices

  • Use reschedule mode for long waits
  • Set timeout so a sensor cannot wait forever
  • Tune poke_interval to balance responsiveness against resource usage
  • Combine with SLAs for monitoring

❌ Avoid

  • Poking sensors with very short intervals on long waits
  • Waiting on resources that are unreliable or may never become available
  • Ignoring sensor failures or retries
  • Using sensors for heavy computation

Common Mistakes

❌ Using poke mode for hours-long waits
❌ Not handling failure states in ExternalTaskSensor
❌ Overloading workers with too many sensors
❌ Forgetting to parameterize file paths or DAG IDs
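
On the last point: the `{{ ds }}` placeholder in the FileSensor examples renders as the run's logical date in YYYY-MM-DD form, so every run waits for its own file. The plain-Python sketch below mirrors that rendering; `sales_file_path` and its `base_dir` default are hypothetical and only illustrate the idea.

```python
from datetime import date


def sales_file_path(logical_date: date, base_dir: str = "/data/input") -> str:
    """Build the per-run input path, mirroring the sales_{{ ds }}.csv template."""
    ds = logical_date.strftime("%Y-%m-%d")  # same YYYY-MM-DD shape as Airflow's ds
    return f"{base_dir}/sales_{ds}.csv"


path = sales_file_path(date(2024, 1, 1))
print(path)
```

A hardcoded path would make every run, including backfills, wait for the same file.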


Real-World Use Cases

  • Wait for ETL files to arrive before processing
  • Trigger downstream DAG only after upstream DAG completes
  • Poll external APIs or databases for data availability
  • Synchronize cross-system workflows

Summary

Sensors are the gatekeepers of your DAGs.

Key Takeaways:

  • Wait for conditions without manual intervention
  • Choose poke for short waits, reschedule for long waits
  • ExternalTaskSensor ensures DAG dependencies are respected
  • Best practices reduce wasted resources and improve reliability

Properly implemented, sensors make your Airflow workflows robust, efficient, and responsive to external events.


What's Next?

Next in the series:

➡️ Hooks Explained – Database, S3, GCP, Azure