HttpOperator & REST API Workflows

Imagine this scenario:

Your business needs daily weather data, social media metrics, or stock prices.
There is no database table to query, only REST APIs.

Airflow doesn’t store the API data itself; it orchestrates the calls, handles retries, and gives you observability.

Enter the HttpOperator.


What Is HttpOperator?​

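The HttpOperator allows Airflow to send HTTP requests (GET, POST, PUT, DELETE) as part of a DAG. Older releases of the HTTP provider expose the operator as SimpleHttpOperator, which is the name used in the code examples here.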

Key features:

  • Supports GET, POST, PUT, DELETE, and other HTTP methods
  • Handles authentication (Basic, Bearer, or custom)
  • Allows JSON payloads and headers
  • Integrates with XCom for downstream tasks
  • Retries on failure through Airflow’s standard task-level retries

When Should You Use HttpOperator?​

Best Use Cases​

  • Fetching data from REST APIs
  • Sending POST requests for webhook notifications
  • Triggering external ETL pipelines
  • Integrating with SaaS platforms (e.g., Salesforce, HubSpot)
  • API-based monitoring or alerting

When Not to Use It​

  • Bulk data extraction: prefer dedicated connectors
  • Complex data transformations: combine with PythonOperator
  • Real-time streaming: Airflow is batch-oriented

Basic GET Request Example​

from datetime import datetime

from airflow import DAG
# Newer releases of the HTTP provider name this class HttpOperator.
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="httpoperator_get_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    get_weather = SimpleHttpOperator(
        task_id="get_weather_data",
        http_conn_id="weather_api",  # base URL comes from this Airflow connection
        endpoint="v1/current?city=London",
        method="GET",
        # Parse the body so the XCom holds a dict instead of raw text
        response_filter=lambda response: response.json(),
        log_response=True,
    )

Input​

| Parameter | Value |
| --- | --- |
| http_conn_id | weather_api |
| endpoint | v1/current?city=London |
| method | GET |

Output (XCom)​

{
    "city": "London",
    "temperature": 8,
    "condition": "Cloudy"
}
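
A downstream task can then work with this payload. The following is a minimal sketch, assuming the response has exactly the shape shown above; `summarize_weather` is a hypothetical helper, not part of Airflow.

```python
def summarize_weather(payload: dict) -> str:
    """Turn the XCom payload from get_weather_data into a one-line summary."""
    # Fail loudly if the API changed shape instead of passing bad data along.
    missing = {"city", "temperature", "condition"} - payload.keys()
    if missing:
        raise ValueError(f"weather payload is missing keys: {sorted(missing)}")
    return f"{payload['city']}: {payload['temperature']}, {payload['condition']}"


if __name__ == "__main__":
    sample = {"city": "London", "temperature": 8, "condition": "Cloudy"}
    print(summarize_weather(sample))
```

In a DAG, a function like this would typically run inside a PythonOperator that reads the value with `ti.xcom_pull(task_ids="get_weather_data")`.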

POST Request with JSON Payload​

post_user_data = SimpleHttpOperator(
    task_id="post_user_data",
    http_conn_id="user_api",
    endpoint="users/create",
    method="POST",
    # The body is sent as a JSON string; the header tells the API how to parse it
    data='{"name": "Alice", "email": "alice@example.com"}',
    headers={"Content-Type": "application/json"},
    response_filter=lambda response: response.json(),
)

Input​

| Parameter | Value |
| --- | --- |
| data | JSON payload with user info |
| headers | Content-Type: application/json |

Output (XCom)​

{
    "status": "success",
    "user_id": 1023
}
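
Hand-written JSON strings are easy to break with a stray quote. One way to avoid that is to build the body with `json.dumps`. This is only a sketch: `build_user_payload` is a hypothetical helper, and the field names are taken from the example above.

```python
import json


def build_user_payload(name: str, email: str) -> str:
    """Serialize the request body for the users/create endpoint."""
    # json.dumps takes care of quoting and escaping, e.g. names containing quotes.
    return json.dumps({"name": name, "email": email})


if __name__ == "__main__":
    print(build_user_payload("Alice", "alice@example.com"))
```

The returned string can be passed as the operator's `data` argument in place of the literal.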

Authentication in HttpOperator​

Using Airflow Connections​

  1. Basic Auth
     • Username and password are stored in the Airflow connection
     • HttpOperator applies them to the request automatically
  2. Bearer Token
     • Store the token in the Airflow connection (for example, in the password field)
     • Pass it as a header in the operator:

headers={"Authorization": "Bearer {{ conn.my_api.password }}"}
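
For reference, both schemes end up as an `Authorization` header on the request. The sketch below shows what those headers look like; the helper names are hypothetical and this is not Airflow's internal code, just the standard header formats.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the header for HTTP Basic auth: base64 of 'username:password'."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(token: str) -> dict:
    """Build the header for bearer-token auth."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
    print(bearer_auth_header("example-token"))
```

Note that base64 is an encoding, not encryption, which is one reason to send these headers over HTTPS only.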

Error Handling & Retries​

HttpOperator supports:

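  • Automatic retries when the task fails, including on network errors
  • Custom timeouts
  • Logging of response codes and bodies

from datetime import timedelta

SimpleHttpOperator(
    task_id="retry_example",
    http_conn_id="api_conn",
    endpoint="data",
    method="GET",
    extra_options={"timeout": 30},  # seconds, passed through to the underlying requests call
    retries=3,  # up to 3 more attempts after the first failure
    retry_delay=timedelta(minutes=5),  # wait between attempts
)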

Output​

  • Logs the response code
  • Retries on failure
  • Alerts on final failure, if alerting (for example, an on_failure_callback) is configured
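
A 200 status does not always mean the body is usable. The operator also accepts a `response_check` callable that returns True or False. Here is a minimal sketch of such a check; the `"data"` key is an assumption about the API's response shape, and `FakeResponse` is a hypothetical stand-in used only to try the function without a network call.

```python
import json


def has_expected_body(response) -> bool:
    """Return True only when the body is a JSON object with a "data" key."""
    try:
        body = response.json()
    except ValueError:
        # A body that is not valid JSON raises a ValueError subclass.
        return False
    return isinstance(body, dict) and "data" in body


class FakeResponse:
    """Stand-in for requests.Response so the check runs without a network call."""

    def __init__(self, text: str):
        self.text = text

    def json(self):
        return json.loads(self.text)


if __name__ == "__main__":
    print(has_expected_body(FakeResponse('{"data": [1, 2]}')))
    print(has_expected_body(FakeResponse("<html>Bad gateway</html>")))
```

Passing `response_check=has_expected_body` to the operator makes the task fail, and therefore retry, whenever the function returns False.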

Templating with Endpoints & Payloads​

HttpOperator supports Jinja templating in the endpoint, data, and headers fields:

endpoint="v1/data?date={{ ds }}"
data='{"run_date": "{{ ds }}"}'

This allows dynamic API calls per DAG run.
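
To make the rendering concrete, this plain-Python sketch reproduces what the two templates above evaluate to for a given run. It deliberately uses neither Jinja nor Airflow; it relies only on the fact that `{{ ds }}` is the run's logical date in YYYY-MM-DD format. The helper name `rendered_request` is hypothetical.

```python
import json
from datetime import date


def rendered_request(logical_date: date) -> tuple:
    """Return the (endpoint, data) pair the templates would render to."""
    ds = logical_date.isoformat()  # same YYYY-MM-DD format as {{ ds }}
    endpoint = f"v1/data?date={ds}"
    data = json.dumps({"run_date": ds})
    return endpoint, data


if __name__ == "__main__":
    endpoint, data = rendered_request(date(2024, 1, 1))
    print(endpoint)
    print(data)
```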


HttpOperator vs PythonOperator + Requests​

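| Feature | HttpOperator | PythonOperator + Requests |
| --- | --- | --- |
| Built-in logging | ✅ | Manual |
| Connection management | ✅ | Manual |
| Templating | ✅ | Manual |
| Retries | ✅ (task-level) | ✅ (task-level); retry loops inside the function are manual |
| XCom support | ✅ | Custom |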

Rule of Thumb:
Use HttpOperator for simple API calls; use PythonOperator with Requests when you need complex logic or looping.
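
Pagination is a typical case of "looping" that fits a Python callable better than a single operator call. The sketch below assumes a hypothetical API whose pages look like `{"items": [...], "next_page": <int or None>}`; `fetch_page` is injected so the loop can be tried without a network, and in a real DAG it would wrap a Requests call.

```python
from typing import Callable, Optional


def fetch_all_items(fetch_page: Callable[[int], dict], max_pages: int = 100) -> list:
    """Follow next_page pointers and collect every item."""
    items: list = []
    page: Optional[int] = 1
    for _ in range(max_pages):  # hard cap guards against an API that never ends
        if page is None:
            break
        body = fetch_page(page)
        items.extend(body["items"])
        page = body.get("next_page")
    return items


if __name__ == "__main__":
    # Fake two-page API used only for demonstration.
    pages = {
        1: {"items": ["a", "b"], "next_page": 2},
        2: {"items": ["c"], "next_page": None},
    }
    print(fetch_all_items(lambda n: pages[n]))
```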


Security Best Practices​

  • Store API keys in Airflow Connections
  • Use HTTPS endpoints only
  • Limit access tokens' permissions
  • Avoid exposing secrets in logs

❌ Avoid​

  • Hardcoding secrets in DAG files
  • Using HTTP for sensitive data
  • Ignoring API rate limits
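
If you log request details yourself, one simple precaution is to mask credential headers first. This is a sketch only: `redact_headers` is a hypothetical helper, and the set of header names treated as sensitive is an assumption you would adapt to your APIs.

```python
# Header names treated as secrets (assumed list; compared case-insensitively).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie"}


def redact_headers(headers: dict) -> dict:
    """Return a copy of the headers that is safe to write to logs."""
    return {
        name: ("***" if name.lower() in SENSITIVE_HEADERS else value)
        for name, value in headers.items()
    }


if __name__ == "__main__":
    print(redact_headers({
        "Authorization": "Bearer example-token",
        "Content-Type": "application/json",
    }))
```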

Common Mistakes​

❌ Pulling large datasets directly with HttpOperator
❌ Not handling network timeouts
❌ Forgetting retries
❌ Mixing API logic with heavy data transformations


Real-World Use Cases​

  • Fetching weather or financial data daily
  • Triggering webhooks for notifications
  • Integrating with cloud SaaS apps
  • Monitoring external services
  • API-based feature flag evaluation

Summary​

HttpOperator is the bridge between Airflow and external services.

Key Takeaways:

  • Sends GET, POST, and other HTTP requests reliably
  • Handles authentication, headers, payloads
  • Supports retries, templating, and XCom
  • Best used for orchestrating API calls, not heavy data processing

When used correctly, it makes Airflow a powerful tool for API-driven workflows.


What’s Next?​

Next in the series:

➡️ Sensors – Poking vs Rescheduling, ExternalTaskSensor