HttpOperator & REST API Workflows
Imagine this scenario:
Your business needs **daily weather data**, social media metrics, or stock prices.
You don't have a database table, only REST APIs.
Airflow doesn't store the data behind those APIs; it orchestrates the calls, handles retries, and ensures observability.
Enter the HttpOperator.
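What Is HttpOperator?
The HttpOperator allows Airflow to send HTTP requests (GET, POST, PUT, DELETE) as part of a DAG. Older releases of the HTTP provider package expose the same operator as SimpleHttpOperator, which is the name the examples in this article import.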
Key features:
- Supports the standard HTTP methods (GET, POST, PUT, DELETE)
- Handles authentication (Basic, Bearer, or custom)
- Allows JSON payloads and headers
- Integrates with XCom for downstream tasks
- Retries on failure
When Should You Use HttpOperator?
Best Use Cases
- Fetching data from REST APIs
- Sending POST requests for webhook notifications
- Triggering external ETL pipelines
- Integrating with SaaS platforms (e.g., Salesforce, HubSpot)
- API-based monitoring or alerting
When Not to Use It
- Bulk data extraction: prefer dedicated connectors
- Complex data transformations: combine with PythonOperator
- Real-time streaming: Airflow is batch-oriented
Basic GET Request Example
from datetime import datetime

from airflow import DAG
from airflow.providers.http.operators.http import SimpleHttpOperator

with DAG(
    dag_id="httpoperator_get_example",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # The base URL comes from the "weather_api" connection; only the path is set here.
    get_weather = SimpleHttpOperator(
        task_id="get_weather_data",
        http_conn_id="weather_api",
        endpoint="v1/current?city=London",
        method="GET",
        # Push the parsed JSON body to XCom instead of the raw response text.
        response_filter=lambda response: response.json(),
        log_response=True,
    )
Input
| Parameter | Value |
|---|---|
| http_conn_id | weather_api |
| endpoint | v1/current?city=London |
| method | GET |
Output (XCom)
{
  "city": "London",
  "temperature": 8,
  "condition": "Cloudy"
}
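The response_filter callable receives the HTTP response object, and whatever it returns is what lands in XCom. As a minimal sketch, a named function can replace the lambda when only some fields matter downstream. The helper name extract_weather and the FakeResponse stand-in are hypothetical; the field names come from the sample output above.

```python
import json


def extract_weather(response):
    """Keep only the fields downstream tasks need (hypothetical helper)."""
    payload = response.json()
    return {
        "city": payload["city"],
        "temperature": payload["temperature"],
    }


class FakeResponse:
    """Stand-in for the real response object, used only to try the filter locally."""

    def __init__(self, text):
        self.text = text

    def json(self):
        return json.loads(self.text)


sample = FakeResponse('{"city": "London", "temperature": 8, "condition": "Cloudy"}')
filtered = extract_weather(sample)
print(filtered)
```

In the DAG, the function would be passed as response_filter=extract_weather, which keeps the XCom value small and makes the filter easy to unit test.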
POST Request with JSON Payload
# Defined inside the same "with DAG(...)" block as the GET example.
post_user_data = SimpleHttpOperator(
    task_id="post_user_data",
    http_conn_id="user_api",
    endpoint="users/create",
    method="POST",
    # The request body is sent as a JSON string.
    data='{"name": "Alice", "email": "alice@example.com"}',
    headers={"Content-Type": "application/json"},
    response_filter=lambda response: response.json(),
)
Input
| Parameter | Value |
|---|---|
| data | JSON payload with user info |
| headers | Content-Type: application/json |
Output (XCom)
{
  "status": "success",
  "user_id": 1023
}
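Hand-written JSON strings are easy to get wrong once values contain quotes or non-ASCII characters. One option, sketched here with the same illustrative user record, is to build the body with json.dumps and pass the result as data.

```python
import json

# Illustrative user record; the field names follow the example above.
user = {"name": "Alice", "email": "alice@example.com"}

# json.dumps takes care of quoting and escaping, so the body is always valid JSON.
payload = json.dumps(user)
headers = {"Content-Type": "application/json"}

print(payload)
```

The operator call then becomes data=payload, headers=headers, with everything else unchanged.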
Authentication in HttpOperator
Using Airflow Connections
- Basic Auth
  - Store the username and password in the Airflow connection
  - The operator's HTTP hook applies them as Basic auth automatically
- Bearer Token
  - Store the token in the Airflow connection (the example below reads it from the password field)
  - Pass the header in the operator:
headers={"Authorization": "Bearer {{ conn.my_api.password }}"}
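To make the two schemes concrete, here is a small sketch of what each Authorization header looks like on the wire. This is illustration only: with Basic auth the header is built for you from the connection, so these helpers (hypothetical names, made-up credentials) are not something a DAG needs to call.

```python
import base64


def basic_auth_header(username, password):
    """Build a Basic auth header value: base64 of "username:password"."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def bearer_auth_header(token):
    """Build a Bearer auth header value from an opaque token."""
    return f"Bearer {token}"


# Made-up credentials, for illustration only.
basic_value = basic_auth_header("airflow_user", "example-password")
bearer_value = bearer_auth_header("example-token")

print(basic_value.split(" ", 1)[0])  # scheme only, so no secret is printed
print(bearer_value.split(" ", 1)[0])
```

Base64 is an encoding, not encryption, which is one reason both schemes should only travel over HTTPS.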
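Error Handling & Retries
HttpOperator supports:
- Automatic retries when the task fails (network errors, error status codes), configured with the standard retries and retry_delay task arguments
- Custom timeout parameters
- Logging of response codes and, with log_response=True, response bodies
from datetime import timedelta  # needed for retry_delay

SimpleHttpOperator(
    task_id="retry_example",
    http_conn_id="api_conn",
    endpoint="data",
    method="GET",
    retries=3,                         # retry up to 3 times after the first failure
    retry_delay=timedelta(minutes=5),  # wait 5 minutes between attempts
)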
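Output
- Logs the response code
- Retries up to 3 times, 5 minutes apart, on failure
- Marks the task as failed once retries are exhausted, which fires any failure alerts you have configured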
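Some APIs answer with a success status even when the body signals a problem. The operator accepts a response_check callable for that case: returning False fails the task, which in turn triggers the retries configured above. A minimal sketch, assuming a hypothetical "records" key in the body and using a fake response object for local testing:

```python
def has_records(response):
    """Return True only when the body contains a non-empty "records" list.

    The "records" key is a hypothetical example, not part of any real API.
    """
    body = response.json()
    records = body.get("records")
    return isinstance(records, list) and len(records) > 0


class FakeResponse:
    """Stand-in for the real response object, used only for local testing."""

    def __init__(self, body):
        self._body = body

    def json(self):
        return self._body


ok = has_records(FakeResponse({"records": [{"id": 1}]}))
empty = has_records(FakeResponse({"records": []}))
missing = has_records(FakeResponse({"error": "try again later"}))
print(ok, empty, missing)
```

In the operator this would be wired in as response_check=has_records. Request timeouts are usually passed through the extra_options argument, for example extra_options={"timeout": 30}; check the provider version you run before relying on that.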
Templating with Endpoints & Payloads
HttpOperator supports Jinja templating in the endpoint, data, and headers arguments:
endpoint="v1/data?date={{ ds }}"
data='{"run_date": "{{ ds }}"}'
This allows dynamic API calls per DAG run.
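To see what the rendered values look like for one run, the sketch below substitutes a sample date into the two templates above. It uses a tiny regex-based stand-in rather than Jinja itself, so it only handles plain {{ name }} placeholders; Airflow does the real rendering before the task executes. The render helper is hypothetical.

```python
import re


def render(template, context):
    """Very small stand-in for Jinja: replaces {{ name }} placeholders only."""

    def substitute(match):
        return str(context[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# ds is the run's logical date formatted as YYYY-MM-DD.
context = {"ds": "2024-01-01"}

endpoint = render("v1/data?date={{ ds }}", context)
data = render('{"run_date": "{{ ds }}"}', context)

print(endpoint)  # v1/data?date=2024-01-01
print(data)      # {"run_date": "2024-01-01"}
```

Each DAG run gets its own ds, so a backfill over thirty days issues thirty requests with thirty different dates without any change to the DAG file.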
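HttpOperator vs PythonOperator + Requests
| Feature | HttpOperator | PythonOperator + Requests |
|---|---|---|
| Built-in logging | Built-in | Manual |
| Connection management | Built-in (Airflow Connections) | Manual |
| Templating | Built-in (endpoint, data, headers) | Manual (pass values through op_kwargs) |
| Retries | Built-in (task-level) | Task-level retries apply; request-level handling is manual |
| XCom support | Built-in (response or filtered result) | Custom (the function's return value) |
Rule of Thumb:
Use HttpOperator for simple API calls; use PythonOperator with Requests when you need complex logic or looping.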
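Pagination is a typical case of the looping that fits a Python callable better than a single HTTP task. The sketch below keeps the HTTP call behind a fetch_page parameter so the loop can be tested without a network; in a real task that parameter would wrap a Requests call. The "items" and "has_more" keys and all function names are hypothetical.

```python
def fetch_all_pages(fetch_page, max_pages=100):
    """Collect items from a paginated API.

    fetch_page(page_number) must return a dict with "items" (a list) and
    "has_more" (a bool). Both key names are hypothetical.
    """
    items = []
    for page in range(1, max_pages + 1):
        result = fetch_page(page)
        items.extend(result["items"])
        if not result["has_more"]:
            break
    return items


def fake_fetch(page):
    """Fake three-page backend standing in for a real HTTP call."""
    return {"items": [f"row-{page}"], "has_more": page < 3}


rows = fetch_all_pages(fake_fetch)
print(rows)  # ['row-1', 'row-2', 'row-3']
```

The max_pages cap is a deliberate guard: a misbehaving API that always reports more pages would otherwise keep the task running until its timeout.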
Security Best Practices
Recommended
- Store API keys in Airflow Connections
- Use HTTPS endpoints only
- Limit access tokens' permissions
- Avoid exposing secrets in logs
Avoid
- Hardcoding secrets in DAG files
- Using HTTP for sensitive data
- Ignoring API rate limits
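When custom code logs request details, one way to keep secrets out of the logs is to redact sensitive headers first. A minimal sketch; the header list and the redact_headers helper are assumptions for illustration, not an Airflow API:

```python
# Header names treated as sensitive (illustrative list, compared case-insensitively).
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def redact_headers(headers):
    """Return a copy of headers that is safe to log."""
    return {
        name: "***" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


safe = redact_headers(
    {"Authorization": "Bearer example-token", "Content-Type": "application/json"}
)
print(safe)
```

The original dictionary is left untouched, so the real header still reaches the API while only the redacted copy is written to the log.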
Common Mistakes
- Pulling large datasets directly with HttpOperator
- Not handling network timeouts
- Forgetting retries
- Mixing API logic with heavy data transformations
Real-World Use Cases
- Fetching weather or financial data daily
- Triggering webhooks for notifications
- Integrating with cloud SaaS apps
- Monitoring external services
- API-based feature flag evaluation
Summary
HttpOperator is the bridge between Airflow and external services.
Key Takeaways:
- Sends GET, POST, and other HTTP requests reliably
- Handles authentication, headers, payloads
- Supports retries, templating, and XCom
- Best used for orchestrating API calls, not heavy data processing
When used correctly, it makes Airflow a powerful tool for API-driven workflows.
What's Next?
Next in the series:
Sensors: Poking vs Rescheduling, ExternalTaskSensor