Skip to main content

XComs & Data Passing Between Tasks in Apache Airflow

The Story Behind XComs πŸ“¦β€‹

Imagine an assembly line.

One worker extracts raw materials.
Another processes them.
A third prepares the final product.

But there’s a problem β€” how do they pass information to each other without shouting across the factory floor?

In Apache Airflow, that communication channel is called XCom β€” short for Cross-Communication.

XComs allow tasks to exchange small pieces of data, enabling workflows to be dynamic, intelligent, and interconnected rather than isolated steps.


What is an XCom?​

An XCom is a key–value pair stored in Airflow’s metadata database that allows one task to send data to another task.

Key Characteristics​

βœ… Task-to-task communication
βœ… Stored in Airflow metadata DB
βœ… Lightweight (small data only)
❌ Not designed for large datasets

Rule of Thumb:
XComs pass messages, not data lakes.


Why XComs Exist​

Without XComs:

  • Tasks are isolated
  • No dynamic behavior
  • No decision-making based on upstream results

With XComs:

  • Downstream tasks can react to upstream results
  • Branching and conditional logic becomes possible
  • Pipelines become intelligent, not static

Basic XCom Workflow​

  1. Task A generates a value
  2. Task A pushes the value to XCom
  3. Task B pulls the value from XCom
  4. Task B uses the value for processing

XCom Push & Pull – Classic Way​

Example Scenario​

  • Task 1 fetches today’s sales number
  • Task 2 calculates a bonus based on that number

DAG Example (PythonOperator)​

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime

def fetch_sales(****context):
sales = 1200
context['ti'].xcom_push(key='daily_sales', value=sales)

def calculate_bonus(****context):
sales = context['ti'].xcom_pull(
key='daily_sales',
task_ids='fetch_sales_task'
)
bonus = sales ** 0.10
print(f"Bonus Amount: {bonus}")

with DAG(
dag_id='xcom_basic_example',
start_date=datetime(2024, 1, 1),
schedule_interval=None,
catchup=False,
) as dag:

fetch_sales_task = PythonOperator(
task_id='fetch_sales_task',
python_callable=fetch_sales,
provide_context=True
)

calculate_bonus_task = PythonOperator(
task_id='calculate_bonus_task',
python_callable=calculate_bonus,
provide_context=True
)

fetch_sales_task >> calculate_bonus_task

Input & Output Example​

Input (Generated in Task 1)

{
"daily_sales": 1200
}

Output (Used in Task 2)

Bonus Amount: 120.0

Airflow 2+ introduced the TaskFlow API, which makes XComs almost invisible.

Why This Matters​

  • Cleaner code
  • No manual push/pull
  • Pythonic experience

TaskFlow Example​

from airflow.decorators import dag, task
from datetime import datetime

@dag(
dag_id="taskflow_xcom_example",
start_date=datetime(2024, 1, 1),
schedule_interval=None,
catchup=False,
)
def xcom_taskflow_dag():

@task
def fetch_sales():
return 1200

@task
def calculate_bonus(sales):
return sales ** 0.10

bonus = calculate_bonus(fetch_sales())

xcom_taskflow_dag()

Input & Output​

Input Returned by Task 1

1200

Output Consumed by Task 2

120.0

πŸ’‘ Behind the scenes, Airflow automatically stores and retrieves XComs.


Where Are XComs Stored?​

By default:

  • Stored in Airflow metadata database
  • Table: xcom

Each record contains:

  • DAG ID
  • Task ID
  • Execution date
  • Key
  • Value (serialized)

XCom Size Limitations βš οΈβ€‹

XComs are not designed for large payloads.

Best Practices​

βœ… IDs
βœ… Status flags
βœ… Filenames
βœ… API responses (small)

❌ Pandas DataFrames
❌ Large JSON blobs
❌ Images or files

πŸ‘‰ For large data:

  • Use S3 / GCS / HDFS
  • Pass only the file path via XCom

Custom XCom Backends (Advanced)​

Airflow allows:

  • Custom XCom storage
  • External systems (Redis, S3)

Useful for:

  • Large payloads
  • High-throughput systems

This is an advanced topic and usually required only at scale.


Common XCom Mistakes​

❌ Treating XCom as a data warehouse
❌ Forgetting task_ids when pulling
❌ Overusing XCom for large objects
❌ Tight coupling between tasks


When You SHOULD Use XComs​

βœ”οΈ Dynamic branching
βœ”οΈ Passing IDs or flags
βœ”οΈ API pagination tokens
βœ”οΈ TaskFlow return values


When You SHOULD NOT Use XComs​

✘ Moving datasets
✘ Sharing megabytes of data
✘ Replacing external storage


Summary πŸ§ β€‹

  • XComs enable task-to-task communication
  • They store small, meaningful data
  • TaskFlow API makes XCom usage seamless
  • Overuse or misuse can harm performance
  • Best practice: pass references, not data

Key Takeaways​

  • XCom = Cross-Communication
  • Stored in Airflow metadata DB
  • Lightweight by design
  • Essential for dynamic workflows
  • TaskFlow API is the modern standard

What’s Next?​

➑️ Branching Workflows – BranchPythonOperator
Learn how XComs power conditional execution and dynamic paths in DAGs.