Creating Your First DAG in Apache Airflow

Imagine you are running a data factory. Every morning, raw data arrives, gets cleaned, transformed, and finally delivered to analytics teams. You don’t manually trigger each step; Apache Airflow does it for you.

At the heart of Airflow lies one powerful concept:

The DAG (Directed Acyclic Graph)

In this chapter, you’ll build your first Airflow DAG, understand how it works, and learn production-ready best practices used by real data engineering teams.


What is a DAG in Airflow?​

A DAG is:

  • A collection of tasks
  • Organized in a specific order
  • With no circular dependencies
  • Executed on a schedule or event

📌 Simple definition: A DAG is a blueprint that tells Airflow what to run, when to run it, and in what order.


Real-World Story: Why DAGs Exist​

Think of making coffee ☕:

  1. Grind beans
  2. Boil water
  3. Brew coffee

You can’t brew before boiling water. You shouldn’t grind beans again after brewing.

This logical flow is exactly how DAGs work.
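
The ordering rule in the coffee example can be sketched with Python's standard-library graphlib module, with no Airflow installation needed. This is only an illustration of what "directed" and "acyclic" mean. The step names and the run_order helper are made up for the example, and treating grinding and boiling as two independent prerequisites of brewing is a modeling assumption. It is not a description of how Airflow's scheduler works internally.

```python
from graphlib import CycleError, TopologicalSorter

# Map each step to the set of steps that must finish before it (assumed layout).
coffee = {
    "grind_beans": set(),
    "boil_water": set(),
    "brew_coffee": {"grind_beans", "boil_water"},
}


def run_order(graph):
    """Return one valid execution order, or raise CycleError if none exists."""
    return list(TopologicalSorter(graph).static_order())


order = run_order(coffee)

# A circular dependency makes a valid order impossible, so it is rejected.
cyclic = {"brew_coffee": {"grind_beans"}, "grind_beans": {"brew_coffee"}}
try:
    run_order(cyclic)
    has_cycle = False
except CycleError:
    has_cycle = True
```

Brewing always comes last in the computed order, while the two independent steps may appear in either order. The cyclic graph is rejected, which is the same reason a DAG must not contain circular dependencies.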


Basic Structure of an Airflow DAG​

Every DAG file is a Python script and usually contains:

  1. DAG definition
  2. default_args
  3. Schedule
  4. Tasks (operators)
  5. Dependencies

Creating Your First DAG (Step-by-Step)​

Step 1: Import Required Modules​

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

Step 2: Define default_args

default_args is a dictionary of settings applied to every task in the DAG, unless a task overrides them.

default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
}

Common default_args Explained

| Argument | Meaning |
| --- | --- |
| owner | Who owns the DAG |
| retries | Number of retry attempts |
| retry_delay | Delay between retries |
| email_on_failure | Notify on failure |
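
As a rough sketch, here is a default_args dictionary that also sets retry_delay and email_on_failure, followed by a simplified model of how shared defaults combine with task-level arguments. The effective_settings helper and the example values are assumptions made for illustration. Airflow applies the defaults itself when it builds each task, so you would not write this helper in a real DAG file.

```python
from datetime import timedelta

# Shared defaults, in the shape passed to DAG(default_args=...).
default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),  # wait 5 minutes between attempts
    "email_on_failure": False,            # no email when a task fails
}


def effective_settings(defaults, task_kwargs):
    """Simplified model: task-level arguments win over the shared defaults."""
    return {**defaults, **task_kwargs}


# Hypothetical task that needs more retries than the default.
flaky_task = effective_settings(default_args, {"retries": 3})

# A task that passes nothing extra inherits every default.
plain_task = effective_settings(default_args, {})
```

The point of the model is that a value set directly on a task takes precedence, so default_args only needs to hold the settings most tasks share.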

Step 3: Create the DAG Object

with DAG(
    dag_id="my_first_dag",
    description="My first Airflow DAG",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["beginner", "airflow"],
) as dag:
    # Tasks and dependencies go inside this block (Steps 4 and 5).

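Key Parameters Explained

| Parameter | Description |
| --- | --- |
| dag_id | Unique DAG name |
| start_date | Date from which the scheduler starts creating runs |
| schedule_interval | How often it runs (deprecated since Airflow 2.4 in favor of schedule, and removed in Airflow 3) |
| catchup | Whether to create runs for past intervals since start_date; False skips them |
| tags | Labels that help organize and filter DAGs in the UI |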
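
To make the effect of catchup more concrete, the sketch below models a daily schedule with plain datetime arithmetic. It assumes a simplified rule: a run is created only for intervals that have fully elapsed, and with catchup disabled only the most recent of those is run. The function names and the sample dates are invented for the example, and the real scheduler handles more cases (time zones, timetables, manual triggers).

```python
from datetime import datetime, timedelta


def daily_intervals(start_date, now):
    """List the (start, end) pairs of daily intervals that have fully elapsed."""
    intervals = []
    begin = start_date
    while begin + timedelta(days=1) <= now:
        intervals.append((begin, begin + timedelta(days=1)))
        begin += timedelta(days=1)
    return intervals


def runs_to_create(start_date, now, catchup):
    """Simplified model: all elapsed intervals with catchup, else only the latest."""
    elapsed = daily_intervals(start_date, now)
    return elapsed if catchup else elapsed[-1:]


start_date = datetime(2024, 1, 1)
now = datetime(2024, 1, 4, 9, 30)  # illustrative "current time"

with_catchup = runs_to_create(start_date, now, catchup=True)
without_catchup = runs_to_create(start_date, now, catchup=False)
```

Under these assumptions, enabling catchup yields three runs (one per elapsed day), while disabling it yields a single run for the interval from January 3 to January 4.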

Step 4: Add Tasks

Define tasks inside the with DAG(...) block so they are attached to the DAG automatically:

    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

📌 EmptyOperator is perfect for:

  • Start markers
  • End markers
  • Logical grouping

Step 5: Define Task Dependencies

    start >> end

This means:

start must finish successfully before end runs (under the default trigger rule).
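
The >> syntax is ordinary Python operator overloading. The toy class below shows the idea under simplifying assumptions: Task is a made-up stand-in, not Airflow's operator class, and the "process" task is hypothetical.

```python
class Task:
    """Toy stand-in for an operator; not Airflow's real implementation."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []  # tasks that must wait for this one

    def __rshift__(self, other):
        # `self >> other` records that `other` runs after `self`.
        self.downstream.append(other)
        return other  # returning the right-hand task allows a >> b >> c


start = Task("start")
process = Task("process")  # hypothetical middle task
end = Task("end")

start >> process >> end

# Collect the recorded edges as (upstream, downstream) pairs.
edges = [
    (task.task_id, child.task_id)
    for task in (start, process, end)
    for child in task.downstream
]
```

In Airflow itself, start >> end is equivalent to start.set_downstream(end), and end << start expresses the same dependency from the other side.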


Full Example DAG (Complete Code)

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from datetime import datetime

# Settings shared by every task in the DAG.
default_args = {
    "owner": "data_engineering_team",
    "retries": 1,
}

with DAG(
    dag_id="my_first_dag",
    description="My first Airflow DAG",
    default_args=default_args,
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    tags=["beginner", "airflow"],
) as dag:
    # Placeholder tasks that mark the start and end of the pipeline.
    start = EmptyOperator(task_id="start")
    end = EmptyOperator(task_id="end")

    # start must succeed before end runs.
    start >> end

Example Input & Output​

Input​

  • DAG scheduled at 12:00 AM daily
  • No external data required

Output​

  • Task start → SUCCESS
  • Task end → SUCCESS
  • DAG run marked green in Airflow UI

How This Appears in the Airflow UI​

  • DAG Name: my_first_dag
  • Schedule: Daily
  • Tasks: start → end
  • Status: Running / Success / Failed

📊 The Graph view makes dependencies instantly clear.


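Best Practices (Production-Grade)

✅ Set catchup=False while learning, so Airflow does not schedule runs for past dates
✅ Use meaningful dag_id names
✅ Add tags for discoverability
✅ Keep tasks idempotent, so a rerun produces the same result
✅ Avoid heavy logic at the top level of DAG files, since the scheduler parses them repeatedly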
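
Idempotency is easier to see in code. The sketch below assumes a task that writes one file per run date and overwrites it on every attempt, so a retry or rerun leaves the same result behind. The write_partition helper, the date=... directory layout, and the sample rows are all invented for the example.

```python
import tempfile
from pathlib import Path


def write_partition(base_dir, run_date, rows):
    """Write rows for one run date, replacing any earlier output for that date."""
    target = Path(base_dir) / f"date={run_date}" / "data.csv"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(rows) + "\n")  # overwrite, never append
    return target


with tempfile.TemporaryDirectory() as base_dir:
    rows = ["id,amount", "1,9.99"]
    first = write_partition(base_dir, "2024-01-01", rows).read_text()
    # Simulate a retry or manual rerun of the same daily run.
    second = write_partition(base_dir, "2024-01-01", rows).read_text()
    files_written = len(list(Path(base_dir).rglob("*.csv")))
```

Because the output location depends only on the run date and the write replaces existing content, running the task twice produces one file with identical contents rather than duplicated rows.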


Common Beginner Mistakes

❌ Forgetting start_date
❌ Setting start_date dynamically with datetime.now()
❌ Creating circular dependencies
❌ Overloading DAG files with business logic


Key Takeaways​

  • DAGs are the backbone of Apache Airflow
  • Every DAG is a Python file
  • default_args reduce repetition
  • Scheduling controls automation
  • Dependencies define execution order

Summary​

In this chapter, you learned:

  • What a DAG is and why it exists
  • How to create your first DAG
  • How default_args work
  • Scheduling fundamentals
  • Task dependencies
  • Best practices used by professionals

🎯 You’ve officially created your first Airflow DAG!


What’s Next?​

👉 Operators Basics: learn how tasks do real work using:

  • PythonOperator
  • BashOperator
  • EmptyOperator