Skip to main content

SQL Operators – PostgresOperator, MySqlOperator, SnowflakeOperator

Every data pipeline eventually reaches the same destination:

A database.

Whether it’s:

  • Loading transformed data
  • Running data quality checks
  • Executing analytics queries
  • Or triggering downstream systems

Airflow doesn’t store data β€”
it orchestrates how data moves and transforms.

That orchestration happens through SQL Operators.


What Are SQL Operators in Airflow?​

SQL Operators allow Airflow to execute SQL statements against databases in a controlled, observable, and retryable way.

They:

  • Use database hooks under the hood
  • Support templating and dynamic SQL
  • Integrate with Airflow Connections
  • Fail or succeed based on query execution

When Should You Use SQL Operators?​

Best Use Cases​

  • Data loading (INSERT, COPY, MERGE)
  • Data transformations inside the database
  • Data quality checks
  • Schema management
  • Stored procedure execution

When Not to Use Them​

  • Heavy business logic (use Python or Spark)
  • Row-by-row processing
  • Long-running analytical jobs better suited for warehouses

Common SQL Operators in Airflow​

OperatorDatabase
PostgresOperatorPostgreSQL
MySqlOperatorMySQL
SnowflakeOperatorSnowflake

All of them follow the same core pattern.


PostgresOperator Deep Dive​

Let’s start with PostgreSQL.

Basic Example​

from airflow import DAG
from airflow.providers.postgres.operators.postgres import PostgresOperator
from datetime import datetime

with DAG(
dag_id="postgres_operator_example",
start_date=datetime(2024, 1, 1),
schedule_interval="@daily",
catchup=False,
) as dag:

create_table = PostgresOperator(
task_id="create_sales_table",
postgres_conn_id="postgres_default",
sql="""
CREATE TABLE IF NOT EXISTS sales (
order_id INT,
amount NUMERIC,
created_at DATE
);
""",
)

Input​

ParameterValue
postgres_conn_idpostgres_default
sqlCREATE TABLE

Output​

Table sales created successfully

Using Templated SQL Files​

Instead of embedding SQL inline, use .sql files.

PostgresOperator(
task_id="load_sales_data",
postgres_conn_id="postgres_default",
sql="sql/load_sales_{{ ds }}.sql",
)

Input​

VariableValue
ds2024-01-10

Output​

Data loaded for 2024-01-10

This improves:

  • Readability
  • Version control
  • Reusability

MySqlOperator Example​

The pattern stays the same.

from airflow.providers.mysql.operators.mysql import MySqlOperator

MySqlOperator(
task_id="mysql_cleanup",
mysql_conn_id="mysql_reporting",
sql="""
DELETE FROM sessions
WHERE session_date < DATE_SUB(CURDATE(), INTERVAL 90 DAY);
""",
)

Input​

ParameterValue
mysql_conn_idmysql_reporting
retention90 days

Output​

Old records deleted successfully

SnowflakeOperator Deep Dive​

Snowflake is different β€” but Airflow makes it feel the same.

from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

SnowflakeOperator(
task_id="snowflake_transform",
snowflake_conn_id="snowflake_warehouse",
sql="""
MERGE INTO analytics.sales s
USING staging.sales_stg st
ON s.order_id = st.order_id
WHEN MATCHED THEN UPDATE SET amount = st.amount
WHEN NOT MATCHED THEN INSERT VALUES (
st.order_id, st.amount, CURRENT_DATE
);
""",
)

Input​

ParameterValue
snowflake_conn_idsnowflake_warehouse
sqlMERGE statement

Output​

Merge completed successfully

XComs & SQL Operators​

By default, SQL Operators:

  • Do not push query results to XCom

However, some operators support returning row counts or results depending on configuration.

⚠️ Best Practice:
Use SQL Operators for execution β€” not for extracting large datasets.


Transactions & Autocommit​

Most SQL operators support autocommit.

PostgresOperator(
task_id="autocommit_example",
postgres_conn_id="postgres_default",
sql="VACUUM;",
autocommit=True,
)

Use autocommit for:

  • DDL statements
  • Maintenance operations

SQL Operators vs PythonOperator + DB Hook​

ApproachBest For
SQL OperatorSimple, clean SQL execution
Python + HookDynamic logic, looping, conditionals

πŸ‘‰ Rule:
If it’s pure SQL β†’ use a SQL Operator.


Security & Credentials​

Best Practices​

  • Store credentials in Airflow Connections
  • Use least-privilege database roles
  • Avoid hardcoded schema names when possible
  • Parameterize SQL using templates

Common Mistakes​

❌ Embedding massive SQL inline
❌ Hardcoding credentials
❌ Using SQL Operators for data extraction
❌ Mixing orchestration logic inside SQL


Real-World Use Cases​

  • Data warehouse transformations
  • Daily aggregation jobs
  • Data quality checks
  • Slowly changing dimension updates
  • Schema migrations

Summary​

SQL Operators are the workhorses of data orchestration in Airflow.

Key Takeaways:

  • Simple, reliable SQL execution
  • Consistent patterns across databases
  • Deep integration with Airflow Connections
  • Best for in-database transformations

Used correctly, SQL Operators keep your pipelines efficient, readable, and scalable.


What’s Next?​

Next in the series:

➑️ File Operators – S3, GCS, LocalFilesystem