Snowflake with Python, PySpark, Databricks — Enterprise Integration
🎬 Story Time — “We Need All Our Tools Talking to Each Other”
Ritika, a lead data engineer at a fast-growing SaaS company, faces an integration challenge.
Her ecosystem is huge:
- Python notebooks for analysts
- PySpark pipelines on Databricks
- Machine learning workflows
- Batch + streaming
- Snowflake as the central data warehouse
The CTO declares:
“Everything must flow into Snowflake and out of Snowflake, seamlessly.”
Now Ritika must connect Python, PySpark, and Databricks in a clean, scalable architecture.
🧊 1. Snowflake + Python — Your Data Engineering Power Duo
Python integrates with Snowflake through:
- Snowflake Connector for Python
- Snowpark for Python
- Pandas + Snowflake Native Connectors
- Streamlit-in-Snowflake (SIS)
Ritika starts with the Python connector.
🔌 1.1 Snowflake Python Connector
Install:
pip install snowflake-connector-python
Connect:
import snowflake.connector
conn = snowflake.connector.connect(
    user='RITIKA',
    password='xxxxxxx',
    account='AB12345.ap-south-1',
    warehouse='ANALYTICS_WH',
    database='SALES_DB',
    schema='PUBLIC'
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM ORDERS")
print(cursor.fetchone())
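For anything beyond a quick count, bound parameters and the pandas helpers keep these scripts tidy. A minimal sketch, assuming the connection above is open; the REGION filter is illustrative, and fetch_pandas_all requires installing the connector with its pandas extra (pip install "snowflake-connector-python[pandas]"):
# Bind values instead of string-formatting SQL into the query
cursor.execute(
    "SELECT ORDER_ID, REVENUE FROM ORDERS WHERE REGION = %(region)s",
    {"region": "APAC"},  # illustrative column and value
)

# Pull the result straight into a pandas DataFrame
orders_df = cursor.fetch_pandas_all()
print(orders_df.head())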
This powers:
- ad hoc scripts
- ETL micro-jobs (one is sketched after this list)
- Python automations
- Airflow & Prefect pipelines
🧠 1.2 Snowpark for Python — Server-Side Python
Ritika discovers Snowpark, which lets Python logic run inside Snowflake's own compute.
Install:
pip install snowflake-snowpark-python
Example:
from snowflake.snowpark import Session

# connection_parameters is a dict with the same keys as the connector example
# above: account, user, password, warehouse, database, schema
session = Session.builder.configs(connection_parameters).create()

df = session.table("ORDERS")
df_filtered = df.filter(df["REVENUE"] > 1000)
df_filtered.show()
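The same DataFrame API can aggregate and persist results without the rows ever leaving Snowflake. A short sketch building on the example above; the CUSTOMER_ID column and the ORDERS_HIGH_VALUE table name are assumptions for illustration:
from snowflake.snowpark.functions import col, sum as sum_

# Aggregate high-value orders per customer; the work runs in Snowflake
summary = (
    df_filtered
    .group_by(col("CUSTOMER_ID"))
    .agg(sum_(col("REVENUE")).alias("TOTAL_REVENUE"))
)

# Persist the result as a new table, still without pulling data client-side
summary.write.mode("overwrite").save_as_table("ORDERS_HIGH_VALUE")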
Benefits:
- Pushdown compute to Snowflake
- Distributed processing
- Zero data movement
- ML model execution inside Snowflake
🔥 2. Snowflake + PySpark Integration
Snowflake integrates with PySpark via the Spark Snowflake Connector.
Perfect for:
- Large-scale Spark transformations
- Ingesting from Delta Lake
- ETL pipelines running on Databricks or EMR
- Converting Spark DataFrames → Snowflake tables
🔌 2.1 Spark Snowflake Connector Setup
Add dependencies:
--packages net.snowflake:snowflake-jdbc:3.13.28,net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3
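Outside of spark-submit, the same dependencies can be attached when the session is created programmatically. A sketch assuming a standalone PySpark script (on Databricks you would typically install the connector as a cluster library instead):
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("snowflake-etl")
    .config(
        "spark.jars.packages",
        "net.snowflake:snowflake-jdbc:3.13.28,"
        "net.snowflake:spark-snowflake_2.12:2.12.0-spark_3.3",
    )
    .getOrCreate()
)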
Connection options:
sfOptions = {
    "sfURL": "AB12345.snowflakecomputing.com",
    "sfAccount": "AB12345",
    "sfUser": "RITIKA",
    "sfPassword": "xxxx",
    "sfDatabase": "SALES_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "SPARK_WH"
}
Write a Spark DataFrame → Snowflake:
df.write \
    .format("snowflake") \
    .options(**sfOptions) \
    .option("dbtable", "ORDERS_CLEAN") \
    .save()
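As with other Spark data sources, the standard save modes apply, so a recurring batch load would typically make the behavior explicit rather than relying on the default (which fails if the table already exists):
# Append new rows to an existing table instead of failing or overwriting
df.write \
    .format("snowflake") \
    .options(**sfOptions) \
    .option("dbtable", "ORDERS_CLEAN") \
    .mode("append") \
    .save()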
Read from Snowflake → Spark:
df_snow = spark.read \
    .format("snowflake") \
    .options(**sfOptions) \
    .option("query", "SELECT * FROM SALES_DB.PUBLIC.ORDERS") \
    .load()
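Once loaded, df_snow behaves like any other Spark DataFrame, so Ritika can mix Snowflake data with the rest of her Spark workloads, for example through a temp view (the REGION column is illustrative):
# Expose the Snowflake data to Spark SQL alongside other sources
df_snow.createOrReplaceTempView("orders_snow")

top_regions = spark.sql("""
    SELECT REGION, SUM(REVENUE) AS total_revenue
    FROM orders_snow
    GROUP BY REGION
    ORDER BY total_revenue DESC
""")
top_regions.show()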
🏔️ 3. Snowflake + Databricks — A Modern Lakehouse Integration
Databricks teams often use:
- Spark for heavy transformations
- MLflow for experimentation
- Delta Lake for raw zone
- Snowflake for analytics, BI & governed modeling
Ritika builds a pipeline:
- Raw data → Delta Lake
- Transform in Databricks using PySpark
- Load curated data → Snowflake
- Analysts query Snowflake using BI tools
🔗 3.1 Databricks + Snowflake Connector Example
In a Databricks notebook:
options = {
    "sfUrl": "ab12345.ap-south-1.snowflakecomputing.com",
    "sfUser": dbutils.secrets.get("snowflake", "USER"),
    "sfPassword": dbutils.secrets.get("snowflake", "PASSWORD"),
    "sfDatabase": "REVENUE_DB",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "DBRICKS_WH"
}
spark_df = spark.sql("SELECT * FROM unified_sales")
spark_df.write \
    .format("snowflake") \
    .options(**options) \
    .option("dbtable", "UNIFIED_SALES_SF") \
    .mode("overwrite") \
    .save()
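The same pattern covers the Delta → Snowflake hop in Ritika's pipeline. A sketch assuming the curated data lives in a Delta table named gold.curated_orders and lands in a Snowflake table named CURATED_ORDERS (both names are made up for illustration):
# Read the curated Delta table and push it into Snowflake
curated = spark.read.table("gold.curated_orders")

curated.write \
    .format("snowflake") \
    .options(**options) \
    .option("dbtable", "CURATED_ORDERS") \
    .mode("overwrite") \
    .save()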
Why Databricks integrates well with Snowflake:
- High-performance parallel load
- Supports Delta → Snowflake
- Easy credential management via Secrets
- Handles large ETL pipelines
🤖 4. Machine Learning Workflows
Ritika combines:
- Snowpark for Python (feature engineering inside Snowflake)
- Spark ML or Databricks MLflow
- Snowflake UDFs & UDTFs
- Model scoring inside Snowflake
Example — deploying a simple scoring function as a Snowpark UDF:
from snowflake.snowpark.functions import udf

@udf
def score_model(amount: float) -> float:
    return amount * 0.98  # simplified example
Apply it to a Snowflake table:
session.table("ORDERS").select(score_model("REVENUE")).show()
This removes the need to export large datasets just to score them.
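To make the scores available to downstream consumers, the scored column can be written straight back to a table. A short sketch, assuming ORDERS_SCORED is an acceptable target name:
from snowflake.snowpark.functions import col

scored = session.table("ORDERS").with_column(
    "PREDICTED_REVENUE", score_model(col("REVENUE"))
)

# Both the scoring and the write execute inside Snowflake
scored.write.mode("overwrite").save_as_table("ORDERS_SCORED")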
🧠 5. Architecture Patterns
✔ Pattern 1 — Databricks as Transformation Layer, Snowflake as Analytics
- Spark cleans & enriches
- Snowflake stores final models & tables
✔ Pattern 2 — Snowpark-First Architecture
- All transformations in Snowflake
- Only ML training outside
✔ Pattern 3 — Hybrid Lakehouse
- Delta for raw + bronze
- Snowflake for gold semantic layers
📦 6. Best Practices
- Use Snowpark where possible to avoid data movement
- Use Spark Connector for large-scale batch loads
- Do not oversize Snowflake warehouses for Spark loads
- Use COPY INTO for bulk micro-batch ingestion (see the sketch after this list)
- Use Secrets Manager on Databricks for credentials
- Monitor connector jobs through Query History
- Keep transformations close to the compute engine (Spark or Snowflake)
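As a concrete example of the COPY INTO recommendation, a bulk micro-batch can be triggered from the same Python connector shown earlier. A hedged sketch, assuming files have already been uploaded to an existing internal stage named RAW_STAGE; the stage, file pattern, and target table are illustrative:
copy_sql = """
    COPY INTO SALES_DB.PUBLIC.ORDERS_RAW
    FROM @RAW_STAGE/orders/
    FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    PATTERN = '.*[.]csv'
"""

cursor = conn.cursor()
cursor.execute(copy_sql)
print(cursor.fetchall())  # per-file load results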
🎉 Real-World Ending — “Everything Works Together Now”
With her new integration setup:
- Python automations sync instantly with Snowflake
- Spark pipelines load cleaned data at scale
- Databricks notebooks talk to Snowflake seamlessly
- ML workloads run inside Snowflake using Snowpark
- No messy data exports or CSV dumps
Her CTO smiles:
“This is a true modern data platform. Excellent work.”
📘 Summary
Snowflake integrates deeply with:
✔ Python & Snowpark
✔ PySpark
✔ Databricks
✔ ML & Feature Engineering
✔ Modern Lakehouse Workflows
Together they create a scalable, flexible, and enterprise-grade data ecosystem.