Materialized Views in Databricks (SQL + Pipelines)
π A Simple Story β βRecomputing the Same Thing Every Dayββ
Imagine you work at a company where the sales team asks:
βWhat are the top 10 selling products today?β
βWhat was the daily revenue for the past week?β
βHow many new users signed up yesterday?β
These questions sound simpleβ¦
but answering them can be expensive:
- Scanning billions of rows
- Running the same aggregations repeatedly
- Recalculating heavy queries for every dashboard refresh
- Reprocessing old data even when nothing changed
Itβs like re-cooking a whole meal every time someone wants a snack.
Databricks Materialized Views solve this by precomputing results and updating them efficiently.
π‘ What Is a Materialized View (MV)?β
A Materialized View is a table that:
- Stores the results of a query
- Updates automatically when underlying data changes
- Can serve BI dashboards extremely fast
Think of it as:
βA cached version of a query, but smart enough to update itself.β
Instead of recomputing everything, Databricks updates only the new or changed data.
π§ Why Use Materialized Views?β
β Faster BI dashboardsβ
Your dashboards read from a precomputed table, not raw data.
β Lower compute costβ
Only incremental changes are processed.
β Simple SQL definitionβ
No complex jobs or triggers required.
β Reliable + consistentβ
Built on Delta Lake and ACID transactions.
β Works great with slowly changing dataβ
You never recompute old partitions unnecessarily.
π§ͺ Example 1 β Creating a Materialized View (SQL)β
Hereβs how easy it is:
CREATE MATERIALIZED VIEW daily_sales_mv
AS
SELECT
DATE(order_timestamp) AS order_date,
SUM(amount) AS total_revenue,
COUNT(*) AS total_orders
FROM live_orders
GROUP BY DATE(order_timestamp);
Now Databricks automatically:
- Tracks changes in
live_orders - Updates the materialized view
- Keeps results ready for dashboards
No scheduler. No orchestration. No manual refresh.
π§ͺ Example 2 β Materialized Views in a Pipeline (SQL Warehouse or Workflow)β
You can include MVs inside Databricks SQL Pipelines.
Example pipeline step:
CREATE OR REFRESH MATERIALIZED VIEW top_products_mv
AS
SELECT
product_id,
COUNT(*) AS order_count
FROM silver_orders
GROUP BY product_id;
Whenever silver_orders is updated,
this MV updates too β incrementally.
π§ Materialized View vs Standard Viewβ
| Feature | Standard View | Materialized View |
|---|---|---|
| Stored Results | β No | β Yes |
| Automatic Refresh | β No | β Yes |
| Fast for BI | β Slower | β Very Fast |
| Incremental Refresh | β No | β Yes |
| Cost | Higher (each query recomputes) | Lower (incremental updates) |
A materialized view behaves like a real table β but one you donβt manually maintain.
π How Refreshing Works (Simple Explanation)β
When new data arrives in the base table:
- Databricks identifies what changed
- Recalculates only that part
- Merges results into the MV
- Keeps everything ACID-consistent
Itβs like updating only the new pages of a book rather than reprinting the whole book every day.
π§± Best Use Casesβ
β Perfect for:β
- Daily/weekly/monthly metrics
- Revenue dashboards
- Top-N ranking queries
- Aggregation-heavy workloads
- Slowly changing tables
- BI tools (Power BI, Looker, Tableau)
- Gold-layer summaries
β Not ideal for:β
- Raw, rapidly changing high-volume streaming
- Very complex joins with unstable schemas
- Use cases where you need full control over refresh logic
For those, Streaming Tables or DLT may be better.
πΌ Example Architecture Using MVsβ
Bronze (raw data)
β
Silver (cleaned)
β
Gold Aggregations (Materialized Views)
β
Dashboards / BI Tools
Materialized Views live in the Gold layer because they represent final, business-ready data.
π Summaryβ
- A Materialized View stores the results of a query and updates automatically.
- It is perfect for dashboards and frequently reused aggregations.
- Databricks refreshes MVs incrementally, saving huge compute costs.
- MVs are excellent in the Gold layer for reliable, fast analytics.
- Use MVs when you want stable, precomputed insights without rerunning heavy queries.
Materialized Views make your Lakehouse smoother, faster, and cheaper β especially for analytics teams.
π Next Topic
Databricks Catalog, Schema & Table Permissions (RBAC)