Performance Tuning Techniques for Daily Company Work

✨ Story Time — “Why Is My ETL Slower Than Yesterday?”

Meera is a data engineer at a fast-growing company.

Yesterday, her ETL job completed in 12 minutes.
Today, it takes 25 minutes.

Her first thought:

“Did I change anything?”

Nothing had changed in her SQL. But the company had added more data, more users were running queries, and the warehouse size was no longer a good fit.

Meera realized Snowflake is fast, but only if you follow performance tuning best practices.

Here’s what she learned.


🧱 1️⃣ Optimize Warehouse Usage

  • Right-size warehouses: Small → Medium → Large based on query/data size
  • Enable Auto-Suspend (1–5 mins): Avoid paying for idle compute
  • Auto-Resume: the next query wakes the warehouse automatically, no manual resume needed
  • Multi-cluster warehouses: Only for high concurrency workloads

Example: Daily ETL pipeline with 200M rows → Medium warehouse with auto-suspend/resume is cost-efficient.
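
A minimal sketch of this setup in Snowflake SQL (the warehouse name ETL_WH and the 2-minute suspend window are illustrative):

```sql
-- Medium warehouse for the daily ETL; suspends when idle, resumes on demand.
CREATE WAREHOUSE IF NOT EXISTS ETL_WH
  WITH WAREHOUSE_SIZE      = 'MEDIUM'
       AUTO_SUSPEND        = 120    -- seconds of idle time before suspending
       AUTO_RESUME         = TRUE   -- next query wakes the warehouse automatically
       INITIALLY_SUSPENDED = TRUE;  -- no cost until the first query runs

-- Resize up for a one-off heavy backfill, then back down afterwards.
ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'LARGE';
ALTER WAREHOUSE ETL_WH SET WAREHOUSE_SIZE = 'MEDIUM';
```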


2️⃣ Leverage Caching

Snowflake caches:

  • Result Cache: returns results of repeated identical queries instantly, using no compute
  • Metadata Cache: answers metadata-only queries (e.g., COUNT(*), MIN/MAX) without scanning data
  • Warehouse Cache: local disk cache on a running warehouse that speeds up repeated scans of large tables

Tips:

  • Run repeated workloads on the same active warehouse so its local cache stays warm
  • Avoid unnecessary warehouse resizing or suspension between queries, since both drop the warehouse cache
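
A small sketch showing the result and metadata caches in action (the sales table and its columns are assumed for illustration):

```sql
ALTER SESSION SET USE_CACHED_RESULT = TRUE;  -- result cache on (the default)

-- First run: scans the table on the warehouse.
SELECT region, SUM(amount) FROM sales GROUP BY region;

-- Identical re-run within 24 hours: served from the result cache,
-- consuming no warehouse compute at all.
SELECT region, SUM(amount) FROM sales GROUP BY region;

-- Metadata-only query: typically answered from micro-partition
-- metadata, with no table scan.
SELECT COUNT(*), MIN(order_date), MAX(order_date) FROM sales;
```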

3️⃣ Use Clustering Keys Smartly

  • Clustering improves micro-partition pruning for large tables
  • Helps queries filter efficiently
  • Avoid over-clustering — increases maintenance cost

Example: CUSTOMER table clustered by REGION → queries for a single region read fewer micro-partitions.
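
A hedged sketch of that CUSTOMER example (assuming REGION, CUSTOMER_ID, and NAME columns):

```sql
-- Define the clustering key; Snowflake reclusters in the background.
ALTER TABLE customer CLUSTER BY (region);

-- Check how well the table is clustered on that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('CUSTOMER', '(REGION)');

-- Pruning-friendly query: only micro-partitions whose REGION min/max
-- range can contain 'EMEA' are scanned.
SELECT customer_id, name FROM customer WHERE region = 'EMEA';
```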


4️⃣ Query Optimization Techniques

  • Select only needed columns → reduces scan bytes
  • Filter early → Snowflake pushes WHERE filters for pruning
  • Avoid SELECT * for large tables
  • Use CTEs carefully → materialize large intermediate results only when needed
  • Use the Query Profile to find bottlenecks
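
A before/after sketch (the orders table and its columns are hypothetical):

```sql
-- Before: scans every column of every micro-partition.
SELECT * FROM orders;

-- After: project only the needed columns and filter early,
-- so Snowflake prunes partitions and scans far fewer bytes.
SELECT order_id, customer_id, amount
FROM   orders
WHERE  order_date >= '2024-01-01'
  AND  order_date <  '2024-02-01';

-- Inspect recent queries from SQL to spot expensive ones.
SELECT query_id, bytes_scanned, rows_produced
FROM   TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
ORDER  BY start_time DESC
LIMIT  5;
```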

5️⃣ Optimize Joins

  • Let Snowflake broadcast small tables in joins (the optimizer does this automatically when one side is small)
  • Avoid cross joins unless truly necessary
  • Push filters before the join → reduces join input size
  • Check the Query Profile → identify slow join nodes

Real example: Joining 50GB SALES with 2MB CUSTOMER → broadcast join reduces runtime from 8 mins → 20 secs.
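
A sketch of the pattern (table and column names assumed; since Snowflake picks the broadcast strategy itself, the levers you control are filtering early and joining on an explicit key):

```sql
WITH recent_sales AS (
    SELECT customer_id, amount
    FROM   sales
    WHERE  sale_date >= DATEADD(day, -7, CURRENT_DATE)  -- filter before the join
)
SELECT c.region, SUM(s.amount) AS total_amount
FROM   recent_sales s
JOIN   customer c
  ON   s.customer_id = c.customer_id  -- explicit key: no accidental cross join
GROUP  BY c.region;
```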


6️⃣ Partitioning & Micro-Partition Awareness

  • Snowflake automatically stores data in micro-partitions (50–500 MB of uncompressed data)
  • Design filters to exploit the min/max metadata Snowflake keeps per micro-partition
  • Avoid full table scans for large tables when filters exist

Tip: Use date or ID filters for high selectivity → fewer partitions scanned → faster queries.
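
A sketch of checking pruning effectiveness (the orders query is illustrative; the column names come from the standard ACCOUNT_USAGE.QUERY_HISTORY view):

```sql
-- High-selectivity date filter: min/max metadata skips non-matching partitions.
SELECT order_id, amount
FROM   orders
WHERE  order_date = '2024-06-15';

-- Compare partitions scanned vs total for recent queries.
SELECT query_text, partitions_scanned, partitions_total, bytes_scanned
FROM   snowflake.account_usage.query_history
ORDER  BY start_time DESC
LIMIT  5;
```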


7️⃣ Monitor & Tune Regularly

  • Use the Query Profile: identify slow nodes and bottlenecks
  • Check warehouse utilization: avoid over- or under-sized clusters
  • Analyze bytes scanned vs rows returned: a large gap usually means poor pruning or missing filters
  • Keep ETL and BI dashboards aligned with warehouse performance
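
A monitoring sketch using the standard ACCOUNT_USAGE.QUERY_HISTORY view (which lags real time by up to about 45 minutes):

```sql
-- Yesterday's ten most expensive queries by bytes scanned.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds,  -- column is in milliseconds
       bytes_scanned,
       rows_produced
FROM   snowflake.account_usage.query_history
WHERE  start_time >= DATEADD(day, -1, CURRENT_TIMESTAMP)
ORDER  BY bytes_scanned DESC
LIMIT  10;
```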

🧪 Real-World Story — Meera Fixes Slow ETL

Problem:

  • ETL reads 200M rows → takes 25 mins

Analysis:

  1. Warehouse: Medium → OK
  2. Query scanned 90% of table → filter not selective
  3. Join with CUSTOMER table was missing its join key → an accidental cross join instead of an efficient broadcast join
  4. Query did SELECT * → unnecessary columns

Fix:

  • Filter pushed early
  • Broadcast join applied
  • Selected only necessary columns
  • Auto-suspend enabled

Result: Runtime reduced to 9 minutes, cost-efficient and reliable.
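
A condensed sketch of the fixed query (table and column names assumed for illustration):

```sql
-- Before (illustrative): SELECT * FROM sales, customer;  -- accidental cross join
SELECT s.order_id, s.amount, c.region                     -- only needed columns
FROM   sales s
JOIN   customer c ON s.customer_id = c.customer_id        -- proper join key
WHERE  s.sale_date = CURRENT_DATE - 1;                    -- selective filter first
```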


💡 Key Takeaways

  • Right-size warehouses → cost & speed balance
  • Leverage caching → repeated queries run faster
  • Apply clustering only when necessary
  • Optimize queries → select needed columns, filter early
  • Monitor Query Profile & adjust joins
  • Be aware of micro-partitions → design filters that benefit from pruning
  • Review performance regularly

Performance tuning isn’t one-time — it’s an ongoing practice.


📘 Summary

Snowflake performance tuning for daily company work involves:

  1. Warehouse sizing & auto-suspend/resume
  2. Smart caching usage
  3. Efficient clustering & pruning
  4. Query optimization & filtering
  5. Join strategy tuning
  6. Micro-partition awareness
  7. Continuous monitoring & adjustment

By combining these techniques, data engineers like Meera can keep ETL, dashboards, and queries fast, reliable, and cost-effective.


👉 Next Topic

Handling Semi-Structured Data (JSON, XML, Avro)