⚙️ Advanced — Windows, Partitioning & Optimization
1. Which PySpark function allows ranking within partitions?
2. How do you repartition a DataFrame to 10 partitions?
3. What is the difference between coalesce() and repartition()?
4. Which technique avoids an unnecessary shuffle?
5. How do you compute running totals partitioned by region?
6. Which feature enables Catalyst optimization?
7. How do you avoid recomputing a DataFrame that is reused across several actions?
8. Why partition data when writing to disk?
9. How do you perform a full outer join on the column id?
10. How do you view the physical execution plan?