Databricks Quiz — Expert Performance, Debugging & Optimization

1. Which Databricks cluster configuration is recommended for high-throughput ETL pipelines?

2. What is a key strategy to reduce shuffle overhead in Spark jobs?
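As background for question 2, one widely used way to shrink a shuffle is to aggregate inside each partition before records move between executors. Spark's RDD `reduceByKey` does this map-side combine; `groupByKey` does not. The plain-Python sketch below only simulates that effect on made-up partitions. It does not call Spark, and the function names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical input: each inner list stands in for one Spark partition of (key, value) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
    [("c", 1), ("a", 1), ("c", 1)],
]

def shuffle_without_combine(parts):
    """groupByKey-style: every record crosses the shuffle boundary unchanged."""
    shuffled = [record for part in parts for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return len(shuffled), totals

def shuffle_with_combine(parts):
    """reduceByKey-style: each partition pre-aggregates, so only one record per key per partition is shuffled."""
    shuffled = []
    for part in parts:
        partial = Counter()
        for key, value in part:
            partial[key] += value
        shuffled.extend(partial.items())
    totals = Counter()
    for key, value in shuffled:
        totals[key] += value
    return len(shuffled), dict(totals)

rows_plain, totals_plain = shuffle_without_combine(partitions)
rows_combined, totals_combined = shuffle_with_combine(partitions)
print(rows_plain, rows_combined)  # prints: 10 6
assert totals_plain == totals_combined  # same result, fewer records moved
```

Both paths produce identical totals; the difference is how many records would cross the network, which is what the shuffle cost scales with in this simplified model.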

3. Which Databricks feature helps reduce cloud storage I/O and improve query speed?

4. In Delta Lake, what is the recommended approach for handling late-arriving data?

5. Which practice improves cost efficiency when running Spark jobs on Databricks?

6. How can you optimize Spark SQL queries for large datasets?

7. Which Databricks tool helps track data lineage and pipeline health?

8. When tuning Spark jobs, what is the impact of increasing executor memory too much?
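For context on question 8, the sketch below works through how executor heap size interacts with packing executors onto a worker node. It assumes Spark's usual default for executor memory overhead (10% of executor memory, with a 384 MiB floor); Databricks may size overhead differently. The 56 GiB usable-memory figure and all helper names are hypothetical.

```python
def gib(n):
    """Convert GiB to MiB."""
    return n * 1024

def overhead_mib(executor_memory_mib, factor=0.10, minimum_mib=384):
    """Assumed default overhead: max(384 MiB, 10% of executor memory)."""
    return max(minimum_mib, int(executor_memory_mib * factor))

def executors_per_node(node_usable_mib, executor_memory_mib):
    """How many executors of the given heap size fit on one node, counting overhead."""
    per_executor = executor_memory_mib + overhead_mib(executor_memory_mib)
    return node_usable_mib // per_executor

NODE_USABLE_MIB = gib(56)  # hypothetical memory available to executors on one worker

for heap_gib in (8, 24, 40):
    heap_mib = gib(heap_gib)
    count = executors_per_node(NODE_USABLE_MIB, heap_mib)
    idle = NODE_USABLE_MIB - count * (heap_mib + overhead_mib(heap_mib))
    print(heap_gib, count, idle)
# prints:
# 8 6 3278
# 24 2 3278
# 40 1 12288
```

In this simplified model, raising the heap from 24 GiB to 40 GiB drops the node from two executors to one and leaves 12 GiB unused; with a fixed core count per executor, it also means fewer concurrent tasks per node.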

9. Which operation should you avoid on very large datasets without proper partitioning?

10. Which feature helps ensure secure data access across multiple Databricks workspaces?