Databricks Quiz — Architecting for Scale & Best Practices

1. Which approach is recommended for maintaining multi-year historical data in Delta Lake?

2. How can you minimize network overhead in Spark joins?
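
Background for question 2: one common way to cut network overhead is a broadcast join. Spark copies the small table to every executor, so the large table is joined where it already sits and never has to be shuffled. Spark can pick this strategy automatically for tables below `spark.sql.autoBroadcastJoinThreshold`, and in PySpark you can request it with `pyspark.sql.functions.broadcast`. The sketch below is a toy, pure-Python illustration of the idea, not Spark's implementation. The function name, the list-of-partitions layout and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]


def broadcast_hash_join(
    large_partitions: List[List[Row]],
    small_table: Iterable[Row],
) -> List[List[Tuple[int, str, str]]]:
    """Inner-join each partition of a large table against a copy of a small one."""
    # Build the hash table once from the small side. In Spark this copy is
    # what gets shipped to every executor instead of shuffling the large side.
    lookup: Dict[int, List[str]] = defaultdict(list)
    for key, value in small_table:
        lookup[key].append(value)

    joined = []
    for partition in large_partitions:
        # Each partition is probed locally, so no large-table rows move.
        out = []
        for key, value in partition:
            for match in lookup.get(key, []):
                out.append((key, value, match))
        joined.append(out)
    return joined


# Hypothetical sample data: two partitions of orders, one small lookup table.
orders = [[(1, "order-a"), (2, "order-b")], [(1, "order-c"), (3, "order-d")]]
countries = [(1, "DE"), (2, "FR")]
result = broadcast_hash_join(orders, countries)
print(result)  # [[(1, 'order-a', 'DE'), (2, 'order-b', 'FR')], [(1, 'order-c', 'DE')]]
```

The output keeps the original partitioning of the large table, which is the point: only the small table was copied.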

3. Which Spark feature allocates executors dynamically based on workload?

4. How do you ensure consistent data quality in ETL pipelines with Databricks?
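
Background for question 4: one widely used approach is to declare data quality rules as expectations that either warn, drop offending rows, or fail the pipeline, as Delta Live Tables does with `expect`, `expect_or_drop` and `expect_or_fail`. Delta tables can also enforce `CHECK` constraints. The pure-Python sketch below only imitates those three policies on a list of dicts. It is not the Delta Live Tables API, and `apply_expectations`, the rule names and the sample rows are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]


class ExpectationFailed(Exception):
    """Raised when a rule with the 'fail' action is violated."""


def apply_expectations(
    records: List[Record],
    rules: Dict[str, Tuple[Rule, str]],  # rule name -> (predicate, action)
) -> Tuple[List[Record], Dict[str, int]]:
    """Apply 'warn', 'drop' or 'fail' rules; return kept rows and violation counts."""
    violations = {name: 0 for name in rules}
    kept = []
    for record in records:
        keep = True
        for name, (predicate, action) in rules.items():
            if predicate(record):
                continue
            violations[name] += 1
            if action == "fail":
                raise ExpectationFailed(f"{name} violated by {record}")
            if action == "drop":
                keep = False
            # 'warn': keep the row, but the violation is still counted.
        if keep:
            kept.append(record)
    return kept, violations


rows = [
    {"id": 1, "amount": 10.0},
    {"id": None, "amount": 5.0},
    {"id": 3, "amount": -2.0},
]
rules = {
    "valid_id": (lambda r: r["id"] is not None, "drop"),
    "non_negative_amount": (lambda r: r["amount"] >= 0, "warn"),
}
kept, counts = apply_expectations(rows, rules)
print(kept)    # [{'id': 1, 'amount': 10.0}, {'id': 3, 'amount': -2.0}]
print(counts)  # {'valid_id': 1, 'non_negative_amount': 1}
```

Keeping violation counts even for rows that pass through is what makes the "warn" policy useful for monitoring quality over time.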

5. Which Delta Lake feature allows you to track and roll back to previous versions of tables?
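
Background for question 5: Delta Lake records every commit in a transaction log. That log is what makes time travel (`VERSION AS OF`, `DESCRIBE HISTORY`) and `RESTORE TABLE <table> TO VERSION AS OF <n>` possible. The toy class below is a simplified assumption, not the real log format. It replays add/remove file actions to rebuild any version, and it rolls back by writing a new commit so that earlier history is kept. `VersionedTable` and the file names are hypothetical.

```python
from typing import List, Set, Tuple

Action = Tuple[str, str]  # ("add" or "remove", file name)


class VersionedTable:
    """Toy transaction log: each commit is a list of add/remove actions."""

    def __init__(self) -> None:
        self.log: List[List[Action]] = []

    def commit(self, actions: List[Action]) -> int:
        self.log.append(actions)
        return len(self.log) - 1  # version number of the new commit

    def snapshot(self, version: int) -> Set[str]:
        """Replay the log up to `version` to get the live files (time travel)."""
        if not 0 <= version < len(self.log):
            raise ValueError(f"version {version} does not exist")
        files: Set[str] = set()
        for actions in self.log[: version + 1]:
            for op, name in actions:
                if op == "add":
                    files.add(name)
                else:
                    files.discard(name)
        return files

    def restore(self, version: int) -> int:
        """Roll back by committing the difference; older commits stay in the log."""
        current = self.snapshot(len(self.log) - 1)
        target = self.snapshot(version)
        actions = [("remove", f) for f in sorted(current - target)]
        actions += [("add", f) for f in sorted(target - current)]
        return self.commit(actions)


t = VersionedTable()
t.commit([("add", "part-0.parquet")])  # version 0
t.commit([("add", "part-1.parquet")])  # version 1
# Version 2: a bad overwrite replaces both files.
t.commit([("remove", "part-0.parquet"), ("remove", "part-1.parquet"),
          ("add", "part-2.parquet")])
v = t.restore(1)
print(v, sorted(t.snapshot(v)))  # 3 ['part-0.parquet', 'part-1.parquet']
```

Because the rollback is itself a commit, version 2 can still be inspected afterwards, which mirrors why a restore shows up as a new entry in table history.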

6. For streaming ML pipelines in Databricks, what is recommended for model updates?

7. Which approach is recommended for addressing the small-files problem in Delta Lake?
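
Background for question 7: compaction, for example Delta Lake's `OPTIMIZE` command, rewrites many small files into fewer, larger ones up to a configurable target file size. The function below is a hypothetical, simplified planner using first-fit decreasing bin packing to show how small files could be grouped toward such a target. Delta's actual file selection logic may differ, and the sizes and the 1024 MB target are illustrative assumptions.

```python
from typing import List


def plan_compaction(file_sizes_mb: List[int], target_mb: int) -> List[List[int]]:
    """Group files smaller than target_mb into bins of at most target_mb.

    Uses first-fit decreasing. Each returned bin would be rewritten as one
    larger file; files already at or above the target are left alone.
    """
    small = sorted((s for s in file_sizes_mb if s < target_mb), reverse=True)
    bins: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        for i, total in enumerate(totals):
            if total + size <= target_mb:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so start a new one.
            bins.append([size])
            totals.append(size)
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]


plan = plan_compaction([300, 20, 5, 700, 1024, 400, 10, 60], target_mb=1024)
print(plan)  # [[700, 300, 20], [400, 60, 10, 5]]
```

In this example seven small files collapse into two rewrites, while the file that already meets the target is not touched.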

8. Which tool helps monitor Databricks jobs, clusters, and streaming metrics in production?

9. Which strategy helps secure multi-tenant Databricks workspaces?

10. Which Spark SQL optimization is most effective for large Delta tables?