Databricks Quiz — Production Data Engineering

1. Which kind of Spark operation is evaluated lazily, running only when an action is called?

2. In Databricks, what is the primary reason to use caching with Spark DataFrames?

3. What is the correct way to handle schema evolution in Delta Lake?

4. What is the main advantage of using Databricks Jobs over running notebooks manually?

5. In a high-concurrency Databricks cluster, which feature helps isolate users and workloads?

6. Which Spark operation causes data to be shuffled across the cluster?

7. What is the recommended way to manage credentials for accessing S3/ADLS/GCS in Databricks?

8. How can you monitor Spark job performance in Databricks?

9. What is the effect of increasing the number of partitions of a Spark DataFrame on parallelism and per-task overhead?

10. Which Delta Lake feature allows you to time travel to previous versions of data?