Databricks Quiz — ML & AI in Production
1. Which Spark optimization technique reduces shuffles by co-locating data used in joins?
2. In Delta Lake, how do you efficiently delete outdated or corrupt records?
3. What is the best practice for handling a large number of small files in a Delta Lake table?
4. Which Spark feature allows query execution to skip irrelevant files during reads?
5. How can you monitor Databricks streaming jobs in production?
6. Which technique helps reduce data skew in Spark aggregations on large datasets?
7. In Databricks, which practice is recommended for deploying ML models to production?
8. Which feature allows efficient incremental reads in Delta Lake?
9. What is a key benefit of Adaptive Query Execution (AQE) in Spark?
10. Which method provides secure access to cloud storage credentials in Databricks?