🧠 Expert — Streaming, ML & Production
1. Which method starts a PySpark Structured Streaming query?
2. How do you checkpoint a streaming query in PySpark?
3. Which PySpark function is used to aggregate over time windows in a streaming query?
4. How can you apply a UDF to a streaming DataFrame?
5. Which MLlib class is used for building a linear regression model?
6. How do you split a dataset into training and testing sets?
7. Which MLlib transformer assembles feature columns into a single vector column?
8. How do you make streaming output fault-tolerant?
9. What is the difference between using foreachBatch and writing to a built-in writeStream sink?
10. Which MLlib evaluator is used for regression models?