
PySpark Quiz — Expert (Streaming & ML)

🧠 Expert — Streaming, ML & Production

1. Which method starts a PySpark Structured Streaming query?

2. How do you checkpoint a streaming query in PySpark?

3. Which PySpark function is used for aggregations over a time window in streaming?

4. How can you apply a UDF to a streaming DataFrame?

5. Which MLlib class is used for building a linear regression model?

6. How do you split a dataset into training and testing sets?

7. Which MLlib transformer assembles feature columns into a single vector column?

8. How do you make streaming output fault-tolerant?

9. What is the difference between using foreachBatch and writing directly to a built-in sink with writeStream?

10. Which MLlib evaluator is used for regression models?
