PySpark Quiz — Basics
📘 PySpark Basics
Test your understanding of PySpark fundamentals.
1. What is the main abstraction in PySpark for distributed data processing?
2. How do you create a SparkSession in PySpark?
3. Which PySpark operation is a transformation, meaning it is evaluated lazily and returns a new RDD/DataFrame?
4. Which PySpark action triggers execution and returns results to the driver?
5. How do you read a CSV file into a PySpark DataFrame?
6. What is the difference between an RDD and a DataFrame?
7. How do you select a column named 'age' from a DataFrame df?
8. Which PySpark function removes duplicate rows?
9. How do you filter rows where age > 30?
10. How do you show the first 5 rows of a DataFrame?