
PySpark Quiz — Basics

📘 PySpark Basics

Test your understanding of PySpark fundamentals.

1. What is the main abstraction in PySpark for distributed data processing?

2. How do you create a SparkSession in PySpark?

3. Which PySpark operation is a lazy transformation that returns a new RDD or DataFrame?

4. Which PySpark action triggers execution and returns results to the driver?

5. How do you read a CSV file into a PySpark DataFrame?

6. What is the difference between an RDD and a DataFrame?

7. How do you select a column named 'age' from a DataFrame df?

8. Which PySpark function removes duplicate rows?

9. How do you filter rows where age > 30?

10. How do you show the top 5 rows of a DataFrame?
