Skip to main content

20 docs tagged with "RDD Transformations"

View all tags

Complex SQL Queries in PySpark

Learn how to write complex SQL queries in PySpark, including joins, subqueries, aggregations, and window functions for advanced analytics in Databricks.

MLlib Overview

Imagine you’re a data scientist in a high-tech lab, not just a data engineer. Data isn’t sitting quietly in files—it’s streaming, growing, and changing constantly. You want to predict outcomes, classify users, or group behaviors, all at scale.

UDFs & UDAFs — Custom Functions in SQL

Learn how to create and use PySpark UDFs (User Defined Functions) and UDAFs (User Defined Aggregate Functions) to implement custom logic and aggregations in Spark SQL and DataFrames.