Apache

Tutorials for Apache big data technologies, including Apache Spark, Apache Kafka, Apache Airflow, and other critical tools for data engineers.

Join and Aggregate PySpark DataFrames

Perform SQL-like joins and aggregations on your PySpark DataFrames.

Todd Birchard
Spark
Working with PySpark RDDs

Working with Spark's original data structure API: Resilient Distributed Datasets.

Todd Birchard
Spark
Manage Data Pipelines with Apache Airflow

Use Apache Airflow to build and monitor better data pipelines.

Todd Birchard
Apache
Structured Streaming in PySpark

Become familiar with building a structured stream in PySpark using the Databricks interface.

Todd Birchard
Spark
Becoming Familiar with Apache Kafka and Message Queues

Getting to know Apache Kafka: a horizontally scalable event streaming platform. Learn what makes Kafka critical to high-volume, low-latency data pipelines.

Todd Birchard
Apache
Cleaning PySpark DataFrames

Easy DataFrame cleaning techniques ranging from dropping rows to selecting important data.

Todd Birchard
Spark
Transforming PySpark DataFrames

Apply transformations to PySpark DataFrames such as creating new columns, filtering rows, or modifying string & number values.

Todd Birchard
Spark
Learning Apache Spark with PySpark & Databricks

Get started with Apache Spark in part 1 of our series, where we leverage Databricks and PySpark.

Todd Birchard
Spark