Apache Spark 3 Fundamentals

Course Outline

Learn the fundamentals of Apache Spark 3: set up the environment, process data with RDDs and DataFrames, optimize applications, and build pipelines with Databricks and Azure Synapse. The course also introduces Spark's ecosystem.

1. Course Overview

1. Course Trailer

2. Getting Started with Apache Spark

1. Introduction and Course Outline

2. Version Check

3. Need for Apache Spark

4. Understanding Spark Architecture and Ecosystem

5. How Execution Happens in Spark

6. Spark APIs: RDDs, DataFrames, and Datasets

7. Summary

3. Setting Up Spark Environment

01. Module Overview

02. Understanding Spark Environments

03. Installing Spark

04. Monitoring Spark with Web UI

05. Option 1 - Running Spark in Command Line

06. Option 2 - Running Spark with Jupyter Notebooks

07. Option 3 - Creating Project with PyCharm IDE

08. Option 4 - Running Jobs with Spark Submit

09. Setting Up Multi-Node Cluster

10. Summary

4. Working with RDDs - Resilient Distributed Datasets

1. Module Overview

2. Understanding RDDs

3. Creating RDDs

4. Working with Pair RDDs

5. Applying Operations on RDDs

6. Using Narrow Transformations

7. Wide Transformations and Data Shuffling

8. Spark Application Concepts - Jobs, Stages and Tasks

9. Summary

5. Cleaning and Transforming Data with DataFrames

1. Module Overview

2. Understanding DataFrames

3. Creating DataFrames

4. Applying Schemas

5. Analyzing and Cleaning Data

6. Applying Transformations

7. Handling Corrupt Data

8. Saving Processed Data to Files

9. Summary

6. Working with Spark SQL, UDFs, and Common DataFrame Operations

1. Module Overview

2. Running SQL Queries on DataFrames

3. Working with Spark Tables

4. Working with User Defined Functions (UDFs)

5. Performing Operations on Multiple Datasets

6. Performing Window Operations

7. Summary

7. Performing Optimizations in Spark

01. Module Overview

02. Working with Spark Partitions

03. Changing DataFrame Partitions

04. Memory Management

05. Persisting Data

06. Spark Join Strategies and Broadcast Joins

07. Optimizing Shuffle Sort Join with Bucketing

08. Dynamic Resource Allocation

09. Resource Allocation Using Fair Scheduling

10. Summary

8. Features in Apache Spark 3

1. Introduction to Apache Spark 3

2. Adaptive Query Execution - Dynamic Coalescing

3. Adaptive Query Execution - Dynamic Join

4. Adaptive Query Execution - Handling Skew

5. Dynamic Partition Pruning

6. Summary

9. Building Reliable Data Lake with Spark and Delta Lake

01. Module Overview

02. Need for Delta Lake with Spark

03. How Delta Lake Works

04. ACID Guarantees on Delta Lake

05. Creating Delta Tables

06. Inserting Data to Delta Table

07. Performing DML Operations

08. Applying Table Constraints

09. Accessing Data with Time Travel

10. Summary

10. Handling Streaming Data with Spark Structured Streaming

1. Module Overview

2. Understanding Streaming in Spark

3. Structured Streaming Processing Model

4. Extracting Streaming Data from Source

5. Transforming and Loading Data

6. Summary

11. Working with Spark in Cloud

1. Module Overview

2. Using Spark in Databricks

3. Using Spark in Azure Synapse Analytics

4. Summary

Producer: PluralSight