Farin Company Specialized Website

Spark 3 on Google Cloud Platform - Beginner to Advanced Level

Course Outline

Build Scalable Batch and Real Time Data Processing Pipelines with PySpark and Dataproc


1. Introduction
  • 1. Course Introduction and Overview
  • 2. GitHub repository for the course.html
  • 3. Setup a Trial GCP Account
  • 4. Install and Setup the Gcloud SDK

2. Getting Started with Spark Fundamentals
  • 1. Introduction to Dataproc on GCP
  • 2. Overview of Spark's Architecture
  • 3. Datalake vs Datawarehouse
  • 4. Role of Spark in Big Data Ecosystem
  • 5. Overview of Spark APIs
  • 6. What's New in Spark 3
  • 7. Should I Be Learning Spark in 2023

3. Getting Started with Spark DataFrame API
  • 1.1 DataframeAPI-Source-Code.zip
  • 1. Section Introduction
  • 2. Lab - Create a Dataproc Cluster
  • 3. Lab - Walkthrough of Jupyter Notebook and different components
  • 4. Lab - Basic Dataframe Operations in PySpark
  • 5. Lab - Typecasting & timestamp column extraction
  • 6. Lab - Dataframe Aggregations
  • 7. Assignment on Dataframe Aggregations.html
  • 8. Transformations and Actions in Spark
  • 9. Lab - Advanced transformations using Window Functions
  • 10. Lab - Rolling Window Operations
  • 11. Lab - Write transformed data back to a sink GCS Bucket and BigQuery
  • 12. Lab - Use Spark-Submit to submit jobs to Dataproc clusters

4. Getting Started with SparkSql in Spark 3
  • 1.1 Data-For-Joins.zip
  • 1.2 SparkSql-Source-Code.zip
  • 1. Introduction to SparkSql
  • 2. Different Types of Tables in Spark
  • 3. Lab - Create Tables for SparkSql
  • 4. Lab - Analytical Window Functions and creating permanent tables
  • 5.1 Data-For-Joins.zip
  • 5. Lab - Perform Joins on Dataframes
  • 6. What are Partitions in Spark Dataframes
  • 7. Lab - Perform repartitioning of dataframes
  • 8. Data Shuffling in Joins
  • 9. Lab - User defined functions in Spark

5. Spark Concepts - Autoscaling, Optimization and Alerting
  • 1. What is the Catalyst Optimizer in Spark
  • 2. Cache and Persist in Spark
  • 3. What is Autoscaling in Spark and Dataproc
  • 4. Lab - Apply Autoscaling Policies to Dataproc Clusters
  • 5. Introduction to Dataproc Workflows
  • 6. Lab - Execute GCP Workflows
  • 7. Lab - Cloud Scheduler to automate Workflow Execution
  • 8. What is Checkpointing in Spark
  • 9. What are Broadcast Joins
  • 10. Lab - Setup Alerting Policies for Spark Jobs

6. Project - End to End Batch Processing Pipeline using Spark
  • 1.1 Project-Source-Code.zip
  • 1. Project Introduction
  • 2. Lab - Setup MySql Instance and Database on GCP
  • 3. Lab - Ingest Data into MySql
  • 4. Lab - Setup Dataproc with initialization actions
  • 5. Assignment Lab - Setup Connectivity from PySpark to MySql Db
  • 6. Assignment Lab - Perform transformations using PySpark
  • 7. Lab - Setup Workflows to execute end-to-end pipeline

7. Real Time Analytics With Spark Structured Streaming
  • 1.1 Section-Source-Code.zip
  • 1. Section Introduction
  • 2. Overview of PubSub Lite
  • 3. What are Tumbling Windows
  • 4. What is Watermarking
  • 5. What are Sliding Windows
  • 6. Lab - Create PubSub Lite Reservation
  • 7. Lab - Publish Data to PubSub and Testing using PySpark
  • 8. Lab - Implement Tumbling Windows
  • 9. Lab - Implement Tumbling Window with Watermarking
  • 10. Lab - Implement Sliding Windows

8. Joins on Streaming Data
  • 1.1 Section-Source-Code.zip
  • 1. Overview of Joining Streaming Dataframe
  • 2. Lab - Join Streaming Dataframe with Static Dataframe
  • 3. Lab - Join 2 Streaming Dataframes
  • 4. Lab - Use Watermarking in Streaming Joins

9. Real Time Collaborative Filtering Project
  • 1.1 Project-Source-Code.zip
  • 1. Overview of the Use Case
  • 2. Lab - Model Training using ML Library and Code Walkthrough
  • 3. Lab - Code Walkthrough and Publish Data
  • 4. Lab - Real Time Product Recommendation Model in Action

10. Prep Up for the Interview Questions on Spark
  • 1. Introduction and Tips
  • 2. Batch Data Processing Interview Questions - Part 1
  • 3. Batch Data Processing Interview Questions - Part 2
  • 4. Batch Processing Interview Questions - Part 3
  • 5. Real Time Data Processing Interview Questions - Part 1
  • 6. Real Time Data Processing Interview Questions - Part 2

    Price: 45,900 Toman
    Producer:
    Instructor:
    ID: 15316
    Size: 2328 MB
    Duration: 337 minutes
    Publication date: 4 Tir 1402 (June 25, 2023)