Prophecy Data Transformation Copilot for Data Engineering

Course Outline

Learn Databricks and Spark data engineering to deliver self-service data transformation and speed pipeline development


01 - A warm welcome from Prophecy's co-founder
  • 001 Welcome to Prophecy for Data Engineering on Databricks and Spark

02 - The future of data transformation
  • 001 What's the future of data transformation
  • 002 The evolution of data transformation
  • 003 Ideal data transformation solution for the cloud
  • 004 Prophecy and the future of data transformation
  • 005 How to build the ideal data transformation in the cloud

03 - Data lakes, warehouses, and lakehouses - when to use what (Optional)
  • 001 What is a data lake and the difference between a data lake and a data warehouse
  • 002 Introducing the data lakehouse and why it's the perfect solution

04 - Introduction to Spark and Databricks (Optional)
  • 001 Meet your instructor and module overview
  • 002 Apache Spark architecture and concepts
  • 003 Spark language and tooling
  • 004 From Apache Spark to Databricks - why are they different
  • 005 Data lakehouse, Unity Catalog, optimization and security
  • 006 Working with Spark best practices
  • 007 Spark and Databricks tips and tricks

05 - Getting started with Prophecy
  • 001 Prophecy Overview - let's learn together!
  • 002 Setting up a Databricks Fabric to execute our Pipelines
  • 003 Create a Prophecy Project to manage our Spark code
  • 004 Getting started with the Pipeline canvas
  • 005 Explore code view and perform simple aggregations
  • 006 Join accounts and opportunities data and write results to a delta table
  • 007 Create a Pipeline and read from Data Sources to start building our Pipeline
  • 008 Deploying Pipelines to production to run our scheduled Pipelines
  • 009 Introduction to Prophecy Users and Teams
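
The Pipeline assembled in this module (read sources, join accounts to opportunities, aggregate, write to a delta table) compiles down to ordinary Spark code. A minimal PySpark sketch of that shape; the paths, column names, and output location are hypothetical placeholders, not the course's actual files:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Source Gems: read the two raw datasets (placeholder paths)
    accounts = spark.read.option("header", True).csv("/landing/accounts")
    opportunities = spark.read.option("header", True).csv("/landing/opportunities")

    # Join Gem: attach each opportunity to its account
    joined = accounts.join(opportunities, on="account_id", how="inner")

    # Aggregate Gem: total opportunity amount per account
    summary = joined.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))

    # Target Gem: write the result to a delta table (placeholder path)
    summary.write.format("delta").mode("overwrite").save("/tables/account_summary")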

06 - Data Sources and Targets
  • 001 Data Sources and Targets overview
  • 002 Parse and read raw data from object store with best practices
  • 003 Prophecy built-in Data Sources and Data Sets
  • 004 Explore Data Source default options
  • 005 Read and parse source parquet data and merge schema
  • 006 Handle corrupt and malformed records when reading from object stores
  • 007 Additional options to handle corrupt and malformed records
  • 008 Work with source data schema and delimiters
  • 009 Read from delta tables as sources
  • 010 Write data to a delta table using a target Gem
  • 011 Partition data when writing to a delta table for optimal performance
  • 012 What we've learned in this module
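
The corrupt-record lessons here map onto Spark's standard reader modes. A hedged sketch of reading a malformed CSV feed, merging parquet schemas on read, and writing a partitioned delta target; the file paths and columns are made up for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.getOrCreate()

    # Explicit schema, including a column that captures malformed rows
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("name", StringType()),
        StructField("country", StringType()),
        StructField("_corrupt_record", StringType()),  # filled in PERMISSIVE mode
    ])

    # mode can be PERMISSIVE (keep bad rows), DROPMALFORMED, or FAILFAST
    customers = (spark.read
                 .option("header", True)
                 .option("mode", "PERMISSIVE")
                 .option("columnNameOfCorruptRecord", "_corrupt_record")
                 .schema(schema)
                 .csv("/landing/customers"))

    # Parquet sources with evolving schemas can be merged on read
    events = spark.read.option("mergeSchema", True).parquet("/landing/events")

    # Partition the delta target on a commonly filtered column
    (customers.write.format("delta")
     .mode("overwrite")
     .partitionBy("country")
     .save("/bronze/customers"))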

07 - Data Lakehouse Architecture
  • 001 Data lakehouse and the medallion architecture module overview
  • 002 Medallion architecture - bronze, silver, and gold layer characteristics
  • 003 Read and write data by partition - daily load from object storage
  • 004 Additional data load by partition - daily load from object storage
  • 005 Introduction to data models in a data lakehouse
  • 006 Write the bronze layer data to delta tables
  • 007 Introduction to Slowly Changing Dimensions (SCD)
  • 008 Implement simple SCD2 for bronze layer table
  • 009 Bulk load read and write options
  • 010 Bulk load historical data with SCD2
  • 011 Delta table data versioning
  • 012 Work with incompatible schemas
  • 013 Recover data from a previous version
  • 014 A summary of what we've learned in this module
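
The versioning and recovery lessons rest on delta's table history API. A small sketch of lessons 011 and 013 (requires the delta-spark package; the table path is a placeholder):

    from pyspark.sql import SparkSession
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()
    table = DeltaTable.forPath(spark, "/bronze/accounts")  # placeholder path

    # Every write produces a new version; inspect the log
    table.history().select("version", "timestamp", "operation").show()

    # Time travel: query the table as it was at version 1
    v1 = spark.read.format("delta").option("versionAsOf", 1).load("/bronze/accounts")

    # Recover from a bad load by restoring an earlier version
    table.restoreToVersion(1)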

08 - Building the Silver and Gold Layers
  • 001 Building the Silver and Gold layers - Overview
  • 002 Data integration and cleaning in the Silver layer
  • 003 Build a data model and integrate data in the Silver layer
  • 004 Implement SCD2 in the silver layer
  • 005 Generating unique IDs and writing data to delta tables
  • 006 Business requirements for the Gold layer
  • 007 Perform analytics in the Gold layer to build business reports
  • 008 Using subgraphs for reusability to simplify Pipelines
  • 009 A summary of what we've learned in this module
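
Lesson 004 implements SCD2 with a delta MERGE. A deliberately simplified sketch, assuming the daily feed contains only new or changed customer records and that address is the tracked attribute; all table paths and column names are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from delta.tables import DeltaTable

    spark = SparkSession.builder.getOrCreate()
    target = DeltaTable.forPath(spark, "/silver/customers")        # placeholder
    updates = spark.read.format("delta").load("/bronze/customers_daily")

    # Step 1: close out the current row for customers whose address changed
    (target.alias("t")
     .merge(updates.alias("u"),
            "t.customer_id = u.customer_id AND t.is_current = true")
     .whenMatchedUpdate(
         condition="t.address <> u.address",
         set={"is_current": "false", "end_date": "current_date()"})
     .execute())

    # Step 2: append the incoming rows as the new current versions,
    # stamping a generated surrogate key (lesson 005)
    (updates
     .withColumn("customer_sk", F.expr("uuid()"))
     .withColumn("is_current", F.lit(True))
     .withColumn("start_date", F.current_date())
     .withColumn("end_date", F.lit(None).cast("date"))
     .write.format("delta").mode("append").save("/silver/customers"))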

09 - Deploying Pipelines to production
  • 001 Pipeline deployment overview
  • 002 Ways to orchestrate workflows to automate jobs
  • 003 Configure incremental Pipeline to prepare for scheduled runs
  • 004 Create a Prophecy Job to schedule the Pipelines to run daily
  • 005 What is CI/CD and how to deploy Pipelines to production
  • 006 Advanced use cases - integrating with an external CI/CD process using PBT
  • 007 A summary of what we've learned in this module
  • external-links.txt
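
A Prophecy Job ultimately materializes as a scheduled Databricks job. A hedged sketch of what the equivalent direct call to the Databricks Jobs 2.1 API looks like; the workspace URL, token, cluster ID, and file path are placeholders, and in practice Prophecy (or PBT inside an external CI/CD process) manages this for you:

    import requests

    HOST = "https://<workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                   # placeholder

    job_spec = {
        "name": "daily_lakehouse_pipeline",
        "schedule": {
            "quartz_cron_expression": "0 0 2 * * ?",    # 02:00 every day
            "timezone_id": "UTC",
        },
        "tasks": [{
            "task_key": "run_pipeline",
            "spark_python_task": {"python_file": "dbfs:/pipelines/main.py"},
            "existing_cluster_id": "<cluster-id>",      # placeholder
        }],
    }

    resp = requests.post(f"{HOST}/api/2.1/jobs/create",
                         headers={"Authorization": f"Bearer {TOKEN}"},
                         json=job_spec)
    resp.raise_for_status()
    print("Created job:", resp.json()["job_id"])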

10 - Managing versions and change control
  • 001 Version management and change control overview
  • 002 Prophecy Projects and the git process
  • 003 Collaborating on a Pipeline - catching up the dev branch to the main branch
  • 004 Reverting changes when developing a Pipeline before committing
  • 005 Reverting to a prior commit after committing by using rollback
  • 006 Merging changes and switching between branches
  • 007 Resolving code conflicts when multiple team members are making commits
  • 008 Cloning an existing Prophecy Project to a new repository
  • 009 Reusing an existing Prophecy Project by importing the Project
  • 010 Creating pull requests and handling commit conflicts
  • 011 A summary of what we've learned in this module
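
Prophecy drives all of this from its UI, but underneath it is a standard git workflow. A rough sketch of the equivalent steps, wrapped in Python only to keep one language across the examples in this outline; branch and commit names are made up:

    import subprocess

    def git(*args):
        # Run one git command and fail loudly, mirroring one UI action per call
        subprocess.run(["git", *args], check=True)

    git("checkout", "-b", "dev")                 # develop on a branch
    git("commit", "-am", "Add silver-layer Pipeline")
    git("fetch", "origin")
    git("merge", "origin/main")                  # catch the branch up to main
    git("checkout", "main")
    git("merge", "dev")                          # merge the finished work
    git("push", "origin", "main")

    git("checkout", "--", ".")                   # revert uncommitted edits
    git("revert", "HEAD")                        # roll back the last commit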

11 - Reusability and extensibility
  • 001 Reusability and extensibility overview
  • 002 The importance of setting data engineering standards - reuse and extend
  • 003 Convert a script to a customized Gem to share and reuse
  • 004 Create a new Gem for a multi-dimensional cube using the specified expressions
  • 005 Create a UI for the cube Gem for users to define the cube
  • 006 Adding additional features to make the customized Gem UI intuitive
  • 007 Error handling by adding validations and customized error messages
  • 008 Testing customized cube Gem and publishing the Gem to share with others
  • 009 Assigning proper access to share the newly built cube Gem
  • 010 Use the newly created cube Gem by adding it as a dependency
  • 011 A summary of what we've learned in this module
  • external-links.txt
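
Under the hood, a multi-dimensional cube Gem can compile to Spark's built-in cube operator, which aggregates over every combination of the grouping columns, subtotals and grand total included. A small sketch with hypothetical data (the Gem-authoring API itself is what the videos cover):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    sales = spark.read.format("delta").load("/gold/sales")   # placeholder

    cube = (sales
            .cube("region", "product")                       # all combinations
            .agg(F.sum("amount").alias("total_amount"),
                 F.count("*").alias("order_count")))
    cube.show()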

12 - Data testing
  • 001 Data quality and unit testing overview
  • 002 Medallion architecture and data quality
  • 003 Data quality Pipeline walkthrough - how to populate the data quality log
  • 004 Silver layer data quality checks, define errors, and write to delta table
  • 005 Data integration quality checks with joins - check if customer IDs are missing
  • 006 Performing data reconciliation checks - identify mismatching column values
  • 007 Identifying and tracking data quality issues by drilling down to a specific ID
  • 008 Executing data quality checks in phases - stop the Pipeline if errors exist
  • 009 Unit testing options - testing expressions using output equality
  • 010 Explore code view of the unit test
  • 011 Running the unit tests
  • 012 Unit testing expressions using output predicates
  • 013 A summary of what we've learned in this module
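
The output-equality and output-predicate styles of unit test in this module translate naturally to pytest. A minimal sketch with a made-up expression under test:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def dedupe_customers(df):
        # Hypothetical expression under test: keep one row per customer_id
        return df.dropDuplicates(["customer_id"])

    def test_dedupe_customers():
        spark = SparkSession.builder.master("local[1]").getOrCreate()
        input_df = spark.createDataFrame(
            [(1, "a"), (1, "a"), (2, "b")], ["customer_id", "name"])
        expected = spark.createDataFrame(
            [(1, "a"), (2, "b")], ["customer_id", "name"])

        actual = dedupe_customers(input_df)

        # Output equality: same rows, order ignored
        assert sorted(actual.collect()) == sorted(expected.collect())
        # Output predicate: no duplicate IDs survive
        dupes = actual.groupBy("customer_id").count().filter(F.col("count") > 1)
        assert dupes.count() == 0
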
Price: 139,000 Toman
Producer:
ID: 42444
Size: 1569 MB
Duration: 292 minutes
Release date: 27 Dey 1403 (Solar Hijri)