Basics to Advanced: Azure Synapse Analytics Hands-On Project

Course Curriculum

Build a complete project using only Azure Synapse Analytics, with a focus on PySpark, covering Delta Lake and Spark optimizations


1. Introduction
  • 1. Introduction
  • 2. Project Architecture
  • 3.1 Synapse Project Deck.pdf
  • 3. Course Slides.html

2. Origin of Azure Synapse Analytics
  • 1. Section Introduction
  • 2. Need for a separate Analytical system
  • 3. OLAP vs OLTP
  • 4. A typical Data Warehouse
  • 5. Datalake Introduction
  • 6. The modern Data Warehouse and its problems
  • 7. The solution - Azure Synapse Analytics and its Components
  • 8. Azure Synapse Analytics - A single-stop solution
  • 9. Section Summary

3. Environment Setup
  • 1. Section Introduction
  • 2. Creating a resource group in Azure
  • 3. Create Azure Synapse Analytics Service
  • 4. Exploring Azure Synapse Analytics
  • 5. Understanding the dataset

4. Serverless SQL Pool (query sketch after this list)
  • 1. Section Introduction
  • 2. Serverless SQL Pool - Introduction
  • 3. Serverless SQL Pool - Architecture
  • 4. Serverless SQL Pool - Benefits and Pricing
  • 5.1 Unemployment.csv
  • 5.2 unemployment.zip
  • 5. Uploading files into Azure Datalake Storage
  • 6.1 1 data exploration.zip
  • 6.2 Openrowset.html
  • 6. Initial Data Exploration
  • 7. How to import SQL scripts or .ipynb notebooks into Azure Synapse
  • 8.1 2 fixing collation warning.zip
  • 8. Fixing the Collation warning
  • 9.1 3 creating external datasource.zip
  • 9. Creating External datasource
  • 10.1 4 creating database scoped credential sas.zip
  • 10. Creating a database scoped credential using SAS
  • 11.1 5 creating database scoped credential mi.zip
  • 11. Creating a database scoped credential using Managed Identity
  • 12. Deleting existing data sources for cleanup
  • 13. Creating an external file format - Demo
  • 14.1 6 create external file format.zip
  • 14. Creating an External File Format - Practical
  • 15. Creating External DataSource for Refined container
  • 16.1 7 creating external table.zip
  • 16. Creating an External Table
  • 17. End of section
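
A minimal sketch of the kind of OPENROWSET query this section builds, issued from Python via pyodbc rather than from a Synapse SQL script; the workspace endpoint, storage account, file path and authentication mode are hypothetical placeholders, not the course's exact values.

    import pyodbc

    # connect to the serverless (on-demand) endpoint of a hypothetical workspace
    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<workspace>-ondemand.sql.azuresynapse.net;"
        "Database=master;Authentication=ActiveDirectoryInteractive;"
    )
    # query a CSV in ADLS directly; HEADER_ROW requires PARSER_VERSION '2.0'
    query = """
    SELECT TOP 10 *
    FROM OPENROWSET(
        BULK 'https://<account>.dfs.core.windows.net/raw/Unemployment.csv',
        FORMAT = 'CSV', PARSER_VERSION = '2.0', HEADER_ROW = TRUE
    ) AS rows;
    """
    for row in conn.cursor().execute(query):
        print(row)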

5. History and Data processing before Spark
  • 1. Section Introduction
  • 2. Big Data Approach
  • 3. Understanding Hadoop YARN - Cluster Manager
  • 4. Understanding Hadoop - HDFS
  • 5. Understanding Hadoop - MapReduce Distributed Computing

6. Emergence of Spark
  • 1. Section Introduction
  • 2. Drawbacks of MapReduce Framework
  • 3. Emergence of Spark

7. Spark Core Concepts (word-count sketch after this list)
  • 1. Section Introduction
  • 2. Spark EcoSystem
  • 3. Difference between Hadoop & Spark
  • 4. Spark Architecture
  • 5. Creating a Spark Pool & its benefits
  • 6. RDD Overview
  • 7. Lambda, Map and Filter functions - Overview
  • 8.1 10 understanding rdd in practical.zip
  • 8. Understanding RDD in practical
  • 9. RDD - Lazy Loading - Transformations and Actions
  • 10. What is RDD Lineage
  • 11. RDD - Word count program - Demo
  • 12.1 14 word count pyspark program practical.zip
  • 12.2 tonystark.txt
  • 12. RDD - Word count - PySpark Program - Practical
  • 13. Optimization - ReduceByKey vs GroupByKey Explanation
  • 14. RDD - Understanding Jobs in Spark - Practical
  • 15. RDD - Understanding Narrow and Wide Transformations
  • 16. RDD - Understanding Stages - Practical
  • 17.1 18 rdd understanding tasks practical.zip
  • 17. RDD - Understanding Tasks - Practical
  • 18. Understanding DAG, RDD Lineage and their Differences
  • 19. Spark Higher level APIs Intro
  • 20.1 2023-01-15 213417.413947.csv
  • 20.2 2023-01-15 213417.413947.zip
  • 20.3 2023-01-15 213417.413947.zip
  • 20.4 dataframe practical.zip
  • 20. Synapse Notebook - Creating dataframes practical
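
A minimal word-count sketch matching lectures 11-12 above, assuming the `spark` session predefined in a Synapse notebook; the storage path is a hypothetical placeholder for the attached tonystark.txt.

    # read the text file as an RDD of lines
    rdd = spark.sparkContext.textFile(
        "abfss://raw@<account>.dfs.core.windows.net/tonystark.txt")
    counts = (rdd.flatMap(lambda line: line.split())   # one record per word
                 .map(lambda word: (word.lower(), 1))  # pair each word with 1
                 .reduceByKey(lambda a, b: a + b))     # combines map-side, which is
                                                       # why it beats groupByKey
    print(counts.take(10))                             # take() is the action that
                                                       # triggers the job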

8. PySpark Transformation 1 - Select and Filter functions (code sketch after this list)
  • 1. Introduction for PySpark Transformations
  • 2.1 1 walkthough on notebook.zip
  • 2. Walkthrough on Notebooks and Markdown cells
  • 3.1 Databricks login.html
  • 3.2 Databricks Signup.html
  • 3. Using the free Databricks Community Edition to practice and save costs
  • 4.1 2 display and show functions.zip
  • 4. Display and show Functions
  • 5. Stop Spark Session when not in use
  • 6.1 3 select and selectexpr.zip
  • 6. Select and SelectExpr
  • 7.1 4 filter function.zip
  • 7. Filter Function
  • 8. Organizing notebooks into a folder
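
A minimal sketch of select, selectExpr and filter; the tiny DataFrame and its column names are hypothetical stand-ins for the course's unemployment dataset.

    from pyspark.sql.functions import col

    df = spark.createDataFrame(
        [("Alabama", 2020, 6.4), ("Alaska", 2019, 5.1)], ["State", "Year", "Rate"])

    slim = df.select("State", col("Rate").alias("UnemploymentRate"))  # pick columns
    pct = df.selectExpr("State", "Rate * 100 AS RatePct")             # SQL expressions
    high = df.filter((col("Rate") > 5.0) & (col("Year") == 2020))     # row predicate
    high.show()   # in Synapse, display(high) renders an interactive grid instead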

9. PySpark Transformation 2 - Handling Nulls, Duplicates and Aggregation (code sketch after this list)
  • 1.1 1 understanding fillna and nadotfill.zip
  • 1. Understanding fillna and na.fill
  • 2.1 2 handling duplicates and dropna.zip
  • 2. Identifying duplicates using Aggregations
  • 3.1 2 handling duplicates and dropna.zip
  • 3. Handling Duplicates using dropna
  • 4. Organizing notebooks into a folder
  • 5. Transformations summary of this section
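
A minimal sketch of the null and duplicate handling above; the data and column names are hypothetical.

    from pyspark.sql.functions import count

    df = spark.createDataFrame(
        [("Alabama", 2020, None), ("Alabama", 2020, 6.4), ("Alaska", 2019, 5.1)],
        ["State", "Year", "Rate"])

    filled = df.fillna({"Rate": 0.0})                 # na.fill is the same operation
    dupes = (df.groupBy("State", "Year")              # surface duplicate keys with
               .agg(count("*").alias("n"))            # an aggregation
               .filter("n > 1"))
    deduped = df.dropDuplicates(["State", "Year"])    # keep one row per key
    no_nulls = df.dropna(subset=["Rate"])             # drop rows with a null Rate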

10. PySpark Transformation 3 - Data Transformation and Manipulation (code sketch after this list)
  • 1.1 3 data transformation and manipulation.zip
  • 1. Using withColumn to create and update columns
  • 2.1 3 data transformation and manipulation.zip
  • 2. Renaming columns with withColumnRenamed
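
A minimal sketch of withColumn and withColumnRenamed; the DataFrame is hypothetical.

    from pyspark.sql.functions import col, round as round_

    df = spark.createDataFrame([("Alabama", 0.064)], ["State", "Rate"])

    df2 = (df.withColumn("RatePct", col("Rate") * 100)          # create a column
             .withColumn("RatePct", round_(col("RatePct"), 1))  # update it in place
             .withColumnRenamed("State", "StateName"))          # rename a column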

11. PySpark 4 - Synapse Spark - MSSparkUtils (code sketch after this list)
  • 1. What is MSSparkUtils?
  • 2.1 1 mssparkutils env.zip
  • 2. MSSpark Utils - Env utils
  • 3. What is mount point
  • 4.1 2 msspark utils fs mount.zip
  • 4. Creating and accessing mount point in Notebook
  • 5.1 3 msspark utils fs utils.zip
  • 5. All File System Utils
  • 6.1 4 a notebook parent.zip
  • 6. Notebook Utils - Exit command
  • 7.1 Synapse Quotas.html
  • 7. Creating another spark pool
  • 8.1 To Submit ticket for quota increase.html
  • 8. Procedure to increase vCores request (optional)
  • 9.1 4 a notebook child.zip
  • 9.2 4 a notebook parent.zip
  • 9. Calling notebook from another notebook
  • 10.1 4 a notebook parent para.zip
  • 10. Calling notebook from another using runtime parameters
  • 11.1 5 magic commands.zip
  • 11. Magic commands
  • 12.1 FAQ.html
  • 12. Attaching two notebooks to a single spark pool
  • 13.1 6 1 accessing mount configuration.zip
  • 13.2 6 mount configuration.zip
  • 13. Accessing Mount points from another notebook
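
A minimal sketch of the MSSparkUtils calls covered above, valid only inside a Synapse notebook; the container, linked service and child-notebook names are hypothetical.

    from notebookutils import mssparkutils

    # mount an ADLS container through a linked service
    mssparkutils.fs.mount(
        "abfss://raw@<account>.dfs.core.windows.net",
        "/mnt_raw",
        {"linkedService": "<linked-service-name>"})

    job_id = mssparkutils.env.getJobId()                    # env utils
    print(mssparkutils.fs.ls(f"synfs:/{job_id}/mnt_raw"))   # file system utils

    # run a child notebook with a timeout and runtime parameters; inside the
    # child, mssparkutils.notebook.exit("done") returns "done" to `result`
    result = mssparkutils.notebook.run("ChildNotebook", 300, {"param1": "value"})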

12. PySpark 5 - Synapse - Spark SQL (code sketch after this list)
  • 1.1 1 accessing data using temporary views practical.zip
  • 1. Accessing data using Temporary Views - Practical
  • 2. Lake Database - Overview
  • 3.1 2 creating database in lake database.zip
  • 3. Understanding and creating database in Lake Database
  • 4.1 2 creating database in lake database.zip
  • 4. Using Spark SQL in notebook
  • 5.1 3 managed vs external tables.zip
  • 5. Managed vs External tables in Spark
  • 6. Metadata sharing between Spark pool and Serverless SQL Pool
  • 7. Deleting unwanted folders
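
A minimal sketch of temporary views, a lake database and the managed-vs-external distinction; the names and external location are hypothetical.

    df = spark.createDataFrame([("Alabama", 6.4), ("Alaska", 5.1)], ["State", "Rate"])
    df.createOrReplaceTempView("unemployment_v")
    spark.sql("SELECT State, AVG(Rate) AS avg_rate "
              "FROM unemployment_v GROUP BY State").show()

    spark.sql("CREATE DATABASE IF NOT EXISTS demo_lake_db")
    spark.sql("CREATE TABLE IF NOT EXISTS demo_lake_db.managed_tbl AS "
              "SELECT * FROM unemployment_v")          # managed: Spark owns the files
    spark.sql("""
        CREATE TABLE IF NOT EXISTS demo_lake_db.external_tbl (State STRING, Rate DOUBLE)
        USING PARQUET
        LOCATION 'abfss://refined@<account>.dfs.core.windows.net/unemployment/'
    """)                                               # external: DROP keeps the files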

13. PySpark Transformation 6 - Join Transformations (code sketch after this list)
  • 1.1 Education and Expected Salary ranges.csv
  • 1.2 Education Details.csv
  • 1.3 Salary Details.csv
  • 1. Uploading required files for Joins
  • 2.1 1 understanding joins and union.zip
  • 2. Python notebooks till Union.html
  • 3. Inner join
  • 4. Left Join
  • 5. Right Join
  • 6. Full outer join
  • 7. Left Semi Join
  • 8. Left anti and Cross Join
  • 9. Union Operation
  • 10.1 2 performing join transformation.zip
  • 10. Performing Join Transformation on Project Dataset
  • 11. Summary of Transformations performed
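
A minimal sketch of the join types and union above, on two tiny DataFrames standing in for the attached education and salary CSVs; the data is hypothetical.

    edu = spark.createDataFrame([(1, "Bachelors"), (2, "Masters")], ["id", "education"])
    sal = spark.createDataFrame([(1, 50000), (3, 70000)], ["id", "salary"])

    edu.join(sal, "id", "inner").show()       # ids present on both sides
    edu.join(sal, "id", "left").show()        # also "right" and "full" outer joins
    edu.join(sal, "id", "left_semi").show()   # left rows with a match, left cols only
    edu.join(sal, "id", "left_anti").show()   # left rows without a match
    edu.crossJoin(sal).show()                 # Cartesian product
    edu.union(edu).show()                     # stack rows; schemas must line up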

14. PySpark Transformation 7 - String Manipulation and Sorting (code sketch after this list)
  • 1. Replace function to change spaces
  • 2.1 1 string manipulation and sorting.zip
  • 2. PySpark Notebook for this section.html
  • 3. Split and concat functions
  • 4. Order by and sort
  • 5. Section Summary
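
A minimal sketch of the string and sorting functions above; the DataFrame and columns are hypothetical.

    from pyspark.sql.functions import regexp_replace, split, concat_ws, col

    df = spark.createDataFrame([("New York", "2020-01", 3.9)],
                               ["State", "Period", "Rate"])

    df2 = (df.withColumn("State", regexp_replace("State", " ", "_"))  # replace spaces
             .withColumn("parts", split(col("Period"), "-"))          # ["2020", "01"]
             .withColumn("label", concat_ws(": ", "State", "Rate")))  # join with a separator
    df2.orderBy(col("Rate").desc()).show()                            # sort() is equivalent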

15. PySpark Transformation 8 - Window Functions (code sketch after this list)
  • 1. Row number function
  • 2.1 1 window functions.zip
  • 2. PySpark Notebook used in this section.html
  • 3. Rank Function
  • 4. Dense Rank function
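
A minimal sketch of row_number, rank and dense_rank over the same window; the data is hypothetical.

    from pyspark.sql.functions import row_number, rank, dense_rank, col
    from pyspark.sql.window import Window

    df = spark.createDataFrame(
        [("Alabama", 2019, 5.0), ("Alabama", 2020, 6.4), ("Alabama", 2021, 6.4)],
        ["State", "Year", "Rate"])

    w = Window.partitionBy("State").orderBy(col("Rate").desc())
    (df.withColumn("rn", row_number().over(w))     # 1, 2, 3: unique even on ties
       .withColumn("rnk", rank().over(w))          # ties share a rank; gaps follow
       .withColumn("drnk", dense_rank().over(w))   # ties share a rank; no gaps
       .show())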

16. PySpark Transformation 9 - Conversions and Pivoting (code sketch after this list)
  • 1. Conversion using cast function
  • 2.1 1 cast and pivoting.zip
  • 2. PySpark Notebook needed for the casting and pivoting lectures.html
  • 3. Pivot function
  • 4. Unpivot using stack function
  • 5.1 2 to date+function.zip
  • 5.2 Databricks - Datetime Patterns.html
  • 5.3 Microsoft Docs - Date time patterns.html
  • 5.4 Microsoft Docs - Datetime.html
  • 5. Using to_date to convert a date column
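
A minimal sketch of cast, pivot, stack-based unpivot and to_date; the data, years and date pattern are hypothetical.

    from pyspark.sql.functions import col, to_date, expr

    df = spark.createDataFrame(
        [("Alabama", "2019", "5.0", "15-10-2019"),
         ("Alabama", "2020", "6.4", "15-10-2020")],
        ["State", "Year", "Rate", "ReportDate"])

    typed = (df.withColumn("Rate", col("Rate").cast("double"))        # string -> double
               .withColumn("ReportDate", to_date("ReportDate", "dd-MM-yyyy")))
    pivoted = typed.groupBy("State").pivot("Year").avg("Rate")        # years -> columns
    unpivoted = pivoted.select(                                       # columns -> rows
        "State", expr("stack(2, '2019', `2019`, '2020', `2020`) AS (Year, Rate)"))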

17. PySpark Transformation 10 - Schema Definition and Management (code sketch after this list)
  • 1.1 1 schema definition and management.zip
  • 1. PySpark Notebook used in this lecture.html
  • 2. StructType and StructField - Demo
  • 3. Implementing explicit schema with StructType and StructField
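
A minimal sketch of an explicit schema applied while reading a CSV, so no inference pass is needed; the path is a hypothetical placeholder.

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DoubleType

    schema = StructType([
        StructField("State", StringType(), nullable=False),
        StructField("Year", IntegerType(), nullable=True),
        StructField("Rate", DoubleType(), nullable=True),
    ])
    df = spark.read.csv(
        "abfss://raw@<account>.dfs.core.windows.net/Unemployment.csv",
        header=True, schema=schema)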

18. PySpark Transformation 11 - UDFs (code sketch after this list)
  • 1. User Defined Functions - Demo
  • 2.1 1 udfs.zip
  • 2. Implementing UDFs in Notebook
  • 3.1 1 writing data to processed container.zip
  • 3. Writing transformed data to Processed container
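
A minimal sketch of a UDF and of writing the result out; the banding rule and the processed-container path are hypothetical.

    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    df = spark.createDataFrame([("Alabama", 6.4), ("Alaska", 3.2)], ["State", "Rate"])

    @udf(StringType())                        # plain Python wrapped for Spark
    def rate_band(rate):
        return "high" if rate is not None and rate > 5.0 else "normal"

    out = df.withColumn("RateBand", rate_band(col("Rate")))
    out.write.mode("overwrite").parquet(
        "abfss://processed@<account>.dfs.core.windows.net/unemployment/")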

19. Dedicated SQL Pool (COPY sketch after this list)
  • 1. Dedicated SQL pool - Demo
  • 2. Dedicated SQL Pool Architecture
  • 3. How distribution takes place based on DWUs
  • 4. Factors to consider when choosing dedicated SQL pool
  • 5. Creating Dedicated SQL pool in Synapse
  • 6. Ways to copy data into Dedicated SQL Pool
  • 7.1 1 copy command to get data into dedicated sql pool.zip
  • 7. Copy command to copy to dedicated SQL pool
  • 8. Clustered Columnstore Index (optional)
  • 9. Types of Distributions or Sharding patterns
  • 10. Using Pipeline to Copy to dedicated SQL Pool
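
A minimal sketch of the COPY command against the dedicated SQL pool, issued from Python via pyodbc; the endpoint, table, source path and Managed Identity credential are hypothetical and assume the pool's identity can read the storage account.

    import pyodbc

    conn = pyodbc.connect(
        "Driver={ODBC Driver 18 for SQL Server};"
        "Server=<workspace>.sql.azuresynapse.net;"    # dedicated endpoint, no -ondemand
        "Database=<dedicated_pool>;Authentication=ActiveDirectoryInteractive;",
        autocommit=True)

    conn.cursor().execute("""
        COPY INTO dbo.Unemployment
        FROM 'https://<account>.blob.core.windows.net/processed/unemployment/*.parquet'
        WITH (FILE_TYPE = 'PARQUET', CREDENTIAL = (IDENTITY = 'Managed Identity'))
    """)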

20. Reporting data to Power BI
  • 1. Section Introduction
  • 2. Installing Power BI Desktop
  • 3. Creating report from Power BI Desktop
  • 4. Creating new user in Azure AD for creating workspace (if using personal account)
  • 5. Creating a shared workspace in Power BI
  • 6. Publishing report to Shared Workspace
  • 7. Accessing Power BI from Azure Synapse Analytics
  • 8.1 synapse power bi report.zip
  • 8. Download Power BI .pbix file from here.html
  • 9. Creating Dataset and report from Synapse Analytics
  • 10. Concluding the Power BI Section
  • 11. Summary and end of project implementation

21. Spark - Optimisation Techniques (code sketch after this list)
  • 1. Optimisation Section Intro
  • 2.1 cache.csv
  • 2.2 partition.zip
  • 2.3 Unemployment collect.csv
  • 2.4 Unemployment inferschema.csv
  • 2. Uploading required files for Optimisation
  • 3. Spark Optimisation levels
  • 4.1 1 optimization avoid collect.zip
  • 4. Avoid using Collect function
  • 5. Moving the notebook into a particular folder
  • 6.1 2 avoid infer schema.zip
  • 6. Avoid InferSchema
  • 7. Use Cache Persist 1 - Understanding Serialization and Deserialization
  • 8. Use Cache Persist 2 - How cache or persist will work - Demo
  • 9.1 3 cache.zip
  • 9. Use Cache Persist 3 - Understanding cache practically
  • 10. Use Cache Persist 4 - Persist - What is persist and different storage levels
  • 11.1 4 persist.zip
  • 11.2 storage level notes.zip
  • 11. Use Cache Persist - Notebook for persist with all storage levels.html
  • 12. Use Cache Persist 5 - Persist - MEMORY ONLY
  • 13. Use Cache Persist 6 - Persist - MEMORY AND DISK
  • 14. Use Cache Persist 7 - Persist - MEMORY ONLY SER (Scala Only)
  • 15. Use Cache Persist 8 - Persist - MEMORY AND DISK SER ( Scala Only)
  • 16. Use Cache Persist 9 - Persist - DISK ONLY
  • 17. Use Cache Persist 10 - Persist - OFF HEAP (Scala Only)
  • 18. Use Cache Persist 11 - Persist - MEMORY ONLY 2 (PySpark only)
  • 19. Use Partitioning 1 - Understanding partitioning - Demo
  • 20.1 4 paritioning.zip
  • 20. Use Partitioning 2 - Understand partitioning - Practical
  • 21. Repartition and coalesce 1 - Understanding repartition and coalesce - Demo
  • 22. Repartition and coalesce 2 - Understanding repartition and coalesce - Practical
  • 23. Broadcast variables 1 - Understanding broadcast variables - Demo
  • 24.1 6 broadcast variables.zip
  • 24. Broadcast variables 2 - Implementing broadcast variables in notebook
  • 25. Use Kryo Serializer
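
A minimal sketch of three of the techniques above: cache/persist with an explicit storage level, repartition vs coalesce, and a broadcast join; the DataFrames are hypothetical.

    from pyspark import StorageLevel
    from pyspark.sql.functions import broadcast

    df = spark.range(1_000_000).withColumnRenamed("id", "key")
    lookup = spark.createDataFrame([(0, "zero"), (1, "one")], ["key", "label"])

    df.cache()                              # DataFrame default level: MEMORY_AND_DISK
    df.count()                              # an action materialises the cache
    df.unpersist()
    df.persist(StorageLevel.DISK_ONLY)      # or pick a storage level explicitly

    repart = df.repartition(8, "key")       # full shuffle; can raise partition count
    fewer = repart.coalesce(2)              # merges partitions without a full shuffle

    joined = df.join(broadcast(lookup), "key")  # ship the small side to every executor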

22. Delta Lake (code sketch after this list)
  • 1. Section Introduction
  • 2. Drawbacks of ADLS
  • 3. What is Delta lake
  • 4. Lakehouse Architecture
  • 5.1 SchemaManagementDelta.csv
  • 5. Uploading required file for Delta lake
  • 6.1 1 problems in data lake and creating delta lake.zip
  • 6. Problems with Azure Datalake - Practical
  • 7. Creating a Delta lake
  • 8. Understanding Delta format
  • 9.1 2 understanding transaction log file.zip
  • 9. Contents of Transaction Log or Delta log file - Practical
  • 10. Contents of a transaction log - Demo
  • 11.1 3 creating delta tables using sql by path.zip
  • 11. Creating delta table by Path using SQL
  • 12.1 4 creating delta table in metastore pyspark and sql.zip
  • 12. Creating delta table in Metastore using Pyspark and SQL
  • 13.1 lesscols.zip
  • 13.2 SchemaDifferDataType.csv
  • 13.3 schemaextracolumn1.zip
  • 13. Schema Enforcement - Files required for understanding Schema Enforcement
  • 14. What is schema enforcement - Demo
  • 15.1 4 creating delta table in metastore pyspark and sql.zip
  • 15. Schema Enforcement - Practical
  • 16.1 4 creating delta table in metastore pyspark and sql.zip
  • 16. Schema Evolution - Practical
  • 17.1 6 versioning and time travel.zip
  • 17. Versioning and Time Travel
  • 18.1 7 vacuum command.zip
  • 18. Vacuum command
  • 19.1 8 convert to delta lake and checkpoints.zip
  • 19. Convert to Delta command
  • 20.1 8 convert to delta lake and checkpoints.zip
  • 20. Checkpoints in delta log
  • 21. Optimize command - Demo
  • 22.1 9 optimize command.zip
  • 22. Optimize command - Practical
  • 23.1 10 - upsert using merge command.zip
  • 23. Applying UPSERT using MERGE Command
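
A minimal sketch of a Delta write, time travel and a MERGE upsert; the path and data are hypothetical, and the delta package is assumed available as it is on Synapse Spark pools.

    from delta.tables import DeltaTable

    path = "abfss://delta@<account>.dfs.core.windows.net/unemployment"
    df = spark.createDataFrame([("Alabama", 2020, 6.4)], ["State", "Year", "Rate"])
    df.write.format("delta").mode("overwrite").save(path)

    v0 = spark.read.format("delta").option("versionAsOf", 0).load(path)  # time travel

    updates = spark.createDataFrame([("Alabama", 2020, 6.5)], ["State", "Year", "Rate"])
    (DeltaTable.forPath(spark, path).alias("t")
        .merge(updates.alias("s"), "t.State = s.State AND t.Year = s.Year")
        .whenMatchedUpdateAll()            # update rows that match the key
        .whenNotMatchedInsertAll()         # insert the rest: an UPSERT
        .execute())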

23. Conclusion
  • 1. Course Conclusion
  • 2. Bonus Lecture.html
  • 45,900 تومان
    بیش از یک محصول به صورت دانلودی میخواهید؟ محصول را به سبد خرید اضافه کنید.
    خرید دانلودی فوری

    در این روش نیاز به افزودن محصول به سبد خرید و تکمیل اطلاعات نیست و شما پس از وارد کردن ایمیل خود و طی کردن مراحل پرداخت لینک های دریافت محصولات را در ایمیل خود دریافت خواهید کرد.

    ایمیل شما:
    تولید کننده:
    ID: 20528
    Size: 7964 MB
    Duration: 1120 minutes
    Release date: 15 Mehr 1402 (7 October 2023)