Apache Spark Essential Training: Big Data Engineering (2021)

Course outline

Data engineering is the foundation for building analytics and data science applications in the world of big data. It requires combining multiple big data technologies to construct data pipelines and networks that stream, process, and store data. This course focuses on building full-fledged solutions that combine Apache Spark with other big data tools to create end-to-end data pipelines. Instructor Kumaran Ponnambalam begins by defining data engineering, its functions, and its concepts. Next, Kumaran explains how Spark capabilities such as parallel processing, execution plans, state management options, and machine learning support extract, transform, load (ETL) work. He then introduces batch processing use cases and processes, as well as real-time processing pipelines. After walking you through several useful best practices, Kumaran concludes with an end-to-end exercise project.

01 - Introduction

01 - Driving big data engineering with Apache Spark

02 - Course prerequisites

03 - Setting up the exercise files

02 - 1. Data Engineering Concepts

01 - What is data engineering?

02 - Data engineering vs. data analytics vs. data science

03 - Data engineering functions

04 - Batch vs. real-time processing

05 - Data engineering with Spark

03 - 2. Spark Capabilities for ETL

01 - Spark architecture review

02 - Parallel processing with Spark

03 - Spark execution plan

04 - Stateful stream processing

05 - Spark analytics and ML

04 - 3. Batch Processing Pipelines

01 - Batch processing use case: Problem statement

02 - Batch processing use case: Design

03 - Setting up the local DB

04 - Uploading stock to a central store

05 - Aggregating stock across warehouses

05 - 4. Real-Time Processing Pipelines

01 - Real-time use case: Problem

02 - Real-time use case: Design

03 - Generating a visits data stream

04 - Building a website analytics job

05 - Executing the real-time pipeline

06 - 5. Data Engineering with Spark Best Practices

01 - Batch vs. real-time options

02 - Scaling extraction and loading operations

03 - Scaling processing operations

04 - Building resiliency

07 - 6. End-to-End Exercise Project

01 - Project exercise requirements

02 - Solution design

03 - Extracting long last actions

04 - Building a scorecard

08 - Conclusion

01 - More about Apache Spark

Producer: LinkedIn Learning (Lynda)