Apache Spark Essential Training: Big Data Engineering (2021)

Course outline

Data engineering is the foundation for building analytics and data science applications in the new Big Data world. Data engineering requires combining multiple big data technologies to construct data pipelines and networks to stream, process, and store data. This course focuses on building full-fledged solutions that combine Apache Spark with other Big Data tools to create end-to-end data pipelines. Instructor Kumaran Ponnambalam begins by defining data engineering, its functions, and its concepts. Next, Kumaran goes over how Spark capabilities such as parallel processing, execution plans, state management options, and machine learning work with extract, transform, load (ETL). He introduces you to batch processing use cases and processes, as well as real-time processing pipelines. After walking you through several useful best practices, Kumaran concludes with an end-to-end exercise project.


01 - Introduction
  • 01 - Driving big data engineering with Apache Spark
  • 02 - Course prerequisites
  • 03 - Setting up the exercise files

02 - 1. Data Engineering Concepts
  • 01 - What is data engineering
  • 02 - Data engineering vs. data analytics vs. data science
  • 03 - Data engineering functions
  • 04 - Batch vs. real-time processing
  • 05 - Data engineering with Spark

03 - 2. Spark Capabilities for ETL
  • 01 - Spark architecture review
  • 02 - Parallel processing with Spark
  • 03 - Spark execution plan
  • 04 - Stateful stream processing
  • 05 - Spark analytics and ML

04 - 3. Batch Processing Pipelines
  • 01 - Batch processing use case: Problem statement
  • 02 - Batch processing use case: Design
  • 03 - Setting up the local DB
  • 04 - Uploading stock to a central store
  • 05 - Aggregating stock across warehouses

05 - 4. Real-Time Processing Pipelines
  • 01 - Real-time use case: Problem
  • 02 - Real-time use case: Design
  • 03 - Generating a visits data stream
  • 04 - Building a website analytics job
  • 05 - Executing the real-time pipeline

06 - 5. Data Engineering with Spark Best Practices
  • 01 - Batch vs. real-time options
  • 02 - Scaling extraction and loading operations
  • 03 - Scaling processing operations
  • 04 - Building resiliency

07 - 6. End-to-End Exercise Project
  • 01 - Project exercise requirements
  • 02 - Solution design
  • 03 - Extracting long last actions
  • 04 - Building a scorecard

08 - Conclusion
  • 01 - More about Apache Spark

Price: 139,000 Toman
ID: 42027
Size: 147 MB
Duration: 65 minutes
Publication date: 21 Azar 1403