Scrapy masterclass: Python web scraping and data pipelines

Course outline

Work on 7 real-world web-scraping projects using Scrapy, Splash, and Selenium. Build data pipelines locally and on AWS.

1. Introduction

1.1 Resources.html

1. Introduction

2. Scrapy installation.html

2. XPath first steps

1.1 xpath_node_types.zip

1. XPath 101 node types

2.1 XPath 102 Cheat Sheet.pdf

2. XPath 102 basic syntax

3.1 XPath 103 Cheat Sheet Axes (node relations).pdf

3. XPath 103 Axes (Node Relations)

4. Revisiting our real-estate web scraping example

3. Hello Scrapy

1. What is a web bot? Is it ethical?

2. The Scrapy Shell

3.1 Create your own Scrapy project.html

3. Creating your first Scrapy project

4.1 Create your own Scrapy spider.html

4. Creating your first Scrapy spider

5.1 Combining XPath queries.html

5. Handling combined queries using the getall() method

6.1 Item Loaders.html

6.2 The Scrapy project.html

6. Data cleansing using Item Loaders

7.1 Crawl Spiders.html

7. Pagination and link-following using Crawl Spiders

4. Scrapy web-scraping scenarios

1.1 Login bot.html

1. Login to websites

2. Changing the user-agent

3.1 Handling AJAX requests.html

3. Handling AJAX requests 1

4.1 Handling AJAX requests.html

4. Handling AJAX requests 2

5.1 Handling AJAX requests.html

5. Handling AJAX requests 3

6. Caching responses

7. Image harvesting

8.1 Images storage to S3 and FTP.html

8. Scraped images storage in FTP and AWS S3

5. Data transformation using Scrapy Pipelines

1.1 Classifieds Ads project.html

1. Introduction and sample project (classifieds ads scraping)

2.1 Remove duplicates pipeline.html

2.2 Removing duplicates pipeline.html

2. Removing ads with duplicate titles

3.1 Dropping Ads with no phones pipeline.html

3. Removing ads with no phone numbers

6. Data loading (storage) using Scrapy's pipelines

1.1 MongoDB pipeline.html

1. Storing scraped data in MongoDB

2.1 MySQL Pipeline.html

2. Storing scraped data in MySQL

3.1 Using Vault to store sensitive data for Scrapy.html

3. Using Vault to store sensitive Scrapy settings

4.1 S3 Pipeline.html

4. Storing data to AWS S3 bucket

5. Using AWS Glue and Amazon Athena to query the data from S3 (extra lecture)

7. Scrapy Middleware (or how to avoid getting banned)

1.1 Phone Models Project.html

1. Phone-models project and spider rate-limiting

2.1 Rotating user-agents project.html

2. Rotating user-agents middleware

3.1 Rotating proxies.html

3. Rotating proxies middleware

8. Handling JavaScript websites using Splash

1. What is Splash?

2. Introduction to Docker (optional)

3. Test-driving Splash

4.1 Wikipedia with Splash.html

4. Integrating Scrapy with Splash

5.1 Handling scrolling pages with Splash.html

5. Dealing with infinitely-scrolling pages using Splash

9. Browser automation using Selenium and Scrapy

1. What is Selenium?

2.1 firefox-how-to.pdf

2.2 Revisiting infinitely-scrolling pages (medium.com).html

2. Revisiting infinitely-scrolling pages (medium.com)

3.1 Clicking buttons (Yahoo Finance).html

3. Clicking buttons (Yahoo Finance)

Producer: Udemy-Training