Scrape the Planet! Building Web Scrapers with Python

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 28 lectures (10h 26m) | 8.85 GB

Power up your big data projects with cutting-edge web scraping technology built with Python

The web is full of incredibly powerful data stored away in billions of websites, databases and APIs. Financial data like stock prices and cryptocurrency trends, hourly weather data for thousands of cities in dozens of countries, and fun biographical information about your favorite actor or actress: all of this information is at your fingertips, but it’s impossible to truly harness it without a bit of help and automation!

Scrapers and spiders are incredibly powerful programs that allow developers, big data analysts and researchers to harness all of this data and use it for a vast array of applications, from the creation of data feeds to the collection of data for machine learning and artificial intelligence algorithms. This course offers a hands-on approach to building real, usable spiders in realistic situations such as financial analysis, link graph construction and social media research. By the end of this course, the student will be able to develop spiders and scrapers from scratch using Python and will be limited only by their own imagination. Put the vast power of the internet within your grasp by learning how to develop automated scrapers today!

This class is built with beginners in mind, and while previous experience in Python programming helps, you can start this course without ever having written a line of code.

What you’ll learn

  • How to theorize and develop web scrapers and spiders for data analysis and research
  • What are scrapers and spiders?
  • What is the difference between a scraper and a spider?
  • How are scrapers and spiders used in research?
  • How to use the Requests and BeautifulSoup libraries to build scrapers
  • How to build multi-threaded, complex scrapers
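
As a rough illustration of the scraper versus spider distinction in the list above: a scraper pulls data out of a single page, while a spider repeatedly scrapes pages and follows the links it finds. The sketch below is only an assumption about how that might look, not the course's own code. It swaps in Python's standard-library `html.parser` instead of BeautifulSoup, and a hypothetical in-memory `FAKE_WEB` dictionary instead of real HTTP requests, so that it runs without installs or network access. The names `scrape`, `crawl`, `PageParser` and `FAKE_WEB` are all made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical stand-in for the real web: URL -> HTML (no network needed).
FAKE_WEB = {
    "https://example.com/": '<h1>Home</h1><a href="/a">A</a><a href="/b">B</a>',
    "https://example.com/a": '<h1>Page A</h1><a href="/b">B</a>',
    "https://example.com/b": '<h1>Page B</h1><a href="/">Home</a>',
}


class PageParser(HTMLParser):
    """Collects the first <h1> text and every link found in <a href=...>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = None
        self.links = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None:
            self.title = data.strip()


def scrape(html, base_url):
    """Scraper: extract (title, links) from one page's HTML."""
    parser = PageParser(base_url)
    parser.feed(html)
    return parser.title, parser.links


def crawl(start_url, fetch, max_pages=10):
    """Spider: breadth-first crawl that scrapes each page and follows links.

    `fetch` is any callable mapping a URL to HTML text, or None on failure.
    """
    seen = {start_url}
    queue = deque([start_url])
    titles = {}
    while queue and len(titles) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        title, links = scrape(html, url)
        titles[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return titles


if __name__ == "__main__":
    for url, title in crawl("https://example.com/", FAKE_WEB.get).items():
        print(url, title)
```

Against real sites, `fetch` could plausibly be a small wrapper around `requests.get(url).text`, and `scrape` could be rewritten with BeautifulSoup. The `seen` set is what keeps the spider from revisiting pages when links form a cycle.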

Table of Contents

1 Welcome to the course!

Theory and Ethics of Web Scraping
2 The Foundations of the Web: What Are Web Scrapers?
3 What is a Page Scraper?
4 What is an API?
5 Ethics and Legality of Web Scraping
6 Scraper Design Approach
7 Scraper Design Part 2: Practical Design Methodology

Building Our First Scraper
8 Introduction to the Python Requests Library
9 Introduction to the Python BeautifulSoup Library
10 Scraping IMDB to get Movie Data
11 Setting Up Postgres Databases to Store Scraped Data
12 Scraping and Storing Stock Market Data with PostgreSQL
13 Conclusion

Building Spiders to Crawl the Web
14 Concepts of Spidering: What is a Web Spider?
15 The Kevin Bacon Problem: Introducing our IMDB Spider
16 Kevin Bacon Spider: Design and Skeleton Code
17 The Kevin Bacon Spider: Building an Imperfect IMDB Spider
18 The Kevin Bacon Spider: An Improved Design for our IMDB Spider
19 The Kevin Bacon Spider: Implementing Local Caching in our IMDB Spider
20 Building a Spider to Crawl Wikipedia
21 Course Re-cap and Section 5 Intro

Building Next Level Scrapers
22 Stock Market Watcher: Designing an Effective Stock Market Watcher
23 Stock Market Watcher: Creating a Stock Market Watcher to Give You Alerts
24 Stock Market Watcher: Improving our Stock Price Watcher with Multi-Threading
25 Building More Powerful Spiders and Scrapers with Job Queues
26 Building More Powerful Spiders and Scrapers with Prioritized Job Queues
27 Conclusion

Call to Action
28 Scrape the Planet!