Data Pipelines with Apache Airflow, Video Edition

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 80 Lessons (10h 22m) | 1.52 GB

A successful pipeline moves data efficiently, minimizing pauses and blockages between tasks, keeping every process along the way operational. Apache Airflow provides a single customizable environment for building and managing data pipelines, eliminating the need for a hodgepodge collection of tools, snowflake code, and homegrown processes. Using real-world scenarios and examples, Data Pipelines with Apache Airflow teaches you how to simplify and automate data pipelines, reduce operational overhead, and smoothly integrate all the technologies in your stack.

Data pipelines manage the flow of data from initial collection through consolidation, cleaning, analysis, visualization, and more. Apache Airflow provides a single platform you can use to design, implement, monitor, and maintain your pipelines. Its easy-to-use UI, plug-and-play options, and flexible Python scripting make Airflow perfect for any data management task.

Data Pipelines with Apache Airflow teaches you how to build and maintain effective data pipelines. You’ll explore the most common usage patterns, including aggregating multiple data sources, connecting to and from data lakes, and cloud deployment. Part reference and part tutorial, this practical guide covers every aspect of the directed acyclic graphs (DAGs) that power Airflow, and how to customize them for your pipeline’s needs.

What’s inside

  • Build, test, and deploy Airflow pipelines as DAGs
  • Automate moving and transforming data
  • Analyze historical datasets using backfilling
  • Develop custom components
  • Set up Airflow in production environments

An Airflow bible. Useful for all kinds of users, from novice to expert.
Rambabu Posa, Sai Aashika Consultancy

Table of Contents

1 Getting started
2 Meet Apache Airflow
3 Pipeline graphs vs. sequential scripts
4 Introducing Airflow
5 When to use Airflow
6 Anatomy of an Airflow DAG
7 Running a DAG in Airflow
8 Running at regular intervals
9 Scheduling in Airflow
10 Cron-based intervals
11 Processing data incrementally
12 Understanding Airflow’s execution dates
13 Best practices for designing tasks
14 Templating tasks using the Airflow context
15 Templating the PythonOperator
16 Hooking up other systems
17 Defining dependencies between tasks
18 Branching
19 Conditional tasks
20 More about trigger rules
21 Sharing data between tasks
22 Chaining Python tasks with the TaskFlow API
23 Beyond the basics
24 Triggering workflows
25 Polling custom conditions
26 Triggering other DAGs
27 Communicating with external systems
28 Developing locally with external systems
29 Moving data between systems
30 Building custom components
31 Building a custom hook
32 Building a custom operator
33 Packaging your components
34 Testing
35 Setting up a CI/CD pipeline
36 Testing with files on disk
37 Working with external systems
38 Using tests for development
39 Running tasks in containers
40 Introducing containers
41 Containers and Airflow
42 Creating container images for tasks
43 Running tasks in Kubernetes
44 Using the KubernetesPodOperator
45 Airflow in practice
46 Best practices
47 Manage credentials centrally
48 Use factories to generate common patterns
49 Designing reproducible tasks
50 Handling data efficiently
51 Managing your resources
52 Operating Airflow in production
53 Which executor is right for me
54 A closer look at the scheduler
55 Installing each executor
56 Setting up the KubernetesExecutor
57 Capturing logs of all Airflow processes
58 Visualizing and monitoring Airflow metrics
59 Creating dashboards with Grafana
60 How to get notified of a failing task
61 Scalability and performance
62 Securing Airflow
63 Encrypting data at rest
64 Encrypting traffic to the webserver
65 Fetching credentials from secret management systems
66 Project – Finding the fastest way to get around NYC
67 Extracting the data
68 Structuring a data pipeline
69 In the clouds
70 Airflow in the clouds
71 Google Cloud Composer
72 Airflow on AWS
73 AWS-specific hooks and operators
74 Building the DAG
75 Airflow on Azure
76 Overview
77 Airflow in GCP
78 Integrating with Google services
79 GCP-specific hooks and operators
80 Getting data into BigQuery