Data Engineering Foundations

English | MP4 | AVC 1280×720 | AAC 48KHz 2ch | 1h 03m | 146 MB

Data science can be broadly defined as the process of making data useful, and data engineering is a key part of how that happens. If you think of data science as a race car, the data engineers are the pit crew: they’re not driving the car, but they make it much easier to drive. Data engineers keep the data flowing smoothly, monitor systems, anticipate problems, and repair the data pipeline whenever it breaks. They extract data from multiple sources and load it into a single, easy-to-query database. In short, data engineers make data scientists’ lives easier.

In this course, Harshit Tyagi explains the fundamentals of data engineering. He covers key topics such as data wrangling, database schemas, and developing ETL pipelines. He also introduces several data engineering tools, including Hive, Hadoop, Spark, and Airflow. By the end of this course, it should be clear why the data engineer is one of the most valuable people in a data-driven organization.
Table of Contents

Introduction
1 What is data engineering

Introduction to Data Engineering
2 Introduction to data engineering
3 Data engineer vs. data scientist
4 Essential tools for data engineering

Databases and Dataframes
5 Intro to databases and their types
6 Understanding database schema
7 Distributed computing

Data Engineering Tools
8 MapReduce and Hadoop
9 Hive
10 Spark
11 Airflow

ETL Pipelines
12 Sources of data extraction
13 Data extraction from a PostgreSQL database
14 Challenge: Data extraction
15 Solution: Data extraction
16 Transforming data
17 Challenge: Transforming data
18 Solution: Transforming data
19 Loading data into a DB
20 Challenge: Loading data
21 Solution: Loading data
22 Scheduling an ETL pipeline using Airflow

Conclusion
23 Next steps