Data Science Projects with Python

English | MP4 | AVC 1920×1080 | AAC 48KHz 2ch | 6h 07m | 7.04 GB

Use pandas and Matplotlib to critically examine a dataset with summary statistics and graphs and extract meaningful insights.

Data Science Projects with Python is designed to give you practical guidance on industry-standard data analysis and machine learning tools in Python, using realistic data. The course shows you how to use pandas and Matplotlib to critically examine a dataset with summary statistics and graphs and extract the insights you need. You will build on this knowledge as you learn how to prepare data and feed it to machine learning algorithms, such as regularized logistic regression and random forest, using the scikit-learn package. You’ll discover how to tune the algorithms to provide the best predictions on new, unseen data. In the later lessons, you’ll come to understand the workings and output of these algorithms, gaining insight into not only the predictive capabilities of the models but also the reasons behind their predictions.
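
As a rough illustration of the workflow described above, the following minimal sketch fits lasso (L1) and ridge (L2) regularized logistic regression and a random forest with scikit-learn, then compares them on held-out data. It is not taken from the course materials: the synthetic dataset, the hyperparameter values (C=0.1, 100 trees, max_depth=6), and the choice of ROC AUC as the comparison metric are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in, not the course's case study data).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Hold out 20% of the samples to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Hyperparameter values here are illustrative assumptions, not tuned choices.
models = {
    "lasso (L1) logistic regression": LogisticRegression(
        penalty="l1", C=0.1, solver="liblinear"
    ),
    "ridge (L2) logistic regression": LogisticRegression(
        penalty="l2", C=0.1, solver="liblinear"
    ),
    "random forest": RandomForestClassifier(
        n_estimators=100, max_depth=6, random_state=0
    ),
}

test_auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Predicted probability of the positive class for each test sample.
    proba = model.predict_proba(X_test)[:, 1]
    test_auc[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {test_auc[name]:.3f}")
```

In practice the regularization strength and the tree settings would be chosen by cross-validation rather than fixed by hand, for instance with scikit-learn's GridSearchCV.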


  • Install the required packages to set up a data science coding environment
  • Load data into a Jupyter Notebook running Python
  • Use Matplotlib to create data visualizations
  • Fit a model using scikit-learn
  • Use lasso and ridge regression to reduce overfitting
  • Fit and tune a random forest model and compare performance with logistic regression
  • Create visuals using the output of the Jupyter Notebook

Table of Contents

Data Exploration and Cleaning
1 Course Overview
2 Installation and Setup
3 Lesson Overview
4 Python and the Anaconda Package Management System
5 Different Types of Data Science Problems
6 Loading the Case Study Data with Jupyter and pandas
7 Getting Familiar with Data and Performing Data Cleaning
8 Boolean Masks
9 Data Quality Assurance and Exploration
10 Deep Dive – Categorical Features
11 Exploring the Financial History Features in the Dataset
12 Lesson Summary

Introduction to Scikit-Learn and Model Evaluation
13 Lesson Overview
14 Exploring the Response Variable and Concluding the Initial Exploration
15 Introduction to Scikit-Learn
16 Model Performance Metrics for Binary Classification
17 True Positive Rate, False Positive Rate, and Confusion Matrix
18 Obtaining Predicted Probabilities from a Trained Logistic Regression Model
19 Lesson Summary

Details of Logistic Regression and Feature Exploration
20 Lesson Overview
21 Examining the Relationships between Features and the Response
22 Finer Points of the F-test – Equivalence to t-test for Two Classes and Cautions
23 Univariate Feature Selection – What It Does and Doesn’t Do
24 Generalized Linear Models (GLMs)
25 Lesson Summary

The Bias-Variance Trade-off
26 Lesson Overview
27 Estimating the Coefficients and Intercepts of Logistic Regression
28 Assumptions of Logistic Regression
29 How Many Features Should You Include
30 Lasso (L1) and Ridge (L2) Regularization
31 Cross Validation – Choosing the Regularization Parameter and Other Hyperparameters
32 Reducing Overfitting on the Synthetic Data Classification Problem
33 Options for Logistic Regression in Scikit-Learn
34 Lesson Summary

Decision Trees and Random Forests
35 Lesson Overview
36 Decision Trees
37 Training Decision Trees – Node Impurity
38 Using Decision Trees – Advantages and Predicted Probabilities
39 Random Forests – Ensembles of Decision Trees
40 Fitting a Random Forest
41 Lesson Summary

Imputation of Missing Data, Financial Analysis, and Delivery to Client
42 Lesson Overview
43 Review of Modeling Results
44 Dealing with Missing Data – Imputation Strategies
45 Cleaning the Dataset
46 Mode and Random Imputation of PAY_1
47 A Predictive Model for PAY_1
48 Using the Imputation Model and Comparing it to Other Methods
49 Financial Analysis
50 Final Thoughts on Delivering the Predictive Model to the Client
51 Lesson Summary