DS4B 101-P: Python for Data Science Automation

English | MP4 | AVC 1920×1080 | AAC 44KHz 2ch | 27h 06m | 8.38 GB

Automate Business Processes with Python

Advance your career with Python
By learning how to automate business processes
Python for Data Science Automation is a course designed to teach data analysts how to convert business processes into Python-based data science automations. The course is founded on two driving principles:

Companies are transitioning repetitive business processes to automations to reduce errors, improve scale, and make data products available on-demand.
You (the student) will undergo a complete transformation, learning the in-demand skills that will empower you to help automate business processes for your organization.

Crafted for Business Analysts
that need to combine data science and programming
Python for Data Science Automation is crafted for business analysts who need to learn Python for automating repetitive tasks and building data analysis software. This includes:
  • BI Professionals: Analysts who use Business Intelligence (BI) tools like Excel, Power BI, and Tableau and would like to take their skills to a new level.
  • R Users: Data scientists and analysts who use the R language but need to learn Python for business in order to work alongside Python teams.
  • Python Beginners: Students who need to learn analytical programming in Python through a business-focused course.

Undergo a Complete Transformation
Learn the essential tools that build your data science foundations

In Python for Data Science Automation, you will learn:

  • Pandas for Data Wrangling
  • Data Visualization
  • SQL Databases
  • Modules and Python Programming
  • VSCode for Python
  • Jupyter Notebook Automation with Papermill
  • Forecasting with Sktime
Table of Contents

1 Python for Data Science Automation: Let’s Do This!
2 The Game Plan: Data Analysis Foundations
3 The Business Case: Building an Automated Forecast System
4 Course Project Zip [File Download]
5 Course Workflow: Tying Specific Actions to the Business Process
6 Ultimate Python Cheat Sheet: Python Ecosystem in 2 Pages
7 The Transactional Database Model [PDF Download]
8 Anaconda Installation
9 IDE (Integrated Development Environment) Options
10 VSCode Installation
11 Connect VSCode to Your Course Project Files
12 Conda Env Create: Make the Python Course Environment
13 Python Select Interpreter: Connect VSCode to Your Python Environment
14 Conda Env Update: Add Python Packages to Your Environment
15 Conda Env Export: Review & Share Your Environment
16 Conda Env List & Remove: List Available Environments & Remove Unnecessary Envs
17 Getting to Know VSCode
18 VSCode Theme Customization
19 VSCode Icon Themes
20 VSCode User & Workspace Settings
21 VSCode Keyboard Shortcuts
22 VSCode Python Extensions
23 VSCode Jupyter Extension – Jupyter Notebook Support
24 VSCode Jupyter Extension – Interactive Python
25 [Optional VSCode Setting] Jupyter: Send Selection to Interactive Window
26 VSCode Excel Viewer
27 VSCode Markdown & PDF Extensions
28 VSCode Path Intellisense
29 VSCode SQLite Extension
30 [Optional] VSCode Extensions for R Users
31 Python Environment Checkpoint [File Download]
32 Getting Started [File Download]
33 Using the Cheat Sheet
34 Import: pandas, numpy, matplotlib.pyplot
35 Importing From: plotnine, mizani
36 Importing Functions and Submodules: os, rich
37 Setting Up Python Interactive
38 [Reminder | Optional VSCode Setting] Jupyter: Send Selection to Interactive Window
39 Getting Help Documentation
40 IMPORTANT VSCODE SETTING: File Paths | jupyter.notebookFileRoot
41 Reading the Excel Files
42 Reviewing the Data Model
43 Exploratory 1: Top 5 Most Frequent Descriptions
44 Exploratory 2: Plotting the Top 5 Bike Descriptions
45 Preparing Orderlines for Merge: Drop Column
46 Merging the Bikes DataFrame
47 Merging the Bikeshops Data Frame
48 Datetime: Converting Order Date | Copy vs No Copy
49 Splitting the Description: Category 1, Category 2, and Frame Material
50 Splitting Location: City, State
51 Create the Total Price Column
52 Reorganizing the Columns
53 Renaming Columns
54 Reviewing the Data Transformations
55 Save Your Work: Pickle it.
56 Pandas Datetime Accessors
57 Resampling: Working with Pandas Offsets
58 Quick Plot: Plotting Single Time Series w/ Pandas Matplotlib Backend
59 Plotnine Visualization: Sales By Month, Part 1 – Geometries
60 Plotnine Visualization: Sales by Month, Part 2 – Scales & Themes
61 Resampling Groups: Combine groupby() and resample()
62 Quick Plot: Plotting Multiple Time Series w/ Pandas Matplotlib Backend
63 Plotnine Visualization, Part 1: Facetted Sales By Date & Category2 (Group)
64 Plotnine Visualization, Part 2: Adding Themes & Scales
65 Writing Files: Pickle, CSV, Excel
66 Congrats. That was a fun whirlwind. Let’s recap.
67 Getting Started [File Download]
68 Pickle Files
69 CSV Files
70 Excel Files
71 SQL Databases
72 Pandas I/O & SQL Alchemy Overviews
73 Make Database Directory
74 Create the SQLite Database
75 Read the Excel Files
76 Create the Database Tables
77 Close the Connection
78 Connect to the Database
79 Getting the Database Table Names
80 Reading from the Tables with f-strings
81 [Bonus] VSCode SQLite Extension
82 Making collect_data(), Part 1: Function Setup
83 Making collect_data(), Part 2: Read Tables from the Database
84 Making collect_data(), Part 3: Test the Database Import
85 Making collect_data(), Part 4: Joining the Data
86 Making collect_data(), Part 5: Cleaning the Data 1
87 Making collect_data(), Part 6: Cleaning the Data 2
88 Making collect_data(), Part 7: VSCode Docstring Generator
89 Making a Package (my_pandas_extensions): Adding the database module
90 🥳 Congrats! You’re learning really powerful concepts.
91 Getting Started [File Download]
92 [VSCode Setting] Jupyter: Send Selection to Interactive Window
93 Package & Function Imports
94 My Pandas Extensions: Fix FutureWarning Message (regex)
95 How Python Works: Objects
96 Pandas DataFrame & Series
97 Numpy Arrays
98 Python Builtin Data Structures: Dictionary, List, Tuple
99 Python Builtin Data Types: Int, Float, Str, Bool
100 Casting Basics: Numeric & String Conversions
101 Casting Sequences: To List, Numpy Array, Pandas Series, & DataFrame
102 Pandas Series Dtype Conversion
103 Pandas Data Wrangling Setup
104 Subsetting Columns by Name
105 Subsetting by Column Index (Position): iloc[]
106 Subsetting Columns with Regex (Regular Expressions)
107 Rearranging a Single Column (Column Subsetting)
108 Rearranging Multiple Columns (Repetitive Way First)
109 Rearranging Multiple Columns (List Comprehension)
110 Data Frame Rearrange: Select Dtypes, Concat, & Drop
111 Sort Values
112 Simple Filters with Boolean Series
113 Query Filters
114 Filtering with isin() and
115 Index slicing with df.iloc[]
116 Getting Distinct Values: Drop duplicates
117 N-Largest and N-Smallest
118 Random Samples
119 DataFrame Column Assignment: Calculated Columns
120 Assign Basics: Lambda Functions
121 Assign Cookbook: Making a Log Transformation
122 Assign Cookbook: Searching Text (Boolean Flags)
123 Assign Cookbook: Even-Width Binning with pd.cut()
124 Visualizing Binning Strategies with a Pandas Heat Table
125 Assign Cookbook: Quantile Binning with pd.qcut()
126 Aggregation Basics (Summarizations)
127 Common Summary Functions
128 Groupby + Aggregate Basics (Summarizations)
129 Groupby + Agg Cookbook (Summary DF 1): Sum & Median Total Price By Category 1 & 2
130 Groupby + Agg Cookbook (Summary DF 2): Sum Total Price & Quantity By Category 1 & 2
131 Groupby + Agg Details: Examining the Multilevel Column Index
132 Groupby + Agg Cookbook (Summary DF 3): Grouping Time Series with Groupby & Resample
133 Groupby + Apply Basics (Transformations)
134 Groupby + Apply Cookbook: Transform All Columns by Group
135 Groupby + Apply Cookbook: Filtering Slices by Group
136 Renaming Basics: Renaming All Columns with Lambda
137 Renaming Basics: Targeting Specific Columns
138 Advanced Renaming: Renaming Multi-Index Columns
139 Set Up Summarized Data: Revenue by Category 1
140 Pivot: To Wide Format
141 Export a Stylized Pandas Table to Excel (Wide Data)
142 Melt: To Long Format
143 Plotnine – Making a Faceted Horizontal Bar Chart (Tidy Long Data)
144 Intro to Categorical Data: Sorting the Plotnine Plot
145 Pivot Table (An awesome function for BI Tables)
146 Unstack: A programmatic version of pivot()
147 Stack: A programmatic version of melt()
148 Merge: Data Frame Joins
149 Concat: Binding DataFrames Rowwise & Columnwise
150 Splitting Text Columns
151 Combining Text Columns
152 Set Up Summarized Data: Sales by Category 2 Daily
153 Apply: Lambda Aggregations vs Transformations
154 Apply: Broadcasting Aggregations
155 Grouped Apply: Broadcasting
156 Grouped Transform: Alternative to Grouped Apply (Fixes Index Issue)
157 Making a “Data Frame” Function: add_columns()
158 Pipe: Method chaining our custom function using the pipe
159 Challenge #1: Data Wrangling with Pandas [File Download]
160 Method 1: Jupyter VSCode Integration
161 Method 2: Jupyter Notebooks (Legacy Method)
162 Method 3: JupyterLab (Next Generation of Jupyter)
163 Challenge Objectives
164 Getting Started: Syncing Your JupyterLab Current Working Directory (%cd and %pwd)
165 Challenge Tasks
166 Challenge Solution
167 Congrats! You’ve finished your first challenge.
168 Automating Time Series Forecasting
169 Getting Started [File Download]
170 VSCode Extension: Browser Preview
171 Package Imports
172 The ProfileReport() Class
173 Section 1: Profile Overview
174 Section 2A: Numeric & Date Variables
175 Section 2B: Categorical (Text) Variables
176 Sections 3-6: Interactions, Correlations, Missing Values, & Sample
177 Pandas Extension: df.profile_report()
178 Exporting the Profile Report as HTML
179 Getting Started
180 TimeStamp & Period Conversions
181 Pandas Datetime Accessors
182 Date Math: Offsetting Time with Timedeltas
183 Date Math: Getting Duration between Two TimeStamps
184 Creating Date Sequences: pd.date_range()
185 Periods (In-Depth)
186 Resampling (In-Depth): bike_sales_m_df
187 Grouped Resampling (In-Depth): bike_sales_cat2_m_wide_df
188 Reorganizing: Adding Comments
189 Differencing with Lags (Single Time Series)
190 Differencing with Lags (Multiple Time Series)
191 Difference from First (Single Time Series)
192 Difference From First (Multiple Time Series)
193 Cumulative Expanding Windows (Single Time Series)
194 Cumulative Expanding Windows (Multiple Time Series)
195 Moving Average (Single Time Series)
196 Moving Average (Multiple Time Series)
197 Next Steps (Where we are headed)
198 Getting Started [File Download]
199 Setup: Python Imports & Data
200 Function Anatomy: pd.Series.max()
201 Errors (Exceptions)
202 Function Names
203 Function Anatomy: **kwargs
204 Detect Outliers: Function Setup
205 IQR Outlier Method, Part 1
206 IQR Method, Part 2
207 New Argument: IQR Multiplier
208 New Argument: How? (Both, Upper, Lower)
209 Checking for Pandas Series Input
210 Checking IQR Multiplier for Int or Float Type
211 Checking that IQR Multiplier is a Positive Value
212 Checking that How is a Valid Option: both, lower, upper
213 Informative Help Documentation: Adding a Docstring
214 Testing Our Function: Detecting Outliers within Groups
215 Extending the Pandas Series Class
216 Summarize By Time: A handy function for time series wrangling
217 Setting Up the “Summarize By Time” Function
218 Handling the Date Column Input
219 Handling Groups Input
220 Handling the Time Series Resample
221 Handling the Aggregation Function Input
222 Handling the Value Column Input
223 Forcing the Value Column Input to a List (to generate a data frame)
224 Bug! Thinking through a solution
225 Solution: Converting to a Function Dictionary with Zip + Dict
226 Handling the Unstack
227 Handling the Period Conversion
228 Add Fill Missing Capability
229 Review the Core Functionality
230 Check Incoming Data: Raising a TypeError
231 Adding the Docstring
232 Pandas Flavor: Extending Pandas DataFrame Class
233 Getting Started [File Download]
234 Sktime Documentation
235 How to Google Search like a Pro
236 Set Up & Imports
237 Summarizing to get Total Revenue by Month
238 Summarizing to get Total Revenue by Category 2 & Month
239 What is AutoARIMA?
240 AutoARIMA Applied: Forecaster, Fit, Predict
241 Adding Confidence Intervals (Prediction Intervals)
242 Tuple Unpacking (Predictions, Confidence Intervals)
243 Forecast Visualization
244 Code Housekeeping
245 Multiple Time Series Forecasting: AutoARIMA()
246 For Loop: Iterate Across the DataFrame Columns
247 For Loop: Modeling AutoARIMA()
248 For-Loop: Getting the Confidence Intervals
249 For-Loop: Combine with DataFrame | Actual Values, Predictions, & CIs
250 For-Loop: Storing the Results (as a Dictionary)
251 Housekeeping: Appending Variable Types to Variable Names
252 Visual Forecast Assessment
253 TQDM: Progress Bars
254 Setting up the ARIMA Automation Function
255 Making arima_forecast() | Function Definition
256 Function Body | Setting Up the Iteration
257 Training the AutoARIMA() Models
258 Controlling Progress Bars: tqdm(mininterval)
259 Making Predictions and Confidence Intervals
260 Combine Results into a DataFrame
261 Compose a Prediction Dictionary
262 Return Results as a Single DataFrame | Rowwise Concatenation
263 Setting the Column Names of the Output
264 Drop remaining columns beginning with “level_”
265 Testing the arima_forecast() function
266 Creating the forecasting.py module
267 Docstring: arima_forecast()
268 Adding Checks: arima_forecast()
269 Finally – Check Your Forecasts with Grouped Pandas Plotting
270 Recap: You’ve just made an ARIMA Forecast Automation!
271 Introduction to ETS Forecasting (Exponential Smoothing)
272 Challenge 2 [File Download]
273 Solution
274 Part 3: Visualization & Reporting
275 Getting Started [File Download]
276 Plotnine Documentation
277 Plotnine Anatomy: Imports
278 Data Summarization: For Plotting Annual Bike Sales
279 The Plot Canvas: Mapping Columns to Plot Components
280 Plotnine Geometries
281 Adding a Trend Line: geom_smooth()
282 Formatting Plots
283 Expand Limits
284 Scales: Dollar Format for Y-Axis
285 Scales: Date Format for X-Axis
286 Labs and Themes
287 Saving the ggplot
288 Exploring the Plotnine Object
289 Setting Up
290 Scatter Plot: Data Manipulation
291 Scatter Plot: Visualization
292 Line Plot: Data Manipulation
293 Line Plot: Visualization
294 Data Manipulation, Part 1: No Categorical Ordering
295 Visualization, Part 1: Without Categorical Ordering
296 Aside: Introduction to Plotting using Categorical Data Type
297 Finalizing the Horizontal Bar Chart
298 Histogram: Data Manipulation
299 Histogram: Visualization
300 Histogram: Using Fill Aesthetic to Explore Differences by a Category
301 Histogram: Using Facet Grids to Compare Distributions by Category
302 Density Plots: Kernel Density Estimation (KDE) using geom_density()
303 Box Plot: Data Manipulation
304 Box Plot: Visualization
305 Violin Plot with Jitter: geom_violin() and geom_jitter()
306 Data Manipulation: Add a Total Price Text Column with USD Dollar Format
307 Creating the Bar Plot: geom_col() and geom_smooth()
308 Adding Text to a Bar Plot: geom_text()
309 Highlighting an Outlier with a Label: geom_label()
310 Finalizing the Plot with Scales and Themes
311 Sales by Month and Category 2: Data Manipulation
312 Facets: Adding subplots “facets” with facet_wrap()
313 Scales: Applying scales to alter x, y, and color mappings
314 Themes: Theme Customization with Pre-Built Themes | theme_matplotlib()
315 Theme Elements: Customization with theme()
316 Plot Title and X/Y-Axis Labels: labs()
317 Getting Started
318 Package Imports
319 Our Forecasting Workflow Recap
320 Data Preparation: Melting the Value and Prediction Columns
321 Data Preparation: Fixing the FutureWarning
322 Visualization: Setting up the canvas with ggplot()
323 Visualization: Adding geoms and facets
324 Visualization: Scales and Theme Minimal
325 Visualization: Customizing the Theme Elements
326 Making the plot_forecast() Function Definition
327 Data Wrangling: Implementing the Melt
328 Handling the Time-Based Column: Converting to TimeStamp
329 Visualization: Parameterizing the Plot
330 Testing the Forecast Plot Function Parameters
331 Testing the Automation Workflow
332 Reordering the Subplots using Cat Tools
333 Adding the plot_forecast() function to our forecasting module
334 Docstring | Testing Our Imported plot_forecast() Function
335 Getting Started [File Download]
336 Package Imports
337 Reviewing Our Files
338 Generating the Forecasting Workflow
339 Generating the Forecast Visualization
340 Overview of the Database I/O Process
341 Preparing the Forecast for Update
342 Validating the Column Names
343 Testing the Prep Forecast for Database Function
344 Setting Up the Write Forecast to Database Function
345 Modularizing the Data Preparation Step
346 Specifying SQL Data Types
347 Write to Database
348 Close Connection
349 Testing Our Function
350 Creating our Read Forecast Function
351 Adding Functions to Database Module
352 Docstrings
353 Automation Workflow with Database I/O
354 Forecasting 1: Total Revenue
355 Fix #1: Reorder Columns in Prep Data Function
356 Plotting Total Revenue Forecast
357 Forecasting 2: Revenue by Category 1
358 Forecasting 3: Revenue by Category 2
359 Forecasting 4: Forecast Quarterly Revenue by Customer
360 Fix #2: Prep Data | Add timestamp conversion
361 Rerun Our Workflow: Success!
362 Writing to the Database
363 Pro-Tip: Saving Intermediate Data
364 Utility Function: Convert to Datetime
365 Rerun the Forecast Workflow
366 Read Forecast from Database
367 Recap: Debugging is a Skill
368 Jupyter Automated Reporting
369 Getting Started [File Download]
370 The Updated Database Script: Automatically Run Forecasts
371 python update_database.py
372 SQLite Explorer
373 Setting Up the Working Directory
374 Importing Data and Parameterizing a Header with Markdown
375 Parameterizing a Paragraph with Markdown
376 Performance Summary: Pivot Table, Part 1
377 Performance Summary: Pivot Table, Part 2
378 Plotting the Forecast: plot_forecast()
379 Papermill Setup
380 Package Imports
381 Papermill Documentation
382 Developing Parameters: Game Plan
383 Making ID Sets, Part 1
384 Making ID Sets, Part 2
385 Part 1: Intro to Pathlib and OS
386 Part 2: Detecting Directories Exist & Making New Directories
387 Jupyter Template Setup
388 Parameterizing the Jupyter Template
389 Finishing the Jupyter Template Parameterization
390 The pm.execute_notebook() function
391 Setting Up Key Parameters
392 Iterating without a For-Loop
393 Iterating with a For-Loop
394 Getting Started
395 Setting Up the Report Parameters
396 Creating a Resource Path
397 String Transformation: Make File Names from Report Titles
398 Setting Up run_reports()
399 Make the Report Directory
400 Setting Up the For-Loop Parameters
401 Setting Up Jupyter Notebook Execution (Inside of For-Loop)
402 Package Resources: Setting Up the Template Path
403 Integrating the Run Reports Function into Our Package
404 Getting Started [File Download]
405 NB Convert Documentation & Installation Requirements
406 Step 1: Pandoc Installation
407 Step 2: TeX Installation (MiKTeX on Windows Shown | Mac Users: MacTeX)
408 HTML Report Conversion
409 PDF Report Conversion
410 Setup & Imports
411 Making the Config()
412 Locating Files with Glob
413 Exporting an HTML Report Programmatically
414 HTML Automation: Using a For-Loop to Convert All 4 Reports
415 PDF Automation: Using a For-Loop to Convert All 4 Reports
416 Getting Set Up
417 Integrating glob: Pulling the Jupyter Notebook File Paths
418 Integrate “Convert to HTML” Report Automation
419 Test “Convert to HTML” Report Automation
420 Integrate “Convert to PDF” Report Automation
421 Test “Convert to PDF” Report Automation
422 My Pandas Extensions: Upgrade reporting.py with HTML & PDF Reports, Part 1
423 My Pandas Extensions: Upgrade reporting.py with HTML & PDF Reports, Part 2
424 Run Forecast Reports Py: Part 1 – The main() function
425 Run Forecast Reports Py: Part 2 – Adding Timestamps to Folders
426 Run Forecast Reports Py: Part 3 – Running Reports
427 Run Forecast Reports Py: Part 4 – Adjusting Folder Automation
428 Scheduling Python Scripts Bonus!!!
429 Making the Batch File (.bat) to run our Python Script
430 Setting up Automated Tasks with Windows Task Scheduler
431 Debugging Windows Task Scheduler Tasks with Pause
432 Fixing the SQL Alchemy Connection
433 Removing the Automation: Disable & Delete
434 Python Script Setup | SQL Database Absolute Path
435 The Mac Automator
436 Scheduling the Automator App with Calendar
437 Congratulations!!!
438 Forecasting 100 Time Series in Python with Sktime
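
Items 204 through 212 of the table of contents outline an outlier-detection function built on the IQR method, with an IQR multiplier argument, a "how" argument (both, upper, lower), and input checks. The following is only a minimal sketch of what such a function might look like, not the course's actual code. The function name `detect_outliers`, the argument names, the default multiplier of 1.5, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def detect_outliers(x, iqr_multiplier=1.5, how="both"):
    """Flag outliers in a pandas Series with the IQR rule (illustrative sketch).

    Names and defaults are hypothetical; they are not taken from the course.
    Returns a boolean Series that is True where a value is an outlier.
    """
    # Input checks, following the order listed in the outline
    if not isinstance(x, pd.Series):
        raise TypeError("`x` must be a pandas Series.")
    if isinstance(iqr_multiplier, bool) or not isinstance(iqr_multiplier, (int, float)):
        raise TypeError("`iqr_multiplier` must be an int or float.")
    if iqr_multiplier <= 0:
        raise ValueError("`iqr_multiplier` must be a positive value.")
    if how not in ("both", "upper", "lower"):
        raise ValueError("`how` must be one of: 'both', 'upper', 'lower'.")

    # Interquartile range and the limits derived from it
    q25 = x.quantile(0.25)
    q75 = x.quantile(0.75)
    iqr = q75 - q25
    lower_limit = q25 - iqr_multiplier * iqr
    upper_limit = q75 + iqr_multiplier * iqr

    is_low = x < lower_limit
    is_high = x > upper_limit

    if how == "lower":
        return is_low
    if how == "upper":
        return is_high
    return is_low | is_high


# Made-up sample data: one value far above the rest
prices = pd.Series([10, 12, 11, 13, 12, 100])
flags = detect_outliers(prices)
print(prices[flags])
```

Because the function returns a boolean Series aligned to the input index, it can be used directly as a filter, or applied within groups (the use described in item 214) with something like `df.groupby("category")["price"].transform(detect_outliers)`, where the column names are again hypothetical.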