English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 224 lectures (38h 13m) | 12.8 GB

A rigorous and engaging deep dive into statistics and machine learning, with hands-on applications in Python and MATLAB.

Statistics and probability control your life. I don’t just mean what YouTube’s algorithm recommends you watch next, and I don’t just mean the chance of meeting your future significant other in class or at a bar. Human behavior, single-celled organisms, earthquakes, the stock market, whether it will snow in the first week of December, and countless other phenomena are probabilistic and statistical. Even the most fundamental structure of the universe is governed by probability and statistics.

You need to understand statistics.

Nearly all areas of human civilization are incorporating code and numerical computations. This means that many jobs and areas of study are based on applications of statistical and machine-learning techniques in programming languages like Python and MATLAB. This is often called ‘data science’ and is an increasingly important topic. Statistics and machine learning are also fundamental to artificial intelligence (AI) and business intelligence.

If you want to make yourself a future-proof employee, employer, data scientist, or researcher in any technical field, ranging from data science to engineering to deep-learning modeling, you’ll need to know statistics and machine learning. And you’ll need to know how to implement concepts like probability theory and confidence intervals, k-means clustering and PCA, and Spearman correlation and logistic regression in languages like Python or MATLAB.

There are five reasons why you should take this course:

This course covers everything you need to understand the fundamentals of statistics, machine learning, and data science, from bar plots to ANOVAs, regression to k-means, t-tests to non-parametric permutation testing.

After completing this course, you will be able to understand a wide range of statistical and machine-learning analyses, even advanced methods that aren’t taught here. That’s because you will learn the foundations upon which advanced methods are built.

This course balances mathematical rigor with intuitive explanations, and hands-on explorations in code.

Enrolling in the course gives you access to the Q&A, in which I actively participate every day.

I’ve been studying, developing, and teaching statistics for 20 years, and I’m, like, really great at math.

What you’ll learn

- Descriptive statistics (mean, variance, etc.)
- Inferential statistics
- T-tests, correlation, ANOVA, regression, clustering
- The math behind the “black box” statistical methods
- How to implement statistical methods in code
- How to interpret statistics correctly and avoid common misunderstandings
- Coding techniques in Python and MATLAB/Octave
- Machine learning methods like clustering, predictive analysis, classification, and data cleaning
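As a taste of what “implementing statistical methods in code” looks like, here is a minimal Python sketch using only the standard library. The exam scores are made-up illustration data, and the test against a null mean of 65 is an arbitrary example; the course itself covers these methods in far more depth.

```python
import math
import statistics as stats

# Hypothetical sample: ten exam scores (illustration data only)
scores = [61, 72, 68, 75, 80, 66, 70, 74, 69, 77]

# Descriptive statistics: central tendency and dispersion
mean = stats.mean(scores)
sd = stats.stdev(scores)  # sample standard deviation (n-1 denominator)

# One-sample t statistic against an assumed null mean of 65
null_mean = 65
n = len(scores)
t = (mean - null_mean) / (sd / math.sqrt(n))

# Z-score standardization (covered in the normalization section)
z = [(x - mean) / sd for x in scores]

print(f"mean={mean:.1f}, sd={sd:.2f}, t={t:.2f}")
```

In practice, libraries such as NumPy and SciPy provide vectorized, well-tested versions of these computations (e.g., `scipy.stats.ttest_1samp` also returns a p-value).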

## Table of Contents

**Introductions**

1 [Important] Getting the most out of this course

2 About using MATLAB or Python

3 Statistics guessing game

4 Using the Q&A forum

5 (optional) Entering time-stamped notes in the Udemy video player

**Math prerequisites**

6 Should you memorize statistical formulas?

7 Arithmetic and exponents

8 Scientific notation

9 Summation notation

10 Absolute value

11 Natural exponent and logarithm

12 The logistic function

13 Rank and tied-rank

**IMPORTANT Download course materials**

14 Download materials for the entire course

**What are (is?) data?**

15 Is data singular or plural?

16 Where do data come from and what do they mean?

17 Types of data: categorical, numerical, etc.

18 Code: representing types of data on computers

19 Sample vs. population data

20 Samples, case reports, and anecdotes

21 The ethics of making up data

**Visualizing data**

22 Bar plots

23 Code: bar plots

24 Box-and-whisker plots

25 Code: box plots

26 Unsupervised learning: boxplots of normal and uniform noise

27 Histograms

28 Code: histograms

29 Unsupervised learning: histogram proportion

30 Pie charts

31 Code: pie charts

32 When to use lines instead of bars

33 Linear vs. logarithmic axis scaling

34 Code: line plots

35 Unsupervised learning: log-scaled plots

**Descriptive statistics**

36 Descriptive vs. inferential statistics

37 Accuracy, precision, resolution

38 Data distributions

39 Code: data from different distributions

40 Unsupervised learning: histograms of distributions

41 The beauty and simplicity of Normal

42 Measures of central tendency (mean)

43 Measures of central tendency (median, mode)

44 Code: computing central tendency

45 Unsupervised learning: central tendencies with outliers

46 Measures of dispersion (variance, standard deviation)

47 Code: computing dispersion

48 Interquartile range (IQR)

49 Code: IQR

50 QQ plots

51 Code: QQ plots

52 Statistical moments

53 Histograms part 2: number of bins

54 Code: histogram bins

55 Violin plots

56 Code: violin plots

57 Unsupervised learning: asymmetric violin plots

58 Shannon entropy

59 Code: entropy

60 Unsupervised learning: entropy and number of bins

**Data normalizations and outliers**

61 Garbage in, garbage out (GIGO)

62 Z-score standardization

63 Code: z-score

64 Min-max scaling

65 Code: min-max scaling

66 Unsupervised learning: invert the min-max scaling

67 What are outliers and why are they dangerous?

68 Removing outliers: z-score method

69 The modified z-score method

70 Code: z-score for outlier removal

71 Unsupervised learning: z vs. modified-z

72 Multivariate outlier detection

73 Code: Euclidean distance for outlier removal

74 Removing outliers by data trimming

75 Code: data trimming to remove outliers

76 Non-parametric solutions to outliers

77 Nonlinear data transformations

78 An outlier lecture on personal accountability

**Probability theory**

79 What is probability?

80 Probability vs. proportion

81 Computing probabilities

82 Code: compute probabilities

83 Probability and odds

84 Unsupervised learning: probabilities of odds-space

85 Probability mass vs. density

86 Code: compute probability mass functions

87 Cumulative distribution functions

88 Code: CDFs and PDFs

89 Unsupervised learning: CDFs for various distributions

90 Creating sample estimate distributions

91 Monte Carlo sampling

92 Sampling variability, noise, and other annoyances

93 Code: sampling variability

94 Expected value

95 Conditional probability

96 Code: conditional probabilities

97 Tree diagrams for conditional probabilities

98 The Law of Large Numbers

99 Code: the Law of Large Numbers in action

100 The Central Limit Theorem

101 Code: the CLT in action

102 Unsupervised learning: averaging pairs of numbers

**Hypothesis testing**

103 IVs, DVs, models, and other stats lingo

104 What is a hypothesis and how do you specify one?

105 Sample distributions under null and alternative hypotheses

106 P-values: definition, tails, and misinterpretations

107 P-z combinations that you should memorize

108 Degrees of freedom

109 Type 1 and Type 2 errors

110 Parametric vs. non-parametric tests

111 Multiple comparisons and Bonferroni correction

112 Statistical vs. theoretical vs. clinical significance

113 Cross-validation

114 Statistical significance vs. classification accuracy

**The t-test family**

115 Purpose and interpretation of the t-test

116 One-sample t-test

117 Code: one-sample t-test

118 Unsupervised learning: the role of variance

119 Two-samples t-test

120 Code: two-samples t-test

121 Unsupervised learning: importance of N for t-test

122 Wilcoxon signed-rank (nonparametric t-test)

123 Code: signed-rank test

124 Mann-Whitney U test (nonparametric t-test)

125 Code: Mann-Whitney U test

126 Permutation testing for t-test significance

127 Code: permutation testing

128 Unsupervised learning: how many permutations?

**Confidence intervals on parameters**

129 What are confidence intervals and why do we need them?

130 Computing confidence intervals via formula

131 Code: compute confidence intervals by formula

132 Confidence intervals via bootstrapping (resampling)

133 Code: bootstrapping confidence intervals

134 Unsupervised learning: confidence intervals for variance

135 Misconceptions about confidence intervals

**Correlation**

136 Motivation and description of correlation

137 Covariance and correlation formulas

138 Code: correlation coefficient

139 Code: simulate data with specified correlation

140 Correlation matrix

141 Code: correlation matrix

142 Unsupervised learning: average correlation matrices

143 Unsupervised learning: correlation to covariance matrix

144 Partial correlation

145 Code: partial correlation

146 The problem with Pearson

147 Nonparametric correlation: Spearman rank

148 Fisher-Z transformation for correlations

149 Code: Spearman correlation and Fisher-Z

150 Unsupervised learning: Spearman correlation

151 Unsupervised learning: confidence interval on correlation

152 Kendall’s correlation for ordinal data

153 Code: Kendall correlation

154 Unsupervised learning: does Kendall vs. Pearson matter?

155 The subgroups correlation paradox

156 Cosine similarity

157 Code: cosine similarity vs. Pearson correlation

**Analysis of Variance (ANOVA)**

158 ANOVA intro, part 1

159 ANOVA intro, part 2

160 Sum of squares

161 The F-test and the ANOVA table

162 The omnibus F-test and post-hoc comparisons

163 The two-way ANOVA

164 One-way ANOVA example

165 Code: one-way ANOVA (independent samples)

166 Code: one-way repeated-measures ANOVA

167 Two-way ANOVA example

168 Code: two-way mixed ANOVA

**Regression**

169 Introduction to GLM regression

170 Least-squares solution to the GLM

171 Evaluating regression models: R2 and F

172 Simple regression

173 Code: simple regression

174 Unsupervised learning: compute R2 and F

175 Multiple regression

176 Standardizing regression coefficients

177 Code: multiple regression

178 Polynomial regression models

179 Code: polynomial modeling

180 Unsupervised learning: polynomial design matrix

181 Logistic regression

182 Code: logistic regression

183 Under- and over-fitting

184 Unsupervised learning: overfit data

185 Comparing nested models

186 What to do about missing data

**Statistical power and sample sizes**

187 What is statistical power and why is it important?

188 Estimating statistical power and sample size

189 Compute power and sample size using G*Power

**Clustering and dimension-reduction**

190 K-means clustering

191 Code: k-means clustering

192 Unsupervised learning: k-means and normalization

193 Unsupervised learning: k-means on a Gauss blur

194 Clustering via DBSCAN

195 Code: DBSCAN

196 Unsupervised learning: DBSCAN vs. k-means

197 K-nearest neighbor classification

198 Code: KNN

199 Principal components analysis (PCA)

200 Code: PCA

201 Unsupervised learning: k-means on PC data

202 Independent components analysis (ICA)

203 Code: ICA

**Signal detection theory**

204 The two perspectives of the world

205 d-prime

206 Code: d-prime

207 Response bias

208 Code: response bias

209 F-score

210 Receiver operating characteristics (ROC)

211 Code: ROC curves

212 Unsupervised learning: make this plot look nicer

**A real-world data journey**

213 Note about the code for this section

214 Introduction

215 MATLAB: import and clean the marriage data

216 MATLAB: import the divorce data

217 MATLAB: more data visualizations

218 MATLAB: inferential statistics

219 Python: import and clean the marriage data

220 Python: import the divorce data

221 Python: inferential statistics

222 Take-home messages

**Bonus section**

223 About deep learning

224 Bonus content
