**Reinforcement Learning in Motion**

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 5h 56m | 1.60 GB

Reinforcement Learning in Motion introduces you to the exciting world of machine systems that learn from their environments! Developer, data scientist, and expert instructor Phil Tabor guides you from the basics all the way to programming your own constantly-learning AI agents. In this course, he’ll break down key concepts like how RL systems learn, how to sense and process environmental data, and how to build and train AI agents. As you learn, you’ll master the core algorithms and get to grips with tools like OpenAI Gym, NumPy, and Matplotlib.

Reinforcement systems learn by doing, and so will you in this hands-on course! You’ll build and train a variety of algorithms as you go, each with a specific purpose in mind. The rich and interesting examples include simulations that train a robot to escape a maze, help a mountain car get up a steep hill, and balance a pole on a sliding cart. You’ll even teach your agents how to navigate Windy Gridworld, a standard exercise for finding the optimal path even with special conditions!

With reinforcement learning, an AI agent learns from its environment, constantly responding to the feedback it gets. The agent optimizes its behavior to avoid negative consequences and enhance positive outcomes. The resulting algorithms are always looking for the most positive and efficient outcomes!
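To make that feedback loop concrete, here is a minimal sketch of tabular Q-learning on a tiny five-cell corridor. It is not taken from the course materials: the corridor environment, the `step` and `train` helpers, the reward values, and the hyperparameters are all assumptions chosen for illustration, and the sketch sticks to the Python standard library rather than NumPy or OpenAI Gym.

```python
import random

N_STATES = 5          # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (-1, +1)    # action 0 steps left, action 1 steps right


def step(state, action_index):
    """Illustrative environment: returns (next_state, reward, done)."""
    # Walls at both ends keep the agent inside the corridor.
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    # Positive feedback at the goal, a small penalty for every other move.
    reward = 1.0 if done else -0.1
    return next_state, reward, done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error (assumed hyperparameters)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Explore: occasionally try a random action.
                action = rng.randrange(len(ACTIONS))
            else:
                # Exploit: pick the best-known action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(len(ACTIONS)) if q[state][a] == best]
                )
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the observed
            # reward plus the discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for each non-goal cell after training.
greedy_policy = [
    max(range(len(ACTIONS)), key=lambda a: q_table[s][a])
    for s in range(N_STATES - 1)
]
print(greedy_policy)
```

After training, the greedy policy chooses "step right" in every non-goal cell: the agent has learned, purely from rewards and penalties, that heading straight for the goal pays best.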

Importantly, with reinforcement learning you don’t need a mountain of data to get started. You just let your AI agent poke and prod its environment, which makes it much easier to take on novel research projects without well-defined training datasets.

Inside:

- What is a reinforcement learning agent?
- An introduction to the OpenAI Gym
- Identifying appropriate algorithms
- Implementing RL algorithms using NumPy
- Visualizing performance with Matplotlib

You’ll need to be familiar with Python and machine learning basics. Examples use Python libraries like NumPy and Matplotlib. You’ll also need some understanding of linear algebra and calculus; see the equations in the Free Downloads section for examples.

**Table of Contents**

01 Course introduction

02 Getting Acquainted with Machine Learning

03 How Reinforcement Learning Fits In

04 Required software

05 Understanding the agent

06 Defining the environment

07 Designing the reward

08 How the agent learns

09 Choosing actions

10 Coding the environment

11 Finishing the maze-running robot problem

12 Introducing the multi-armed bandit problem

13 Action-value methods

14 Coding the multi-armed bandit test bed

15 Moving the goal posts – nonstationary problems

16 Optimistic initial values and upper confidence bound action selection

17 Wrapping up the explore-exploit dilemma

18 Introducing Markov decision processes and the frozen lake environment

19 Even robots have goals

20 Handling uncertainty with policies and value functions

21 Achieving mastery – Optimal policies and value functions

22 Skating off the frozen lake

23 Crash-landing on planet Gridworld

24 Let’s make a plan – Policy evaluation in Gridworld

25 The best laid plans – Policy improvement in the Gridworld

26 Hastening our escape with policy iteration

27 Creating a backup plan with value iteration

28 Wrapping up dynamic programming

29 The windy gridworld problem

30 Monte who

31 No substitute for action – Policy evaluation with Monte Carlo methods

32 Monte Carlo control and exploring starts

33 Monte Carlo control without exploring starts

34 Off-policy Monte Carlo methods

35 Return to the frozen lake and wrapping up Monte Carlo methods

36 The cart pole problem

37 TD(0) prediction

38 On-policy TD control – SARSA

39 Off-policy TD control – Q-learning

40 Back to school with double learning

41 Wrapping up temporal difference learning

42 The continuous mountain car problem

43 Why approximation methods

44 Stochastic gradient descent – The intuition

45 Stochastic gradient descent – The mathematics

46 Approximate Monte Carlo predictions

47 Linear methods and tiling

48 TD(0) semi-gradient prediction

49 Episodic semi-gradient control – SARSA

50 Over the hill – wrapping up approximation methods and the mountain car problem

51 Course recap

52 The frontiers of reinforcement learning

53 What to do next
