These introductory notes cover bandit algorithms, Markov decision processes (MDPs), model-free methods, value function approximation, and policy optimization. For state-of-the-art advances, refer directly to the papers and to some excellent blogs.

Reinforcement Learning Notes (a compilation of the following sections)

Section 1 Introduction

Section 2 Probability

Section 3 Bandit Algorithms

Section 4 Markov Chains

Section 5 Markov Decision Process

Section 6 Model-Free Prediction

Section 7 Model-Free Control

Section 8 Value Function Approximation

Section 9 Policy Gradient