An Introduction to Reinforcement Learning

A brief introduction to the main concepts

What is Reinforcement Learning?

Reinforcement Learning (RL) is a type of Machine Learning in which an agent learns, through interaction with a dynamic environment, to make decisions that maximize a reward.

The typical Reinforcement Learning loop looks like this:

The agent receives an observation, representing the current state of the environment.
Based on the observation and its current policy, the agent selects an action.
The agent performs the action and receives a reward based on the environment’s new state.
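The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch: `CorridorEnv`, `random_policy`, and `run_episode` are hypothetical names invented for this example and do not come from any RL library. The corridor layout, the reward of 1.0 at the goal, and the `done` flag that ends an episode are likewise assumptions.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial observation

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # reward depends on the new state
        return self.position, reward, done


def random_policy(observation, rng):
    """Placeholder policy: ignores the observation and picks a random action."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    observation = env.reset()                         # step 1: receive an observation
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation, rng)             # step 2: select an action
        observation, reward, done = env.step(action)  # step 3: act, receive a reward
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(CorridorEnv(), random_policy, random.Random(0)))
```

A learning agent would additionally use the observed rewards to improve its policy between steps or episodes; the random policy here never changes.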

Markov Decision Process

Formally, RL can be understood as solving a Markov Decision Process (MDP). An MDP is a 5-tuple $(S, A, T, \pi_0, R)$, with:

$S$ : state space of the environment
$A$ : action space of the environment, i.e., actions that can be performed by the agent
$T : S \times A \times S \mapsto [0,1]$ : transition function that describes how actions affect the state of the environment
$R: S \times A \times S \mapsto \mathbb{R}$ : reward function
$\pi_0: S \mapsto [0,1]$ : probability distribution over initial states
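To make the tuple concrete, here is a minimal sketch of how its five components could be written down as plain Python data for a made-up two-state problem. The state names, action names, probabilities, and the helper `sample_next_state` are all assumptions chosen for illustration.

```python
import random

# S: state space and A: action space (hypothetical labels)
states = ["s0", "s1"]
actions = ["stay", "move"]

# T(s, a, s'): keyed by (state, action, next state); missing entries count as 0
T = {
    ("s0", "stay", "s0"): 1.0,
    ("s0", "move", "s0"): 0.2,
    ("s0", "move", "s1"): 0.8,
    ("s1", "stay", "s1"): 1.0,
    ("s1", "move", "s0"): 0.8,
    ("s1", "move", "s1"): 0.2,
}


def R(s, a, s_next):
    """Reward function: here, 1.0 whenever the agent ends up in s1."""
    return 1.0 if s_next == "s1" else 0.0


# pi_0: distribution over initial states
pi_0 = {"s0": 1.0, "s1": 0.0}


def sample_next_state(s, a, rng):
    """Draw a next state s' according to T(s, a, .)."""
    weights = [T.get((s, a, s_next), 0.0) for s_next in states]
    return rng.choices(states, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    s = rng.choices(states, weights=[pi_0[x] for x in states], k=1)[0]
    s_next = sample_next_state(s, "move", rng)
    print(s, "->", s_next, "reward:", R(s, "move", s_next))
```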

The agent selects actions using a policy $\pi: S \mapsto A$, which maps each state to the action the agent takes in that state. The objective is usually to find the optimal policy $\pi^*$, which maximizes the expected cumulative reward.
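As a rough illustration of what "optimal policy" means, the sketch below enumerates every deterministic policy of a tiny, made-up problem and keeps the one with the highest cumulative reward. Everything here is an assumption for the sake of the example: the three-state corridor, the deterministic transitions, the fixed horizon of four steps, and the undiscounted sum of rewards. Brute-force enumeration is used only because the problem is tiny; it is not how RL algorithms search for policies in practice.

```python
from itertools import product

STATES = [0, 1, 2]           # hypothetical corridor positions
ACTIONS = ["left", "right"]
HORIZON = 4                  # assumed fixed episode length


def next_state(s, a):
    """Deterministic transition: move one cell, clipped to the corridor."""
    step = 1 if a == "right" else -1
    return min(max(s + step, 0), len(STATES) - 1)


def reward(s, a, s_next):
    """Reward of 1.0 for every step that ends in the rightmost cell."""
    return 1.0 if s_next == 2 else 0.0


def cumulative_reward(policy, start=0, horizon=HORIZON):
    """Sum of rewards collected by following `policy` (a dict state -> action)."""
    s, total = start, 0.0
    for _ in range(horizon):
        a = policy[s]
        s_next = next_state(s, a)
        total += reward(s, a, s_next)
        s = s_next
    return total


def best_policy():
    """Try all |A|^|S| deterministic policies and return the best one."""
    candidates = (
        dict(zip(STATES, choice))
        for choice in product(ACTIONS, repeat=len(STATES))
    )
    return max(candidates, key=cumulative_reward)


if __name__ == "__main__":
    pi_star = best_policy()
    print(pi_star, cumulative_reward(pi_star))
```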

Questions
Why does the Cartesian product that defines the domain of the transition function include the state space $S$ twice?
Why does the transition function return a value between 0 and 1?
Assume that from the current state S, you can get to state A with an immediate high reward, or to state B with an immediate low reward. Is state A always preferable to B?
Can teaching a dog a new trick be understood as a Markov Decision Process? If yes, what are the state space, action space, and the reward function?

Bookmarks

Kaggle Intro to Reinforcement Learning