In this tutorial, we will look at the Python package Gymnasium, which provides reinforcement learning environments.
In 2016, Brockman et al. introduced OpenAI Gym, a toolkit for reinforcement learning that shipped environment implementations with a consistent API, enabling rapid prototyping and benchmark comparisons of RL algorithms. OpenAI Gym is now deprecated; its maintained successor was released as Gymnasium.
Let’s see the cart pole environment in action!
Notebook Cell
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
# set animations to jshtml to render them in browser
# plt.rcParams["animation.html"] = "jshtml"
SEED = 19
rng = np.random.default_rng(SEED)
def new_seed(rng):
    return rng.integers(10_000).item()

# initialize a new "CartPole-v1" environment
env = gym.make("CartPole-v1", render_mode="rgb_array")
# if running locally, you can also use the human render mode
# render_mode="human" will visualize the environment in a new pygame window
# env = gym.make("CartPole-v1", render_mode="human")
# reset the environment to an initial state
observation, info = env.reset(seed=SEED)
# for rendering, we will store our frames in this list
frames = []
# run one episode
truncated = False
terminated = False
while not (truncated or terminated):
    frames.append(env.render())
    # select a random action
    action = env.action_space.sample()
    # apply the selected action to the environment
    observation, reward, terminated, truncated, info = env.step(action)
env.close()
frames now contains the recorded rgb_array frames, which we can visualize as a JSHTML animation:
def replay(frames):
    fig, ax = plt.subplots()
    img = ax.imshow(frames[0])
    ax.axis("off")

    def update(frame):
        img.set_data(frame)
        return [img]

    anim = FuncAnimation(fig, update, frames=frames, interval=30, blit=True)
    plt.close(fig)
    return HTML(anim.to_jshtml())

replay(frames)
What is the action space of the cart pole environment?
What is the state space of the environment?
What does the transition function look like if we just consider the position of the cart?
What does the reward function look like?
What is the difference between terminated and truncated?