Reinforcement Learning

  • state x → action y
    • reward, or the reward function
    • positive reward / negative reward

Applications

  • Controlling robots
  • Factory optimization
  • Financial (stock) trading
  • Playing games (including video games)

Mars rover example

The return in reinforcement learning

  • Return = $R_1 + \gamma R_2 + \gamma^2 R_3 + \cdots$ (until terminal state)
  • Discount Factor: $\gamma$
  • The return depends on the actions you take
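As a minimal sketch of the return, assume an episode is given as a list of rewards in the order they were received, and that the first reward is undiscounted while each later one is multiplied by one more factor of the discount factor. The function name `discounted_return` and the episode values below are illustrative assumptions, not taken from these notes.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards, with the reward at step t (starting at 0) weighted by gamma**t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical episode: three steps with no reward, then a terminal reward of 100.
episode_rewards = [0, 0, 0, 100]
print(discounted_return(episode_rewards, gamma=0.5))  # 12.5
```

A different sequence of actions produces a different reward list, which is why the return depends on the actions taken.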

Making decisions: Policies in reinforcement learning

Policy

  • A policy is a function $\pi(s) = a$ mapping from states to actions; it tells you what action to take in a given state
    • $\pi$: policy
    • $s$: state
    • $a$: action
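For a small, discrete problem, a policy can be sketched as a plain lookup table from state to action. The state numbers and action names below are hypothetical, chosen only to illustrate the mapping.

```python
# Hypothetical policy for four non-terminal states on a line.
# Keys are states, values are the action the policy picks in that state.
policy_table = {1: "left", 2: "left", 3: "left", 4: "right"}


def pi(state):
    """Return the action the policy takes in the given state."""
    return policy_table[state]


print(pi(2))  # left
print(pi(4))  # right
```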

The goal of reinforcement learning

  • Find a policy that tells you what action to take in every state so as to maximize the return
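To make the goal concrete, the sketch below enumerates every possible policy for a tiny, deterministic environment and keeps the one with the highest return. Everything about the environment is an assumption made for illustration: six states in a row, terminal states at both ends, the reward values, and a discount factor of 0.5. Exhaustive search like this only works for very small problems; it is not how practical reinforcement learning algorithms find policies.

```python
from itertools import product

# Hypothetical environment: states 0..5 in a row, both ends terminal.
REWARDS = [100, 0, 0, 0, 0, 40]   # reward received in each state (assumed values)
TERMINAL = {0, 5}
ACTIONS = {"left": -1, "right": +1}
GAMMA = 0.5                        # assumed discount factor


def rollout_return(policy, start, max_steps=20):
    """Discounted return obtained by following `policy` from `start`."""
    state, total, discount = start, 0.0, 1.0
    for _ in range(max_steps):     # cap the steps so looping policies still finish
        total += discount * REWARDS[state]
        if state in TERMINAL:
            break
        state += ACTIONS[policy[state]]
        discount *= GAMMA
    return total


def best_policy():
    """Try every state-to-action assignment and keep the one with the highest total return."""
    inner = [s for s in range(len(REWARDS)) if s not in TERMINAL]
    best, best_score = None, float("-inf")
    for choice in product(ACTIONS, repeat=len(inner)):
        policy = dict(zip(inner, choice))
        # Score a policy by summing its return over all non-terminal start states.
        score = sum(rollout_return(policy, s) for s in inner)
        if score > best_score:
            best, best_score = policy, score
    return best


print(best_policy())  # {1: 'left', 2: 'left', 3: 'left', 4: 'right'}
```

Under these assumed numbers, the best policy heads for the large reward on the left from most states, but from state 4 it goes right, because the smaller reward is close enough that discounting makes it the better choice.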

Review of key concepts

Markov Decision Process (MDP)