Reinforcement Learning
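- state x → action y: learn a mapping from the current state to the action to take
- Supervision comes from a reward (the reward function) rather than from labeled correct actions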
- positive reward / negative reward
Applications
- Controlling robots
- Factory optimization
- Financial (stock) trading
- Playing games (including video games)
Mars rover example
The return in reinforcement learning
- Return = R₁ + γR₂ + γ²R₃ + ... (until terminal state)
- Discount factor: γ = 0.9
- Example: 0 + (0.9)·0 + (0.9)²·0 + (0.9)³·100 = 72.9
- The return depends on the actions you take
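The return formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `discounted_return` and the parameter name `gamma` (the discount factor, 0.9 in the notes) are my own choices, and the second reward sequence is a hypothetical alternative added only to show that different actions lead to different returns.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards, weighting the reward at step k (0-based) by gamma**k."""
    total = 0.0
    for k, reward in enumerate(rewards):
        total += (gamma ** k) * reward
    return total


# Reward sequence from the notes: 0, 0, 0, then 100 at the terminal state.
print(discounted_return([0, 0, 0, 100]))  # approximately 72.9

# Hypothetical alternative: other actions reach a reward of 40 after two steps.
print(discounted_return([0, 0, 40]))  # approximately 32.4
```

Because later rewards are multiplied by higher powers of the discount factor, a large reward that takes longer to reach can still be worth more or less than a small nearby one, depending on the discount factor.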
Making decisions: Policies in reinforcement learning
Policy
- A policy is a function that maps states to actions: it tells you which action to take in a given state
- π(s)=a
- π : policy
- s : state
- a : action
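For a small, discrete problem, a policy can be written as a plain lookup table. The sketch below assumes a hypothetical environment with states numbered 1 to 6 on a line, two actions ("left" and "right"), and terminal states at both ends; none of these details come from the notes, and the particular action choices are arbitrary.

```python
# Hypothetical setup: states 1..6 on a line, with states 1 and 6 terminal,
# so the policy only needs an entry for each non-terminal state (2..5).
policy_table = {2: "left", 3: "left", 4: "left", 5: "right"}


def pi(state):
    """Return the action a that the policy prescribes in state s, i.e. pi(s) = a."""
    return policy_table[state]


print(pi(4))  # prints: left
```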
The goal of reinforcement learning
- Find a policy π that tells you what action (a=π(s)) to take in every state s so as to maximize the return
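To make the goal concrete, the sketch below searches every possible policy in a tiny hypothetical environment and keeps the one with the highest return. The environment is an assumption for illustration: six states on a line, rewards of 100 and 40 at the two terminal ends, a discount factor of 0.9. So is the scoring rule, which sums the returns over all non-terminal start states. Exhaustive search works only because there are 16 candidate policies here; it is not how reinforcement learning algorithms find policies in practice.

```python
from itertools import product

GAMMA = 0.9
# Hypothetical environment: states 1..6 on a line, terminal at both ends.
REWARD = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
TERMINAL = {1, 6}
STEP = {"left": -1, "right": +1}
NON_TERMINAL = [2, 3, 4, 5]


def rollout_return(policy, start, max_steps=100):
    """Discounted return from `start`, following `policy` until a terminal state.

    The step cap stops policies that bounce between two states forever.
    """
    state, discount, total = start, 1.0, 0.0
    for _ in range(max_steps):
        total += discount * REWARD[state]
        if state in TERMINAL:
            break
        state += STEP[policy[state]]
        discount *= GAMMA
    return total


def best_policy():
    """Try every assignment of actions to non-terminal states; keep the best."""
    best, best_score = None, float("-inf")
    for actions in product(STEP, repeat=len(NON_TERMINAL)):
        policy = dict(zip(NON_TERMINAL, actions))
        score = sum(rollout_return(policy, s) for s in NON_TERMINAL)
        if score > best_score:
            best, best_score = policy, score
    return best


print(best_policy())
```

With these assumed rewards and a discount factor of 0.9, heading left toward the reward of 100 gives the higher return from every non-terminal state; a smaller discount factor would make the nearer reward of 40 more attractive from state 5.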
Review of key concepts
Markov Decision Process (MDP)