Allen's REINFORCE notes

Revision as of 20:24, 24 May 2024 by Allen12 (talk | contribs)

Links

RLbook2020

Motivation

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

Input: $s_{t}$: the state at each time step
Output: $a_{t}$: the action at each time step
Data: $(s_{1},a_{1},r_{1},\dots ,s_{T},a_{T},r_{T})$
Learn a policy $\pi _{\theta }:s_{t}\to a_{t}$ to maximize the total reward $\sum _{t=1}^{T}r_{t}$
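The interaction loop above can be sketched in a few lines of Python. This is only an illustrative sketch, not something from these notes: the `ToyEnv` corridor environment, the `rollout` and `total_return` helpers, and the two example policies are all hypothetical names made up for the example. It shows how a trajectory $(s_{1},a_{1},r_{1},\dots ,s_{T},a_{T},r_{T})$ is collected and how the quantity being maximized, $\sum _{t}r_{t}$, is computed from it.

```python
import random


class ToyEnv:
    """Hypothetical 1-D corridor: states are positions 0..length-1.

    The episode ends with reward 1.0 when the agent reaches the last position;
    every other step gives reward 0.0.
    """

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left (clipped at the walls)
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def always_right(state, rng):
    """A fixed policy: ignore the state and always move right."""
    return 1


def random_policy(state, rng):
    """A uniformly random policy over the two actions."""
    return rng.choice([0, 1])


def rollout(env, policy, rng, max_steps=20):
    """Run one episode and return the list of (s_t, a_t, r_t) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, rng)                  # agent picks a_t from s_t
        next_state, reward, done = env.step(action)  # environment returns new state and r_t
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


def total_return(trajectory):
    """Sum of rewards over the episode, i.e. the objective sum_t r_t."""
    return sum(reward for _, _, reward in trajectory)


if __name__ == "__main__":
    rng = random.Random(0)
    traj = rollout(ToyEnv(), always_right, rng)
    print(traj, total_return(traj))
```

Learning then amounts to adjusting the policy's parameters $\theta$ so that rollouts like these earn a higher total reward on average; the fixed policies here have no parameters and are only meant to show the data that a learner would see.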

State vs. Observation
