Allen's REINFORCE notes

From Humanoid Robots Wiki

Revision as of 20:24, 24 May 2024 by Allen12 (talk | contribs)

Contents

1 Links
2 Motivation
3 Learning
4 State vs. Observation

Links

/RLbook2020

Motivation

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

Input: $s_{t}$ : States at each time step
Output: $a_{t}$ : Actions at each time step
Data: $(s_{1},a_{1},r_{1},...,s_{T},a_{T},r_{T})$
Learn $\pi_{\theta} : s_{t} \to a_{t}$ to maximize $\sum_{t} r_{t}$
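
The setup above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the method from these notes: the two-state environment in `step`, the tabular softmax parameterization of $\pi_{\theta}$, the horizon, and the learning rate are all assumptions made for the example. It collects a trajectory $(s_{1},a_{1},r_{1},...,s_{T},a_{T},r_{T})$ and applies a plain REINFORCE update with undiscounted reward-to-go, so that $\sum_{t} r_{t}$ tends to increase.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
N_STATES, N_ACTIONS, HORIZON = 2, 2, 5


def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.exp(x - x.max())
    return z / z.sum()


def step(state, action, rng):
    """Hypothetical environment: reward 1 if the action matches the state.

    The next state is drawn uniformly at random.
    """
    reward = 1.0 if action == state else 0.0
    next_state = int(rng.integers(N_STATES))
    return next_state, reward


def rollout(theta, rng):
    """Collect one trajectory (s_1, a_1, r_1, ..., s_T, a_T, r_T)."""
    states, actions, rewards = [], [], []
    s = int(rng.integers(N_STATES))
    for _ in range(HORIZON):
        probs = softmax(theta[s])                 # pi_theta(. | s_t)
        a = int(rng.choice(N_ACTIONS, p=probs))   # sample a_t
        s_next, r = step(s, a, rng)
        states.append(s)
        actions.append(a)
        rewards.append(r)
        s = s_next
    return states, actions, rewards


def reinforce_update(theta, trajectory, lr=0.05):
    """One REINFORCE step: theta += lr * sum_t G_t * grad log pi(a_t | s_t)."""
    states, actions, rewards = trajectory
    # Undiscounted reward-to-go: G_t = r_t + r_{t+1} + ... + r_T.
    returns = np.cumsum(rewards[::-1])[::-1]
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        probs = softmax(theta[s])
        # For a softmax policy, grad of log pi(a|s) w.r.t. the logits of s
        # is one_hot(a) - probs.
        grad_log = -probs
        grad_log[a] += 1.0
        grad[s] += g * grad_log
    return theta + lr * grad


def train(episodes=2000, seed=0):
    """Run REINFORCE from a uniform policy and return the learned logits."""
    rng = np.random.default_rng(seed)
    theta = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        theta = reinforce_update(theta, rollout(theta, rng))
    return theta
```

Calling `train()` and then `softmax(theta[s])` for each state `s` shows how much probability the learned policy puts on the rewarded action. Reward-to-go is used instead of the full episode return only to reduce variance; both give an unbiased gradient estimate in this setting.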

State vs. Observation
