Allen's REINFORCE notes

From Humanoid Robots Wiki
Revision as of 21:41, 24 May 2024 by Allen12 (talk | contribs) (Motivation)


Links

Motivation

Recall that the objective of Reinforcement Learning is to find an optimal policy <math>\pi^*</math>, which we encode in a neural network with parameters <math>\theta^*</math>. These optimal parameters are defined as

<math>\theta^* = \text{argmax}_\theta \, E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right]</math>

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

  • Input: <math>s_t</math>: states at each time step
  • Output: <math>a_t</math>: actions at each time step
  • Data: <math>(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)</math>
  • Learn <math>\pi_\theta : s_t \to a_t</math> to maximize <math>\sum_t r_t</math>
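The loop above — sample trajectories with the current policy, then adjust <math>\theta</math> to increase the total reward — can be sketched with the score-function (REINFORCE) gradient estimator. This is a minimal sketch assuming a one-step, two-armed bandit environment and a softmax policy; the environment, reward values, and hyperparameters here are illustrative choices, not from these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical environment: expected reward of each arm (arm 1 is better).
REWARDS = np.array([0.2, 1.0])

def softmax(x):
    z = x - x.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_trajectory(theta):
    """One-step 'trajectory': sample an action from pi_theta, observe a reward."""
    probs = softmax(theta)
    a = rng.choice(len(theta), p=probs)
    return a, REWARDS[a]

def reinforce_step(theta, lr=0.1, batch=64):
    """One gradient ascent step on E_{tau ~ p_theta}[ sum_t r_t ],
    estimated with the REINFORCE trick: E[ r * grad log pi_theta(a) ]."""
    probs = softmax(theta)
    grad = np.zeros_like(theta)
    for _ in range(batch):
        a, r = sample_trajectory(theta)
        # grad log pi(a) for a softmax policy: one_hot(a) - probs
        glogp = -probs
        glogp[a] += 1.0
        grad += r * glogp
    return theta + lr * grad / batch

theta = np.zeros(2)
for _ in range(200):
    theta = reinforce_step(theta)
# softmax(theta)[1] should now be close to 1: the policy has learned
# to prefer the higher-reward arm.
```

Note the estimator never differentiates through the environment — only through <math>\log \pi_\theta</math> — which is what lets REINFORCE work with non-differentiable rewards.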

State vs. Observation