Allen's REINFORCE notes

Recall that the objective of Reinforcement Learning is to find an optimal policy $\pi ^{*}$ which we encode in a neural network with parameters $\theta ^{*}$ . These optimal parameters are defined as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \theta^* = \text{argmax}_\theta E_{\tau \sim p_\theta(\tau)} \left[ \sum_t r(s_t, a_t) \right] } . Let's unpack what this means. To phrase it in english, this is basically saying that the optimal policy is one such that the expected value of the total reward over following a trajectory determined by the policy is the highest over all policies.

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

Input: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s_t} : States at each time step
Output: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle a_t} : Actions at each time step
Data: Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (s_1, a_1, r_1, ... , s_T, a_T, r_T)}
Learn Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \pi_\theta : s_t -> a_t } to maximize Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \sum_t r_t }

Humanoid Robots Wiki ^β

Allen's REINFORCE notes

Contents

Links

Motivation

Learning

State vs. Observation

Humanoid Robots Wiki β

Allen's REINFORCE notes

Contents

Links

Motivation

Learning

State vs. Observation

Humanoid Robots Wiki ^β