From Humanoid Robots Wiki
Revision as of 20:24, 24 May 2024

Allen's REINFORCE notes

Links

  • RLbook2020: http://www.incompleteideas.net/book/RLbook2020.pdf

Motivation

Learning

Learning involves the agent taking actions and the environment returning a new state and reward.

  • Input: $s_t$: States at each time step
  • Output: $a_t$: Actions at each time step
  • Data: $(s_1, a_1, r_1, \ldots, s_T, a_T, r_T)$
  • Learn $\pi_\theta : s_t \to a_t$ to maximize $\sum_t r_t$
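
The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from these notes: the environment `toy_env_step`, the placeholder policy `random_policy`, and the helpers `collect_trajectory` and `total_reward` are all hypothetical names. It shows how one episode produces the data tuple of states, actions, and rewards, and how the sum of rewards that the policy should maximize is computed from it.

```python
import random


def toy_env_step(state, action):
    """Hypothetical 1-D environment (an assumption for illustration).

    The state is an integer position, the action is -1 or +1, and the
    reward is 1.0 only when the move lands on position 3.
    """
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward


def random_policy(state, rng):
    """Stand-in for pi_theta: maps a state s_t to an action a_t."""
    return rng.choice([-1, 1])


def collect_trajectory(policy, horizon, seed=0):
    """Run one episode and return [(s_1, a_1, r_1), ..., (s_T, a_T, r_T)]."""
    rng = random.Random(seed)
    state = 0
    trajectory = []
    for _ in range(horizon):
        action = policy(state, rng)                      # agent acts
        next_state, reward = toy_env_step(state, action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


def total_reward(trajectory):
    """The quantity to maximize: the sum of r_t over the episode."""
    return sum(reward for _, _, reward in trajectory)


if __name__ == "__main__":
    episode = collect_trajectory(random_policy, horizon=10)
    print(episode)
    print("sum of rewards:", total_reward(episode))
```

A learned policy would replace `random_policy` with a parameterized function whose parameters are adjusted to raise `total_reward` on average; the random policy here only demonstrates the data flow.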

State vs. Observation