The Fundamental Choice in Reinforcement Learning: On‑Policy vs. Off‑Policy

https://towardsdatascience.com/the-fundamental-choice-in-reinforcement-learning-on-policy-vs-off-policy/ (towardsdatascience.com)

Reinforcement learning algorithms are often distinguished by a fundamental choice: whether the agent learns about the policy it is currently following or about a different one. On-policy methods, such as SARSA, evaluate and improve the same policy that generates the agent's experience, which often makes learning more stable but less data-efficient. Off-policy methods, such as Q-learning, separate behavior from learning: the agent can learn about an optimal policy while exploring with a different one. This single choice strongly influences an algorithm's data efficiency, exploration capabilities, and overall training stability.
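
To make the distinction concrete, here is a minimal tabular sketch of the two update targets. It is not taken from the linked article; the function names, the list-of-lists Q-table, and the numbers in the usage lines are all assumptions chosen for illustration. The only difference between the two methods in this sketch is which next-state value enters the target: SARSA uses the action the behavior policy actually chose, while Q-learning uses the greedy action regardless of what was chosen.

```python
import random

def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    row = q[state]
    return max(range(n_actions), key=lambda a: row[a])

def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: bootstraps from the action actually taken next."""
    return reward + gamma * q[next_state][next_action]

def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: bootstraps from the greedy action in the next
    state, even if the behavior policy explored instead."""
    return reward + gamma * max(q[next_state])

def td_update(q, state, action, target, alpha):
    """Shared temporal-difference step: move Q(s, a) toward the target."""
    q[state][action] += alpha * (target - q[state][action])

# Hypothetical two-state, two-action Q-table (values are illustrative).
q = [[0.0, 0.0], [1.0, 3.0]]

# Suppose the agent explored and picked action 0 in the next state (state 1).
sarsa_t = sarsa_target(q, reward=1.0, next_state=1, next_action=0, gamma=0.5)
qlearn_t = q_learning_target(q, reward=1.0, next_state=1, gamma=0.5)
```

With these illustrative numbers the SARSA target is 1.5 and the Q-learning target is 2.5: the exploratory action lowers the on-policy estimate, while the off-policy estimate ignores it. Either target can then be passed to `td_update` with the same learning rate.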

0 points•by hdt•1 month ago
