When sleeping in saves you money: dynamic data snoozing for efficient online RL

https://www.ai21.com/blog/dynamic-data-snoozing/ (www.ai21.com)

Reinforcement learning algorithms like GRPO become inefficient when a training input yields no learning signal: if every sampled response to a prompt receives the same reward, the group-relative advantages are all zero and the prompt contributes nothing to the update. According to the post, such inputs cause training instability and wasted compute. Dynamic sampling addresses this by filtering out examples that are too easy or too hard, but the filtering itself slows training. The post introduces "dynamic data snoozing" as a strategy to recover that compute efficiency during online reinforcement learning. By temporarily removing overly easy examples from the training set, dynamic snoozing is reported to achieve up to a 3x gain in compute efficiency without any degradation in model quality.
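To make the idea concrete, here is a minimal Python sketch of why an overly easy prompt carries no signal under GRPO-style group normalization, and of one way such a prompt could be snoozed. It is not the implementation from the linked post: the `SnoozeTracker` class, the `easy_threshold` and `snooze_steps` parameters, and the wake-up rule are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


@dataclass
class SnoozeTracker:
    """Hypothetical bookkeeping for temporarily skipping easy prompts."""
    easy_threshold: float = 1.0  # assumed: mean reward at which a prompt counts as "too easy"
    snooze_steps: int = 3        # assumed: number of training steps to skip the prompt
    wake_at: dict = field(default_factory=dict)  # prompt_id -> first step it is active again

    def is_active(self, prompt_id, step):
        # Prompts that were never snoozed default to wake step 0 (always active).
        return step >= self.wake_at.get(prompt_id, 0)

    def update(self, prompt_id, rewards, step):
        # Snooze the prompt if its rollouts were (almost) all successful.
        if mean(rewards) >= self.easy_threshold:
            self.wake_at[prompt_id] = step + 1 + self.snooze_steps


# A prompt solved by every rollout yields all-zero advantages.
easy_adv = group_advantages([1, 1, 1, 1])
# A prompt with mixed outcomes yields a usable signal.
mixed_adv = group_advantages([1, 0, 1, 0])

tracker = SnoozeTracker()
tracker.update("easy_prompt", [1, 1, 1, 1], step=0)
tracker.update("mixed_prompt", [1, 0, 1, 0], step=0)
```

In this sketch the snoozed prompt is skipped before any rollouts are generated for it, which is where the compute saving would come from; it becomes eligible again after `snooze_steps` steps, so a prompt is set aside temporarily rather than dropped for good.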

0 points • by hdt • 6 months ago
