When sleeping in saves you money: dynamic data snoozing for efficient online RL
https://www.ai21.com/blog/dynamic-data-snoozing/ (www.ai21.com)
Reinforcement learning algorithms like GRPO can be inefficient when training inputs produce no learning signal, which causes training instability and wasted compute. Dynamic sampling addresses this by filtering out examples that are too easy or too hard, but the filtering itself slows training down. "Dynamic data snoozing" is introduced as a strategy to recover compute efficiency during online reinforcement learning: by temporarily removing overly easy examples from the training set, it can achieve up to a 3x gain in compute efficiency without any degradation in model quality.
0 points•by hdt•7 days ago
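To make the summary concrete, here is a minimal sketch of why an "overly easy" example gives no learning signal and what temporarily removing it might look like. The advantage formula is the usual group-relative normalization used by GRPO. Everything else is an assumption for illustration and is not taken from the linked post: the `SnoozeScheduler` class, its `snooze_steps` and `easy_threshold` parameters, and the rule "snooze an example when every rollout in its group succeeds" are all hypothetical.

```python
import statistics
from dataclasses import dataclass, field


def group_advantages(rewards):
    """Group-relative advantages: (r - mean) / std over one prompt's rollouts.

    When every rollout gets the same reward, the std is zero and every
    advantage is zero, so the example contributes no gradient.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


@dataclass
class SnoozeScheduler:
    """Hypothetical bookkeeping for snoozed examples (an assumption, not the post's code)."""

    snooze_steps: int = 3        # assumed: how many training steps an easy example is skipped
    easy_threshold: float = 1.0  # assumed: mean reward at or above this counts as "too easy"
    wake_at: dict = field(default_factory=dict)  # example_id -> first step it may be sampled again

    def is_awake(self, example_id, step):
        # Examples never snoozed default to wake step 0, so they are always awake.
        return step >= self.wake_at.get(example_id, 0)

    def update(self, example_id, rewards, step):
        """Snooze the example if its rollouts were all (nearly) successful. Returns True if snoozed."""
        if statistics.fmean(rewards) >= self.easy_threshold:
            self.wake_at[example_id] = step + 1 + self.snooze_steps
            return True
        return False


if __name__ == "__main__":
    scheduler = SnoozeScheduler(snooze_steps=2)
    print(group_advantages([1.0, 1.0, 1.0, 1.0]))  # all-equal rewards: no signal
    print(scheduler.update("easy-prompt", [1.0, 1.0, 1.0, 1.0], step=0))
    print([scheduler.is_awake("easy-prompt", s) for s in range(1, 4)])
```

In this sketch a snoozed example is only skipped for a fixed number of steps and then returns to the pool, which matches the "temporarily removing" wording in the summary. The actual schedule and thresholds used in the post may differ.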