Vision Language Model Alignment in TRL ⚡️

https://huggingface.co/blog/trl-vlm-alignment (huggingface.co)

TRL now supports new alignment methods for Vision Language Models (VLMs) that go beyond Direct Preference Optimization (DPO). Two of them are Mixed Preference Optimization (MPO), which combines several loss terms into a single objective to improve reasoning, and Group Relative Policy Optimization (GRPO), which samples a group of responses per prompt and updates the policy based on how each response scores relative to the rest of its group. Both are meant to extract more signal from the training data than pairwise DPO alone and to scale better with modern VLMs. The integration ships with training scripts and code examples for each method.
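To make the "group relative" part of GRPO concrete, here is a minimal sketch of how per-response advantages can be computed within one group. It is an illustration and not TRL's implementation: the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Score each response relative to the other responses in its group.

    Hypothetical helper for illustration only (not a TRL API).
    `rewards` holds one scalar reward per response sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses sampled for one prompt, scored by some reward function.
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean get a positive advantage and are reinforced, those below get a negative one, and a group where every response scores the same contributes no learning signal.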

0 points • by ogg • 11 months ago
