Why is ChatGPT so good?

https://scale.com/blog/chatgpt-reinforcement-learning (scale.com)

ChatGPT's strong performance is largely attributed to its training process, which incorporates Reinforcement Learning from Human Feedback (RLHF). This method goes beyond standard pre-training by fine-tuning the model on human preferences. First, a reward model is trained on data in which humans rank different model outputs for the same prompt, so that it learns to score a response the way a human rater would. The language model is then fine-tuned with reinforcement learning to maximize that score, which steers its responses toward being more helpful, harmless, and aligned with user intent.
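To make the reward-model step concrete, here is a minimal sketch of learning from ranked pairs. It is not the method from the linked article or OpenAI's actual implementation: it assumes a pairwise logistic (Bradley-Terry style) loss, a linear reward model, and hand-made two-dimensional feature vectors standing in for model outputs. All names (`reward`, `pairwise_loss`, `train_reward_model`) and the toy data are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def reward(weights, features):
    """Hypothetical linear reward model: a dot product of weights and features."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_loss(weights, preferred, rejected):
    """-log(sigmoid(r_preferred - r_rejected)), computed as a stable softplus."""
    margin = reward(weights, preferred) - reward(weights, rejected)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Plain per-example gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            # Gradient of the loss w.r.t. the weights is
            # -(1 - sigmoid(margin)) * (preferred - rejected).
            scale = 1.0 - sigmoid(margin)
            for i in range(dim):
                weights[i] += lr * scale * (preferred[i] - rejected[i])
    return weights


# Assumed toy data: each pair is (features of the response the human ranked
# higher, features of the response ranked lower) for the same prompt.
pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.2, 0.9]),
]
weights = train_reward_model(pairs, dim=2)
```

After training, `reward(weights, preferred)` exceeds `reward(weights, rejected)` for each toy pair. In a full RLHF pipeline the features would come from a neural network, and the resulting scores would serve as the reward signal for the reinforcement learning stage.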

0 points • by will22 • 6 months ago
