SEAL Showdown: Insights from GPT-5
https://scale.com/blog/seal-showdown-insights-gpt-5 (scale.com)
Scale AI has added several new models, including GPT-5 and Claude 4.5, to its Showdown leaderboard for evaluation. A surprising finding is that users consistently rank GPT-5 significantly lower than other leading models. Further analysis suggests that GPT-5's performance degrades as its allocated "thinking effort" increases, and that it shows no significant performance improvement on coding prompts in a chat setting. These results highlight a potential disparity between performance on capability benchmarks and user preferences in real-world chat environments.
0 points • by chrisf • 13 hours ago