SEAL Showdown: Insights from GPT-5

https://scale.com/blog/seal-showdown-insights-gpt-5 (scale.com)

Scale AI has added several new models, including GPT-5 and Claude 4.5, to its Showdown leaderboard for evaluation. A surprising finding is that users consistently rank GPT-5 significantly lower than other leading models. Further analysis suggests that GPT-5's performance degrades as its allocated "thinking effort" increases, and it does not show significant performance improvements on coding prompts in a chat setting. These results highlight a potential disparity between performance on capability benchmarks and user preferences in real-world chat environments.

0 points•by chrisf•8 months ago
