Claude Fable Sets High Score on BU Bench

https://browser-use.com/posts/claude-fable-browser-agent-benchmark (browser-use.com)

Anthropic's Claude Fable 5 model achieved the top score of 80% on BU Bench V1, a benchmark that tests AI agents on real-world web tasks. It beat the next-highest model, GPT 5.5, by 12 points, but at a significantly higher API cost of over $580. The model handled complex, multi-step tasks that require reasoning and constraint checking, and it was better at continuing to work on a task for longer periods. Although Fable was more expensive, its failures were less frequent and less simplistic than those of competing models. The results highlight a trade-off between performance, reliability, and cost for advanced browser automation agents.

0 points•by hdt•1 month ago
