Function Calling and Agentic AI in 2025: What the Latest Benchmarks Tell Us About Model Performance
https://www.klavis.ai/blog/function-calling-and-agentic-ai-in-2025-what-the-latest-benchmarks-tell-us-about-model-performance (www.klavis.ai)

Function calling and agentic AI performance is evaluated with specialized benchmarks that go beyond traditional metrics. The Berkeley Function Calling Leaderboard (BFCL) assesses capabilities such as multi-step reasoning and tool selection, with models like GLM-4.5 and Claude 4.1 showing strong results. A more rigorous benchmark, MCPMark, stress-tests models in realistic, multi-step workflows and reveals significant performance gaps: GPT-5 leads but achieves only a 52.6% success rate, indicating that even top models struggle with complex, real-world agentic tasks.
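For readers unfamiliar with what these benchmarks actually exercise, the sketch below (not taken from the article) shows the basic shape of a function-calling round trip: the model is handed JSON tool schemas, emits a structured call, and a harness dispatches it to a local function. The `get_weather` tool and the hard-coded model output are hypothetical stand-ins for whatever tools a benchmark like BFCL or MCPMark defines.

```python
import json

# Hypothetical tool schema, in the JSON shape most function-calling APIs expect.
TOOLS = [{
    "name": "get_weather",
    "description": "Return current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Local implementation the harness can dispatch to (stubbed result for illustration).
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21, "condition": "clear"}

REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to the matching local function."""
    fn = REGISTRY[tool_call["name"]]
    args = json.loads(tool_call["arguments"])
    return json.dumps(fn(**args))

# A model's structured output for "What's the weather in Paris?" might look like:
model_tool_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
print(dispatch(model_tool_call))  # -> {"city": "Paris", "temp_c": 21, "condition": "clear"}
```

Benchmarks such as BFCL and MCPMark grade whether the model picks the right tool, fills the arguments correctly, and chains calls like this over multiple steps; the single-call dispatcher above is only the simplest case.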