ScarfBench: Benchmarking AI Agents for Enterprise Java Framework Migration

https://huggingface.co/blog/ibm-research/scarfbench (huggingface.co)

ScarfBench is an open benchmark for evaluating how well AI agents migrate enterprise Java applications between frameworks such as Spring, Jakarta EE, and Quarkus. Unlike benchmarks that only inspect the generated code, ScarfBench checks whether the migrated application builds, deploys, and passes behavioral validation tests. This gives a more realistic measure of quality for software modernization, a task that requires translating framework semantics, not just source code. Initial results show that even state-of-the-art coding agents struggle here, with low success rates at preserving application behavior. The results point to a significant gap between generating code that compiles and performing a functionally correct migration.
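To make the difference between "compiles" and "preserves behavior" concrete, here is a minimal Python sketch of a staged scoring scheme in which each stage (build, deploy, behavioral tests) only counts if the previous one passed. This is not ScarfBench's actual harness: the `Stage`, `MigrationResult`, `highest_stage`, and `rate` names, the rule that every behavioral test must pass, and the four sample applications are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Stage(IntEnum):
    """Ordered evaluation stages (hypothetical); a later stage implies all earlier ones."""
    FAILED_BUILD = 0
    BUILT = 1
    DEPLOYED = 2
    BEHAVIOR_PRESERVED = 3


@dataclass
class MigrationResult:
    """Hypothetical record of one agent's attempt at migrating one application."""
    app: str
    build_ok: bool
    deploy_ok: bool
    tests_passed: int
    tests_total: int


def highest_stage(r: MigrationResult) -> Stage:
    """Return the furthest stage reached; each stage is gated on the previous one."""
    if not r.build_ok:
        return Stage.FAILED_BUILD
    if not r.deploy_ok:
        return Stage.BUILT
    # Assumption: behavior counts as preserved only if every validation test passes.
    if r.tests_total > 0 and r.tests_passed == r.tests_total:
        return Stage.BEHAVIOR_PRESERVED
    return Stage.DEPLOYED


def rate(results: list[MigrationResult], stage: Stage) -> float:
    """Fraction of attempts that reached at least `stage`."""
    if not results:
        return 0.0
    return sum(highest_stage(r) >= stage for r in results) / len(results)


# Hypothetical toy data, not benchmark results.
results = [
    MigrationResult("petclinic", True, True, 12, 12),
    MigrationResult("orders", True, True, 7, 10),
    MigrationResult("inventory", True, False, 0, 8),
    MigrationResult("billing", False, False, 0, 5),
]
print(rate(results, Stage.BUILT))               # 0.75
print(rate(results, Stage.BEHAVIOR_PRESERVED))  # 0.25
```

With this toy data, three of four attempts build but only one preserves behavior, which is the kind of gap a build-only metric would hide.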

0 points•by chrisf•4 hours ago
