Better Harness: A Recipe for Harness Hill-Climbing with Evals

https://blog.langchain.com/better-harness-a-recipe-for-harness-hill-climbing-with-evals/ (blog.langchain.com)

One way to improve an AI agent is to iteratively refine the system around it, its "harness," by hill-climbing with evaluations (evals) as the learning signal. The system described here, Better-Harness, treats evals as training data that guide edits to the agent's prompts and tools, much like a classical machine learning training loop. The process has three parts: sourcing evals from production traces, splitting them into optimization and holdout sets so that improvements generalize, and running an autonomous loop in which a human reviews proposed changes. Failures are diagnosed from traces, and each change is validated to confirm that it does not cause regressions on other tasks. Reported experimental results show improvements in agent capabilities such as tool selection and response quality across different models.
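The loop summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Better-Harness implementation: the names `Harness`, `Eval`, `pass_rate`, `split_evals`, and `hill_climb` are all hypothetical, evals are reduced to simple pass/fail checks, and the trace diagnosis and human review steps are left out. A candidate edit is kept only if it raises the pass rate on the optimization set without lowering it on the holdout set.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Harness:
    """Hypothetical stand-in for the editable system around an agent."""
    prompt: str
    tools: tuple[str, ...] = ()


@dataclass(frozen=True)
class Eval:
    """One eval case: a name plus a pass/fail check run against a harness."""
    name: str
    check: Callable[[Harness], bool]


# An edit maps the current harness to a modified candidate (assumed shape).
Edit = Callable[[Harness], Harness]


def pass_rate(harness: Harness, evals: Sequence[Eval]) -> float:
    """Fraction of evals the harness passes; 0.0 for an empty set."""
    if not evals:
        return 0.0
    return sum(1 for e in evals if e.check(harness)) / len(evals)


def split_evals(evals: Sequence[Eval], holdout_fraction: float = 0.3,
                seed: int = 0) -> tuple[list[Eval], list[Eval]]:
    """Shuffle deterministically, then split into (optimization, holdout)."""
    shuffled = list(evals)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def hill_climb(harness: Harness, candidate_edits: Sequence[Edit],
               optimization_set: Sequence[Eval],
               holdout_set: Sequence[Eval]) -> Harness:
    """Greedily accept edits that help on optimization evals
    and do not regress on holdout evals."""
    best = harness
    best_opt = pass_rate(best, optimization_set)
    best_holdout = pass_rate(best, holdout_set)
    for edit in candidate_edits:
        candidate = edit(best)
        opt = pass_rate(candidate, optimization_set)
        holdout = pass_rate(candidate, holdout_set)
        # Accept only strict improvement with no holdout regression.
        if opt > best_opt and holdout >= best_holdout:
            best, best_opt, best_holdout = candidate, opt, holdout
    return best
```

In a real setting each `check` would run the agent and grade its output, and the candidate edits would come from an automated proposer rather than a fixed list. The holdout comparison is what guards against an edit that overfits the optimization evals while breaking other tasks.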

0 points•by ogg•3 months ago
