Reproducing Variance: Caching in Agentic LLM Pipelines
https://www.ai21.com/blog/caching-in-agentic-llm-pipelines/

Caching in agentic LLM pipelines is challenging because traditional caching is deterministic, while LLMs are non-deterministic. This conflict is especially apparent in agent evaluations, which require both reproducibility for experiments and variance for techniques like best-of-N sampling. The proposed solution is a cache key that encodes each LLM call's specific position within the pipeline's structure. This makes the cache resilient to changes in execution order and supports the seemingly contradictory needs for both reproducibility and variance. The result enables cleaner experiments, A/B testing of individual components, and effective best-of-N inference without different branches collapsing into the same answer.
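The post does not publish its implementation, but the idea can be sketched as follows: hash the prompt and parameters together with the call's structural position in the pipeline (a path of step names) plus a branch index. Reruns of the same experiment then hit the cache deterministically, while distinct best-of-N branches get distinct keys. All names here (`cache_key`, `call_path`, `branch`) are illustrative assumptions, not the article's API.

```python
import hashlib
import json

def cache_key(prompt: str, model: str, params: dict,
              call_path: tuple, branch: int = 0) -> str:
    """Hypothetical sketch: deterministic key encoding the call's
    position in the pipeline, so the cache is stable across reruns
    but still distinguishes best-of-N branches."""
    payload = json.dumps(
        {
            "prompt": prompt,
            "model": model,
            "params": params,
            "call_path": list(call_path),  # e.g. ("plan", "summarize")
            "branch": branch,              # separates best-of-N samples
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Same position and branch -> same key (reproducible rerun)...
k1 = cache_key("Summarize X", "model-a", {"temperature": 0.7}, ("plan", "summarize"))
k2 = cache_key("Summarize X", "model-a", {"temperature": 0.7}, ("plan", "summarize"))
# ...while a different branch index yields a fresh key, so best-of-N
# branches do not collapse into one cached answer.
k3 = cache_key("Summarize X", "model-a", {"temperature": 0.7}, ("plan", "summarize"), branch=1)
```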
0 points • by will22 • 1 hour ago