EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

https://huggingface.co/blog/ServiceNow-AI/eva-bench-data (huggingface.co)

EVA-Bench is a benchmark dataset for evaluating voice agents, with a focus on the domain-specific ways their performance can fail. Version 2.0 expands from one enterprise domain to three: Airline Customer Service Management (CSM), Enterprise IT Service Management (ITSM), and Healthcare HR Service Delivery (HRSD). The updated benchmark comprises 213 evaluation scenarios and 121 tools across these domains. The expansion aims to better test an agent's ability to adapt to different vocabularies, workflow complexities, and user expectations.

0 points•by will22•1 month ago
