Real Speech Breaks AI (And What We're Doing to Fix It)

https://scale.com/blog/audiomc (scale.com)

Even the most advanced AI systems struggle with the messy reality of human speech, which is full of hesitations, interruptions, and mid-sentence corrections. To address this, the new Audio MultiChallenge benchmark was created to stress-test the conversational intelligence of native speech-to-speech models. It evaluates key abilities such as handling voice edits, understanding audio-only cues, and maintaining consistency, and it reveals that models perform significantly worse on real human speech than on clean, synthetic data. Initial tests showed that Google’s Gemini models lead in conversational robustness, but even the best native audio models currently trade off some reasoning "IQ" for speed and

0 points•by hdt•6 months ago
