Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks

https://huggingface.co/blog/open-asr-leaderboard (huggingface.co)
The Open ASR Leaderboard now includes multilingual and long-form transcription tracks for benchmarking the growing number of automatic speech recognition (ASR) models. Models that pair a Conformer encoder with an LLM decoder currently achieve the best accuracy on English transcription, at the cost of slower inference. CTC and TDT decoders offer much higher throughput, which makes them better suited to real-time or batch processing. Multilingual coverage trades off against single-language accuracy: broad models such as Whisper excel across many languages, while specialized models fine-tuned for a single language perform better on that language. Closed-source systems currently lead in long-form transcription, though open-source models show significant potential for future innovation.
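The summary speaks of "accuracy" and "throughput" without naming metrics. ASR accuracy is conventionally reported as word error rate (WER), and throughput as an inverse real-time factor (seconds of audio transcribed per second of compute). The sketch below assumes those two definitions; the function names are illustrative and not taken from the leaderboard's code, and it skips the text normalization (casing, punctuation) a real evaluation would apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Under these definitions, a model with lower WER but a lower inverse real-time factor sits on the "accurate but slow" side of the trade-off described above, while a fast CTC or TDT model would sit on the other.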
0 points by will2213 hours ago
