Community Evals: Because we're done trusting black-box leaderboards over the community

https://huggingface.co/blog/community-evals
AI model evaluation faces a growing trust problem: high scores on benchmarks like MMLU do not always translate into real-world capability. To address this, Hugging Face is launching "Community Evals," a decentralized, transparent evaluation-reporting system on its Hub. Dataset repositories can host leaderboards, model repositories can store their own evaluation scores, and any user can submit results for a model via a pull request. The goal is to make evaluation visible and reproducible by exposing what was evaluated, how, when, and by whom.
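
The post doesn't include a schema, but the pull-request workflow it describes might look roughly like the sketch below, using the existing `huggingface_hub` client. The file path, JSON fields, and repo names are all illustrative assumptions, not the actual Community Evals format:

```python
# Hypothetical sketch: submitting an eval result to a model repo as a pull
# request via huggingface_hub. The file name and JSON schema are assumptions
# for illustration; the blog post does not specify the exact format.
import json

from huggingface_hub import HfApi, CommitOperationAdd

api = HfApi()  # assumes you are already logged in (`huggingface-cli login`)

# Assumed result payload: what was evaluated, how, when, and by whom.
result = {
    "model": "org/some-model",           # hypothetical model repo
    "dataset": "cais/mmlu",              # benchmark the score comes from
    "metric": "accuracy",
    "value": 0.712,
    "harness": "lm-evaluation-harness",  # tool used to produce the score
    "date": "2024-06-01",
    "submitted_by": "your-username",
}

# Open a pull request against the model repo that adds the result file,
# so the submission is reviewable and its provenance stays visible.
api.create_commit(
    repo_id="org/some-model",            # hypothetical target repo
    operations=[
        CommitOperationAdd(
            path_in_repo="evals/mmlu-result.json",  # assumed path
            path_or_fileobj=json.dumps(result, indent=2).encode(),
        )
    ],
    commit_message="Add community MMLU eval result",
    create_pr=True,                      # submit as a PR, not a direct push
)
```

Routing the submission through a PR rather than a direct push is what makes the reporting auditable: the model owner (or anyone watching the repo) can inspect who ran the eval and with what before the score lands.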
0 points by chrisf2 2 hours ago

Comments (0)

No comments yet.
