DiffuJudge-AV: A Diffusion-Inspired Framework for Calibrated AV Video Evaluation
https://towardsdatascience.com/diffujudge-av-a-diffusion-inspired-framework-for-calibrated-av-video-evaluation/ (towardsdatascience.com)
Evaluating autonomous driving systems with large language models can be dangerously misleading: a model might appear accurate while failing to reliably flag critical safety issues. A new framework, DiffuJudge-AV, addresses this by treating the AI judge like a noisy sensor and deliberately stress-testing it with known biases to measure its stability. Using a diffusion-inspired denoising technique, the system calculates not just a more accurate score but also a calibrated uncertainty level for each evaluation. In a surprising result on a driving video benchmark, a 7-billion-parameter open-source vision model outperformed a much larger closed model on the safety-critical metrics that matter most. This lets the system decide with confidence whether to automatically escalate a failure, approve a scenario, …
0 points • by will22 • 14 hours ago
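The summary only describes the pipeline at a high level. As a rough sketch of the general idea (query a judge under deliberately biased prompt variants, treat the readings like noisy sensor measurements, and act automatically only when they agree), here is some Python. Everything in it is an assumption for illustration: the `Judge` interface, the prompt variants in `VARIANTS`, the toy `fake_judge`, and the thresholds are hypothetical. The median/MAD aggregation is a simple stand-in, not the article's diffusion-inspired denoising. Because the summary is cut off after "approve a scenario,", the `"defer"` outcome is also an assumption.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical judge interface: (video_id, prompt_variant) -> estimated
# probability in [0, 1] that the clip shows a safety failure.
Judge = Callable[[str, str], float]

# Hypothetical prompt variants; each is meant to inject a known bias.
VARIANTS = ["neutral", "says_safe", "says_unsafe", "long_prompt"]


def stress_test(judge: Judge, video_id: str, variants: Sequence[str]) -> list[float]:
    """Query the judge once per prompt variant and collect its readings."""
    return [judge(video_id, v) for v in variants]


def estimate(scores: Sequence[float]) -> tuple[float, float]:
    """Summarize noisy readings as (center, spread).

    Median and median absolute deviation are a simple robust stand-in;
    this is NOT the article's diffusion-inspired denoising.
    """
    center = statistics.median(scores)
    spread = statistics.median([abs(s - center) for s in scores])
    return center, spread


def decide(center: float, spread: float,
           fail_threshold: float = 0.5, max_spread: float = 0.1) -> str:
    """Act automatically only when the judge is stable under perturbation."""
    if spread > max_spread:
        return "defer"  # hypothetical fallback, e.g. route to human review
    return "escalate" if center >= fail_threshold else "approve"


def fake_judge(video_id: str, variant: str) -> float:
    """Toy judge for illustration: a per-clip base score plus a per-variant bias."""
    base = {"clip_a": 0.9, "clip_b": 0.1}[video_id]
    bias = {"neutral": 0.0, "says_safe": -0.05,
            "says_unsafe": 0.05, "long_prompt": 0.02}[variant]
    return min(1.0, max(0.0, base + bias))


if __name__ == "__main__":
    for clip in ("clip_a", "clip_b"):
        center, spread = estimate(stress_test(fake_judge, clip, VARIANTS))
        print(clip, round(center, 3), round(spread, 3), decide(center, spread))
```

The point of the sketch is that the decision depends on the spread as well as the central score. A judge whose verdict swings when a biased prompt is injected is treated as unreliable for that clip instead of being trusted.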