LatentVLA: Latent Reasoning Models for Autonomous Driving

https://towardsdatascience.com/latentvla-latent-reasoning-models-for-autonomous-driving/ (towardsdatascience.com)

LatentVLA is an autonomous driving model that reasons in a latent space rather than in natural language. It uses a self-supervised framework to learn discrete 'ego-actions' from unlabeled driving data by separating the driver's actions from environmental dynamics. A large Vision-Language Model (VLM) is then trained to predict these latent action sequences; the action codebook is kept very small to preserve the VLM's pre-trained knowledge. For real-time performance, knowledge distillation transfers the VLM's capabilities to a much smaller decision transformer. LatentVLA achieves state-of-the-art results on simulation benchmarks, but the evaluation also discusses how limited open-loop planning is as a measure of true driving capability.
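The two mechanisms named in the summary, a small discrete action codebook and knowledge distillation, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the 4-entry, 2-D `CODEBOOK`, the nearest-neighbor `quantize` lookup, and the temperature-softened KL `distillation_loss` are common textbook forms, not details confirmed by the article.

```python
import math

# Hypothetical 4-entry codebook of 2-D latent ego-actions (illustrative values only).
CODEBOOK = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]


def quantize(vec, codebook):
    """Return the index of the codebook entry nearest to vec (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over action-token distributions at a softened temperature.

    Assumed loss form: the summary does not say which distillation objective is used.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# A continuous latent is snapped to its nearest discrete ego-action token.
token = quantize([0.9, 0.1], CODEBOOK)

# The student (small model) is penalized for diverging from the teacher (VLM).
loss = distillation_loss([2.0, 0.5, 0.1, -1.0], [1.0, 1.0, 0.0, 0.0])
```

With only a handful of codebook entries, the large model needs just a few new output tokens, which is one plausible reading of why a small codebook helps preserve pre-trained knowledge.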

0 points • by chrisf • 4 months ago
