Nemotron ColEmbed V2: Raising the Bar for Multimodal Retrieval with ViDoRe V3’s Top Model

https://huggingface.co/blog/nvidia/nemotron-colembed-v2 (huggingface.co)

NVIDIA has introduced the Nemotron ColEmbed V2 family, a set of late-interaction embedding models in 3B, 4B, and 8B sizes built for multimodal retrieval. The models use a multi-vector, late-interaction architecture inspired by ColBERT: each query and each document is represented by one embedding per token rather than a single pooled vector, which allows fine-grained similarity matching between query tokens and document tokens across text and images. They were trained as bi-encoders with contrastive learning on text and text-image pairs, so that relevant query-document pairs score higher than irrelevant ones. The 8B version ranks first on the ViDoRe V3 benchmark for visual document retrieval in enterprise use cases.
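
To make the late-interaction idea concrete, here is a minimal sketch of ColBERT-style MaxSim scoring in plain Python. It is not taken from the Nemotron ColEmbed V2 code: the function names (normalize, maxsim_score, rank), the 2-dimensional toy vectors, and the page names are all illustrative assumptions, and a real system would use the model's own token and image-patch embeddings with a batched tensor library.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector, take the
    best-matching document token vector, then sum those maxima."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


def rank(query_vecs, docs):
    """Return document names sorted from highest to lowest MaxSim score."""
    scores = {name: maxsim_score(query_vecs, vecs) for name, vecs in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): two query token vectors, two "pages" of token vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "page_a": [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]],
    "page_b": [[1.0, 1.0], [-1.0, 0.0]],
}

print(rank(query, docs))
```

In this toy example page_a contains an exact directional match for each query vector, so it outranks page_b, whose best matches are only partial. Because each query token picks its own best match, a document can score well even when the relevant evidence is spread across different tokens or image patches.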

0 points•by hdt•4 months ago
