Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models

https://huggingface.co/blog/nvidia/llama-nemotron-vl-1b (huggingface.co)

The post introduces two small Llama Nemotron models for improving multimodal search and visual document retrieval in RAG pipelines: `llama-nemotron-embed-vl-1b-v2`, which creates single-vector embeddings from images and text, and `llama-nemotron-rerank-vl-1b-v2`, which reorders retrieved results for better relevance. Designed for enterprise use, the models are small enough to run on common GPU hardware and work with standard vector databases. The goal is to ground generation in better evidence and thereby reduce hallucinations. Benchmarks on datasets such as ViDoRe and DigitalCorpora-10k show that combining the embedding and reranking models significantly improves retrieval accuracy over previous and competing models.
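The summary describes a two-stage pattern: an embedding model maps the query and each page or passage to one vector for nearest-neighbor search, then a reranker rescores only the shortlist. The sketch below illustrates that flow in plain Python under stated assumptions. The toy 2-D vectors and the `overlap_score` function are hypothetical stand-ins for `llama-nemotron-embed-vl-1b-v2` and `llama-nemotron-rerank-vl-1b-v2`, not their actual APIs, and a brute-force cosine scan stands in for a vector database.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec: Sequence[float],
             corpus_vecs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Stage 1: rank every document by its single embedding vector, keep the top k."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in scored[:k]]


def rerank(query: str, candidates: Sequence[int], docs: Sequence[str],
           score_fn: Callable[[str, str], float], n: int) -> List[int]:
    """Stage 2: rescore only the shortlisted (query, document) pairs, keep the top n."""
    rescored = [(i, score_fn(query, docs[i])) for i in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [i for i, _ in rescored[:n]]


def overlap_score(query: str, doc: str) -> float:
    """HYPOTHETICAL stand-in for a reranker model: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


# Toy corpus; these vectors are made up for illustration, not real model embeddings.
docs = ["quarterly revenue chart", "employee handbook", "revenue table by region"]
corpus_vecs = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
query = "revenue by region"
query_vec = [1.0, 0.2]

shortlist = retrieve(query_vec, corpus_vecs, k=2)
final = rerank(query, shortlist, docs, overlap_score, n=2)
print("retrieved:", shortlist)
print("reranked:", final)
```

With these toy values the dense stage ranks the revenue chart first (`[0, 2]`), and the reranker promotes the regional table (`[2, 0]`). Restricting the reranker to a short candidate list reflects the usual trade-off: a pairwise scorer typically costs more per document than a vector comparison, so it is applied only where it can change the final ordering.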

0 points • by hdt • 6 months ago
