Nemotron-Personas-Brazil: Co-Designed Data for Sovereign AI
https://huggingface.co/blog/nvidia/nemotron-personas-brazil (huggingface.co)
Nemotron-Personas-Brazil is an open, commercially usable dataset of 6 million fully synthetic personas grounded in official Brazilian census and labor data. The dataset was created to address the lack of high-quality, Portuguese-language training data and support the development of sovereign AI for Brazil. Built using NVIDIA's NeMo Data Designer, it provides personas with attributes like age, occupation, location, and cultural context, all written in natural Brazilian Portuguese. While statistically aligned with real population distributions, the dataset is private by design and contains no personally identifiable information.
0 points • by ogg • 1 day ago