EMO: Pretraining mixture of experts for emergent modularity
https://huggingface.co/blog/allenai/emo

A new mixture-of-experts (MoE) model called EMO is pretrained for emergent modularity without relying on predefined domains. The architecture allows a small subset of experts to be selected for a specific task, such as math or code, while retaining near full-model performance. During training, document boundaries serve as a supervisory signal: all tokens within a document are routed to a shared pool of experts. This encourages experts to specialize in coherent, high-level domains rather than low-level lexical patterns, enabling more flexible and efficient deployment.
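The document-boundary signal described above could, for instance, be implemented as an auxiliary router loss. The sketch below is a hypothetical illustration (not taken from the EMO blog post): it penalizes the divergence between each token's routing distribution and its document's average routing distribution, so tokens in one document are pushed toward a shared expert pool.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax over the expert dimension
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def doc_routing_loss(router_logits, doc_ids):
    """Hypothetical auxiliary loss: tokens in the same document should
    agree on a shared pool of experts.

    router_logits: (num_tokens, num_experts) pre-softmax router scores
    doc_ids:       (num_tokens,) integer document id per token
    Returns the mean KL(token routing || document-average routing).
    """
    probs = softmax(router_logits)                  # per-token expert distribution
    eps, total = 1e-9, 0.0
    docs = np.unique(doc_ids)
    for d in docs:
        p = probs[doc_ids == d]                     # routing of this document's tokens
        doc_mean = p.mean(axis=0)                   # estimate of the shared expert pool
        # KL is zero when every token already routes like the document average
        total += np.mean(np.sum(p * (np.log(p + eps) - np.log(doc_mean + eps)), axis=1))
    return total / len(docs)
```

Minimizing this term alongside the language-modeling loss would concentrate each document's tokens on a small, consistent expert subset, which is the behavior the post attributes to the document-boundary supervision.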
0 points • by will22 • 1 day ago