Smol2Operator: Post-Training GUI Agents for Computer Use
https://huggingface.co/blog/smol2operator (huggingface.co)
A lightweight vision-language model can be trained to acquire skills for Graphical User Interface (GUI) automation and evolve into an agentic coder. The process uses a two-phase training strategy on the SmolVLM2-2.2B-Instruct model, first instilling perception and grounding capabilities, then adding agentic reasoning via Supervised Fine-Tuning. A significant part of the work involves transforming heterogeneous data from multiple sources into a unified action space for consistent training. All training recipes, data-processing tools, the resulting model, and datasets are released to enable full reproducibility and foster further research.
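To make the "unified action space" idea concrete, here is a rough Python sketch of what such a normalization step could look like. It is not the code released with the post: the `UnifiedAction` type, the `ALIASES` table, the `unify` function, and the raw record fields (`action`, `x`, `y`, `units`, `text`) are all hypothetical, chosen only to show how differently named actions and mixed coordinate units might be mapped onto one schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedAction:
    name: str   # canonical action name, e.g. "click" or "type"
    args: dict  # canonical arguments; coordinates normalized to [0, 1]


# Hypothetical alias table: source-specific action names -> canonical names.
ALIASES = {
    "tap": "click",
    "left_click": "click",
    "click": "click",
    "input_text": "type",
    "type": "type",
}


def unify(raw: dict, width: int, height: int) -> UnifiedAction:
    """Map one raw action record onto the shared schema (illustrative only)."""
    name = ALIASES[raw["action"]]
    if name == "click":
        x, y = raw["x"], raw["y"]
        # Assumption: each record states its units, "px" or "norm".
        if raw.get("units", "norm") == "px":
            x, y = x / width, y / height
        return UnifiedAction("click", {"x": round(x, 4), "y": round(y, 4)})
    # Remaining canonical action in this sketch: typing text.
    return UnifiedAction("type", {"text": raw["text"]})


if __name__ == "__main__":
    # Two records from imagined sources that describe the same click.
    a = unify({"action": "tap", "x": 960, "y": 540, "units": "px"}, 1920, 1080)
    b = unify({"action": "left_click", "x": 0.5, "y": 0.5}, 1920, 1080)
    print(a == b)
```

Normalizing coordinates to the [0, 1] range is one plausible choice here because it keeps targets independent of screenshot resolution. The linked post is the authoritative source for the conventions actually used.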
0 points•by ogg•1 month ago