
Smol2Operator: Post-Training GUI Agents for Computer Use

https://huggingface.co/blog/smol2operator (huggingface.co)
A lightweight vision-language model can be trained to automate Graphical User Interfaces (GUIs) and evolve into an agentic coder. The process applies a two-phase training strategy to the SmolVLM2-2.2B-Instruct model: the first phase instills perception and grounding capabilities, and the second adds agentic reasoning via supervised fine-tuning (SFT). A significant part of the work is transforming heterogeneous data from multiple sources into a unified action space, so that every training example expresses actions in the same way. All training recipes, data-processing tools, the resulting model, and the datasets are released to enable full reproducibility and foster further research.
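To make the idea of a unified action space concrete, here is a minimal Python sketch. It is not the converter released with the project: the alias table, the `UnifiedAction` class, the `unify` function, and the raw record layout are all hypothetical and chosen only for illustration. The sketch assumes that source datasets differ in action names and use pixel coordinates, and that the unified form uses a single name per action with coordinates normalized to the [0, 1] range.

```python
from dataclasses import dataclass

# Hypothetical alias table: source-specific action names -> one unified name.
ACTION_ALIASES = {
    "tap": "click",
    "left_click": "click",
    "click": "click",
    "input_text": "type",
    "type": "type",
}


@dataclass(frozen=True)
class UnifiedAction:
    """One action in the (assumed) unified action space."""
    name: str
    args: dict

    def render(self) -> str:
        # Render as a function-call string, e.g. click(x=0.5, y=0.25).
        parts = ", ".join(f"{key}={value!r}" for key, value in self.args.items())
        return f"{self.name}({parts})"


def unify(raw: dict, width: int, height: int) -> UnifiedAction:
    """Convert one raw record (assumed layout) into a UnifiedAction.

    Pixel coordinates are divided by the screenshot size so that the
    result no longer depends on the source resolution.
    """
    name = ACTION_ALIASES[raw["action"]]  # raises KeyError on unknown actions
    args = {}
    if "x" in raw and "y" in raw:
        args["x"] = round(raw["x"] / width, 4)
        args["y"] = round(raw["y"] / height, 4)
    if "text" in raw:
        args["text"] = raw["text"]
    return UnifiedAction(name, args)


if __name__ == "__main__":
    mobile_tap = {"action": "tap", "x": 960, "y": 270}
    desktop_typing = {"action": "input_text", "text": "hello"}
    print(unify(mobile_tap, 1920, 1080).render())
    print(unify(desktop_typing, 1920, 1080).render())
```

Failing loudly on an unknown action name, rather than passing it through, keeps unmapped source vocabulary from leaking into the training data unnoticed.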
0 points by ogg 1 month ago
