Planning Actions, Not Predicting Tokens

https://www.ai21.com/blog/ai-reasoning-planning-vs-predicting/ (www.ai21.com)

The post argues that current AI reasoning models, often called Large Reasoning Models (LRMs), are suboptimal because they search over chains of tokens rather than plan over actions. This token-based approach, even with Chain-of-Thought (CoT) methods, is inefficient, generalizes poorly, and lacks the transparency and control that enterprise applications need. The proposed alternative is to shift from predicting tokens to planning in the space of actions, where an action is a specific tool invocation such as an LLM prompt or a database call. Decision-theoretic planning then selects the sequence of actions with the best trade-off between answer quality and costs such as time and compute. A planning-based system of this kind acts as an outer control loop around inner-loop tools like LLMs, which the post says enables better performance, predictability, and user interaction.
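To make the quality-versus-cost trade-off concrete, here is a minimal sketch of decision-theoretic planning over actions. It is not the linked post's implementation. The action names, costs, success probabilities, the stop-on-first-success execution model, and the linear utility (expected quality minus weighted expected cost) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Action:
    """A hypothetical tool invocation (illustrative numbers, not measured)."""
    name: str
    cost: float          # assumed cost, e.g. seconds or dollars
    success_prob: float  # assumed chance this action yields a good answer


def expected_utility(plan, cost_weight):
    """Expected answer quality minus weighted expected cost.

    Assumes actions run in order and the plan stops at the first success,
    so a later action's cost is only paid if all earlier ones failed.
    """
    p_unsolved = 1.0
    expected_cost = 0.0
    for action in plan:
        expected_cost += p_unsolved * action.cost
        p_unsolved *= 1.0 - action.success_prob
    expected_quality = 1.0 - p_unsolved
    return expected_quality - cost_weight * expected_cost


def best_plan(actions, cost_weight, max_len=3):
    """Exhaustively score every ordered plan up to max_len actions."""
    candidates = [()]  # the empty plan (do nothing) is always an option
    for n in range(1, min(max_len, len(actions)) + 1):
        candidates.extend(permutations(actions, n))
    return max(candidates, key=lambda plan: expected_utility(plan, cost_weight))


# Hypothetical action set: a cheap lookup, a small model, a large model.
ACTIONS = [
    Action("db_lookup", cost=1.0, success_prob=0.5),
    Action("small_llm", cost=2.0, success_prob=0.6),
    Action("large_llm", cost=10.0, success_prob=0.9),
]

if __name__ == "__main__":
    for weight in (0.05, 0.2):
        plan = best_plan(ACTIONS, cost_weight=weight)
        print(weight, [a.name for a in plan])
```

With these assumed numbers, a low cost weight (0.05) selects all three actions in order of increasing cost, while a higher cost weight (0.2) drops the expensive large-model call. Exhaustive enumeration only works for a handful of actions; a real planner would need a more scalable search.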

0 points • by will22 • 3 months ago
