TextQuests: How Good are LLMs at Text-Based Video Games?
https://huggingface.co/blog/textquests

TextQuests is a new benchmark for evaluating Large Language Models (LLMs) as autonomous agents in dynamic, interactive settings. It uses 25 classic text-based video games to test an agent's long-context reasoning and its ability to learn through exploration without external tools. Evaluations show that current LLMs struggle with long-context tasks, often hallucinating past events, failing at spatial reasoning, and repeating actions instead of forming new plans. The benchmark also measures agent efficiency, finding that while more computation can improve performance, returns diminish, pointing to a need for more dynamic reasoning.
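For readers unfamiliar with this style of evaluation, the setup implied by the summary is a plain agent-environment loop: the model sees the full game transcript so far and must emit the next command each turn with no external tools. Below is a minimal sketch of such a loop; `env` and `llm` are hypothetical stand-ins, not the actual TextQuests API.

```python
def play_game(env, llm, max_steps=500):
    """Run one text-adventure episode with an LLM choosing every command.

    `env` is assumed to expose reset() -> str and step(str) -> (str, int, bool);
    `llm` is assumed to expose complete(str) -> str. Both are illustrative.
    """
    transcript = [env.reset()]  # opening room description
    score = 0
    for _ in range(max_steps):
        # The entire history goes into the prompt, so the context grows every
        # turn -- this is where long-context reasoning gets stressed.
        prompt = "\n".join(transcript) + "\n> "
        action = llm.complete(prompt).strip()
        observation, score, done = env.step(action)
        transcript += [f"> {action}", observation]
        if done:
            break
    return score, transcript
```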