Implementing Vibe Proving with Reinforcement Learning

https://towardsdatascience.com/implementing-vibe-proving-with-rl/ (towardsdatascience.com)

The article describes an approach to training a large language model (LLM) to generate verifiable, step-by-step mathematical proofs. The core method is a reinforcement learning (RL) loop in which a custom-built proof checker verifies the LLM's output and returns a binary reward signal. The process involves bootstrapping a dataset, fine-tuning an open-source model with LoRA, and running the training on the Tinker platform. The fine-tuned models succeed on many examples but still struggle with more complex textbook proofs. Because the system is modular, it leaves room for future improvements in areas such as model choice, prompt optimization, and dataset curation.
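
To make the "checker returns a binary reward" idea concrete, here is a minimal sketch in Python. It is not the article's checker: the `Step` format, the two rules (`premise` and `mp` for modus ponens), and the function names `check_proof` and `binary_reward` are all assumptions chosen for illustration, and formulas are compared as plain strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One line of a toy proof (hypothetical format, not the article's)."""
    formula: str                  # e.g. "q" or "p -> q"
    rule: str                     # "premise" or "mp" (modus ponens)
    refs: tuple[int, ...] = ()    # indices of earlier steps used by "mp"


def check_proof(premises: set[str], goal: str, steps: list[Step]) -> bool:
    """Return True only if every step is justified and the last step is the goal."""
    derived: list[str] = []
    for step in steps:
        if step.rule == "premise":
            ok = step.formula in premises
        elif step.rule == "mp" and len(step.refs) == 2:
            i, j = step.refs
            # Modus ponens: from A (step i) and "A -> B" (step j), conclude B.
            ok = (
                0 <= i < len(derived)
                and 0 <= j < len(derived)
                and derived[j] == f"{derived[i]} -> {step.formula}"
            )
        else:
            ok = False
        if not ok:
            return False
        derived.append(step.formula)
    return bool(derived) and derived[-1] == goal


def binary_reward(premises: set[str], goal: str, steps: list[Step]) -> float:
    """Assumed reward shape: 1.0 for a fully verified proof, 0.0 otherwise."""
    return 1.0 if check_proof(premises, goal, steps) else 0.0


premises = {"p", "p -> q"}
valid = [Step("p", "premise"), Step("p -> q", "premise"), Step("q", "mp", (0, 1))]
invalid = [Step("q", "mp", (0, 1))]  # cites steps that do not exist yet
```

In an RL loop of the kind the summary describes, a value like `binary_reward(premises, "q", valid)` would be computed for each sampled proof and passed to the policy update; here the valid proof scores 1.0 and the invalid one 0.0.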

0 points•by ogg•1 month ago
