Large language models (LLMs) have made significant advances in mathematical reasoning and theorem proving, but they still face considerable challenges in formal theorem proving with systems such as Lean and Isabelle. These systems demand rigorous derivations that satisfy strict formal specifications, posing difficulties even for advanced models such as GPT-4. The core challenge is that a model must simultaneously understand the syntax and semantics of the formal system while aligning abstract mathematical reasoning with precise formal representations. This requires a deep grasp of both mathematical concepts and formal coding conventions, making complex formal proofs a significant hurdle for current AI systems.
DeepSeek-AI researchers presented DeepSeek-Prover-V1.5, a unified approach that combines the strengths of whole-proof and proof-step generation techniques through a truncate-and-resume mechanism. The method starts with whole-proof generation: the language model produces a complete proof candidate from the theorem statement, which the Lean prover then verifies. If an error is detected, the code is truncated at the first error message, and the correctly generated prefix serves as the prompt for generating the next proof segment. The latest tactic state reported by the Lean prover is appended to this prompt as a comment to improve accuracy. The truncate-and-resume mechanism is integrated into Monte Carlo tree search (MCTS), allowing flexible truncation points determined by the tree search policy. Furthermore, a reward-free exploration algorithm is proposed to address the reward sparsity of proof search by assigning intrinsic motivation to the tree search agent for extensive exploration of the tactic state space.
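A minimal sketch may help make the truncate-and-resume loop concrete. The interfaces here (`generate`, `verify`, the `LeanResult` fields) are hypothetical stand-ins for the language model and the Lean 4 verifier, not DeepSeek's actual code:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class LeanResult:
    success: bool
    first_error_line: int  # line index of the first Lean error (assumed field)
    tactic_state: str      # latest tactic state before the error (assumed field)

def prove(theorem: str,
          generate: Callable[[str], str],       # the language model (assumed)
          verify: Callable[[str], LeanResult],  # the Lean 4 verifier (assumed)
          max_rounds: int = 16) -> Optional[str]:
    """Hypothetical truncate-and-resume loop: generate a whole proof, verify it,
    truncate at the first error, and resume from the verified prefix."""
    prompt = theorem
    for _ in range(max_rounds):
        proof = generate(prompt)        # whole-proof generation
        result = verify(proof)          # check the candidate with the Lean prover
        if result.success:
            return proof                # verified complete proof
        # Keep only the code before the first error message.
        prefix = "\n".join(proof.splitlines()[: result.first_error_line])
        # Append the latest tactic state as a comment to guide the next segment.
        prompt = f"{theorem}\n{prefix}\n/- tactic state:\n{result.tactic_state} -/\n"
    return None  # budget exhausted without a verified proof
```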
This study presents the following contributions:
• Pre-training: The base model was further trained on mathematics and code data, focusing on formal languages such as Lean, Isabelle, and Metamath.
• Supervised fine-tuning: The Lean 4 code completion dataset was improved through two data augmentation techniques:
1. DeepSeek-Coder V2 236B was used to annotate proofs with natural-language chain-of-thought comments.
2. Intermediate tactic state information was inserted into the Lean 4 proof code (illustrated in the Lean sketch after this list).
• Reinforcement learning: The GRPO algorithm was employed for reinforcement learning from proof assistant feedback (RLPAF), using Lean prover verification results as rewards (a simplified sketch of the advantage computation also follows the list).
• Monte Carlo tree search: An advanced tree search method featuring:
1. A truncate-and-resume mechanism serving as a state-action abstraction.
2. The RMaxTS algorithm, which applies the RMax strategy to exploration under the sparse rewards of proof search.
3. Intrinsic rewards assigned to encourage diverse planning paths and broad exploration of the tactic state space (sketched after this list).
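To illustrate the second augmentation technique, here is a hypothetical Lean 4 snippet in the style the paper describes, with the intermediate tactic state inserted as a comment after each tactic (the theorem and states are illustrative, not drawn from the training data):

```lean
theorem add_comm_example (a b : ℕ) : a + b = b + a := by
  -- tactic state: a b : ℕ ⊢ a + b = b + a
  rw [Nat.add_comm]
  -- tactic state: no goals
```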
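The RLPAF setup can be pictured as follows: for each theorem, the model samples a group of candidate proofs, the Lean prover assigns a binary reward (1 if the proof verifies, 0 otherwise), and GRPO computes group-relative advantages without a separate value network. This is a simplified sketch of that advantage computation under those assumptions, not DeepSeek's training code:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: normalize each proof's binary reward
    (1 = verified by Lean, 0 = rejected) against the group's statistics."""
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)

# One theorem, a group of 8 sampled proofs, 3 of which verified.
rewards = np.array([1, 0, 0, 1, 0, 0, 1, 0], dtype=np.float32)
adv = grpo_advantages(rewards)
# Verified proofs receive positive advantages, failed ones negative; these
# then weight a clipped PPO-style policy-gradient update on the proof tokens.
```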
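The RMax-style intrinsic reward can likewise be sketched: when an expansion reaches a tactic state the search tree has not seen before, the agent receives an intrinsic reward of 1, otherwise 0, steering exploration toward novel states even when the extrinsic proof-completion reward is sparse. A minimal sketch under that simplifying assumption:

```python
def rmax_intrinsic_reward(tactic_state: str, seen: set[str]) -> float:
    """RMax-style exploration bonus (assumed simplification): reward 1 the
    first time a tactic state appears in the search tree, 0 afterwards."""
    if tactic_state in seen:
        return 0.0
    seen.add(tactic_state)
    return 1.0

seen: set[str] = set()
assert rmax_intrinsic_reward("⊢ a + b = b + a", seen) == 1.0  # novel state
assert rmax_intrinsic_reward("⊢ a + b = b + a", seen) == 0.0  # already seen
```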
DeepSeek-Prover-V1.5 demonstrates significant progress in formal theorem proving across multiple benchmarks. On the miniF2F test set, DeepSeek-Prover-V1.5-RL achieved a 60.2% pass rate in single-pass whole-proof generation, a 10.2 percentage-point improvement over its predecessor. With a limited sampling budget of 128 attempts, it proved 51.6% of the problems, outperforming other whole-proof generation methods and matching the leading tree search methods. When enhanced with RMaxTS tree search, DeepSeek-Prover-V1.5-RL achieved a state-of-the-art pass rate of 62.7%, surpassing the previous best result with significantly fewer samples. On the ProofNet dataset, it achieved pass rates of 22.6% and 25.3% in the single-pass and RMaxTS-enhanced configurations respectively, outperforming existing methods. These results demonstrate the strong performance of DeepSeek-Prover-V1.5 across theorem-proving tasks and methodologies.
DeepSeek-Prover-V1.5, a 7-billion-parameter language model, sets new benchmarks in formal theorem proving with Lean 4. Built on DeepSeek-Prover-V1.5-Base, it undergoes specialized pre-training, supervised fine-tuning, and reinforcement learning via GRPO. The model incorporates RMaxTS, an innovative Monte Carlo tree search variant, to improve problem solving through extensive exploration. This framework establishes a line of work analogous to AlphaZero for formal theorem proving, using expert iteration and synthetic data. While the current focus is on exploration, future developments may include a critic model for evaluating incomplete proofs, addressing the exploitation side of reinforcement learning in theorem proving.
Take a look at the Paper and GitHub. All credit for this research goes to the researchers of this project.
Asjad is a consulting intern at Marktechpost. He is pursuing a Bachelor's degree in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching applications of machine learning in the healthcare domain.