Math Provers 🧮

Aug 16, 2024

So essentially,

DeepSeek-Prover-V1.5 has achieved state-of-the-art results on the miniF2F (high school level) and ProofNet (undergraduate level) theorem proving

Paper: DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search (28 Pages)

Github: https://github.com/deepseek-ai/DeepSeek-Prover-V1.5

Researchers from DeepSeek AI are introducing models which are good at math proofs.

Hmm..What’s the background?

While language models have shown success in natural language processing, they struggle with the rigorous demands of formal theorem proving, which requires strict adherence to formal logic and proof systems. Even advanced models like GPT-4 struggle with complex formal proofs.

Source: https://lexica.art/prompt/2d3daba0-efb0-484b-b8d2-2218b4e932a3

Ok, So what is proposed in the research paper?

This model uses a novel "truncate-and-resume" mechanism to integrate intermediate tactic state information into the whole-proof generation process. This allows for more effective use of feedback from the Lean prover.

DeepSeek-Prover-V1.5 undergoes a multi-stage training process:
1. Pre-training: It is initially pre-trained on a massive dataset of mathematical text and code, focusing on formal languages like Lean, Isabelle, and Metamath
2. Supervised Fine-Tuning: The model is fine-tuned on a dataset of Lean code, augmented with natural language explanations and intermediate tactic state information
3. Reinforcement Learning: It is further refined using reinforcement learning from feedback provided by the Lean 4 prover
Monte-Carlo Tree Search (RMaxTS): To improve its long-term planning abilities, DeepSeek-Prover-V1.5 utilizes a variant of Monte-Carlo Tree Search called RMaxTS

Remarkably, DeepSeek-Prover-V1.5 has achieved state-of-the-art results on the miniF2F (high school level) and ProofNet (undergraduate level) theorem proving benchmarks. Notably, it demonstrates strong performance in both single-pass and tree search settings, demonstrating the effectiveness of its hybrid approach.

What’s next?

The researchers emphasize that effectively addressing the challenges of long-term planning, reward sparsity, and credit assignment in theorem proving remains an open and active area of research.

So essentially,

DeepSeek-Prover-V1.5 has achieved state-of-the-art results on the miniF2F (high school level) and ProofNet (undergraduate level) theorem proving

Learned something new? Consider sharing with your friends!

Share So Essentially

So Essentially

Math Provers 🧮

Discussion about this post