Large language models (LLMs) have made significant advances in natural language processing, excelling at tasks such as comprehension, generation, and reasoning. However, challenges remain. Achieving robust reasoning often requires extensive supervised tuning, which limits scalability and generalization. Furthermore, problems such as poor readability and balancing computational efficiency with reasoning complexity persist, leading researchers to explore new approaches.
DeepSeek-R1: A new approach to LLM reasoning
DeepSeek-AI's recent work presents DeepSeek-R1, a model designed to improve reasoning abilities through reinforcement learning (RL). This effort resulted in two models:
- DeepSeek-R1-Zero, which is trained solely with RL and demonstrates emergent reasoning behaviors, such as long chain-of-thought (CoT) reasoning.
- DeepSeek-R1, which builds on its predecessor with a multi-stage training process that addresses challenges such as readability and language mixing while maintaining high reasoning performance.
These models aim to overcome existing limitations, combining innovative RL techniques with structured training processes to achieve scalability and usability.
Technical innovations and benefits
1. Reinforcement learning in reasoning tasks: DeepSeek-R1-Zero employs RL without relying on supervised data. Using Group Relative Policy Optimization (GRPO), it scores each sampled output relative to the other outputs in its group and optimizes the policy accordingly, significantly improving benchmark performance. For example, the model's AIME 2024 pass@1 score increased from 15.6% to 71.0% during training. (A minimal sketch of the group-relative scoring step appears after this list.)
2. Multi-stage training in DeepSeek-R1: DeepSeek-R1 uses cold-start data (thousands of curated CoT examples) to fine-tune its base model before undergoing reasoning-focused RL. Rewards for language consistency during this RL stage keep outputs readable and free of language mixing. (A toy reward-shaping sketch appears after this list.)
3. Distillation for smaller models: To address computational limitations, DeepSeek-AI distilled six smaller models (1.5 billion to 70 billion parameters) from DeepSeek-R1 using Qwen and Llama architectures. These models retain strong reasoning capabilities: the 14B distilled model achieved a pass@1 score of 69.7% on AIME 2024, outperforming some larger models. (A distillation-as-fine-tuning sketch appears after this list.)
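As a rough illustration of the first point (not DeepSeek-AI's actual code), GRPO samples a group of outputs for each prompt and scores every output relative to the group's own mean and standard deviation, which removes the need for a separate learned critic. A minimal sketch of that group-relative advantage step, using hypothetical 0/1 correctness rewards:

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each output's reward against the
    mean and standard deviation of its own sampled group, so no learned
    value (critic) network is required."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Toy group of 4 sampled answers to one prompt: 1.0 if the final answer
# is correct, 0.0 otherwise (the reward design here is illustrative).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # -> approximately [1., -1., -1., 1.]
```

Because the baseline comes from the group itself, the RL stage avoids training a value model alongside the policy, which keeps the approach comparatively lightweight.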
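For the second point, the language-consistency reward can be pictured as a bonus proportional to how much of the chain of thought stays in the target language. The weighting and the per-token language check below are illustrative assumptions, not the published recipe:

```python
def language_consistency(cot_tokens, is_target_language):
    """Fraction of chain-of-thought tokens judged to be in the target language.
    `is_target_language` is a hypothetical per-token predicate."""
    if not cot_tokens:
        return 0.0
    return sum(is_target_language(t) for t in cot_tokens) / len(cot_tokens)

def total_reward(accuracy_reward, cot_tokens, is_target_language, lang_weight=0.1):
    """Accuracy reward plus a weighted language-consistency bonus (weight assumed)."""
    return accuracy_reward + lang_weight * language_consistency(cot_tokens, is_target_language)

# Toy usage: a correct answer whose reasoning tokens are all ASCII English.
print(total_reward(1.0, ["twelve", "times", "thirteen", "is", "156"],
                   lambda token: token.isascii()))
```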
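For the third point, distillation here amounts to supervised fine-tuning of a smaller student model on reasoning traces generated by DeepSeek-R1. The sketch below assumes a generic Hugging Face causal LM; the checkpoint name, data format, and hyperparameters are placeholders rather than the published pipeline:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

student_name = "Qwen/Qwen2.5-1.5B"  # illustrative student checkpoint
tokenizer = AutoTokenizer.from_pretrained(student_name)
student = AutoModelForCausalLM.from_pretrained(student_name)
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)

# Prompt/response pairs whose responses were generated by the larger teacher model.
teacher_samples = [
    {"prompt": "What is 12 * 13?",
     "response": "<think>12 * 13 = 120 + 36 = 156</think> The answer is 156."},
]

student.train()
for sample in teacher_samples:
    text = sample["prompt"] + "\n" + sample["response"] + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Standard causal-LM loss on the teacher-written trace (labels = input ids).
    loss = student(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```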
Results: Performance insights
DeepSeek-R1's performance is supported by benchmark results:
- Reasoning benchmarks:
  - AIME 2024: 79.8% pass@1, surpassing OpenAI's o1-mini.
  - MATH-500: 97.3% pass@1, comparable to OpenAI-o1-1217.
  - GPQA Diamond: 71.5% pass@1, excelling in fact-based reasoning.
- Coding and STEM tasks:
  - Codeforces Elo rating: 2029, outperforming 96.3% of human participants.
  - SWE-bench Verified: 49.2% resolution rate, competitive with other leading models.
- General capabilities:
  - Strong generalization on the ArenaHard and AlpacaEval 2.0 benchmarks, with win rates of 92.3% and 87.6%, respectively.
Distilled model highlights: Smaller models such as DeepSeek-R1-Distill-Qwen-32B also perform strongly, with a pass@1 score of 72.6% on AIME 2024, demonstrating effective scalability and practicality.
Conclusion: Refining reasoning in AI
DeepSeek-AI's DeepSeek-R1 and DeepSeek-R1-Zero represent significant advances in reasoning capabilities for LLMs. By leveraging RL, cold-start data, and distillation techniques, these models address critical limitations while promoting accessibility through open-source availability under the MIT license. The API ('model=deepseek-reasoner') further improves usability for developers and researchers.
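A hedged sketch of calling that API is shown below. DeepSeek's endpoint is advertised as OpenAI-compatible, so the example reuses the openai Python client; the base URL and placeholder key are assumptions based on that compatibility rather than details from the paper:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder; supply your own key
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # the reasoning model named in the article
    messages=[{"role": "user", "content": "How many primes are there below 50?"}],
)
print(response.choices[0].message.content)
```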
Looking ahead, DeepSeek-AI plans to refine multilingual support, improve software engineering capabilities, and address prompt sensitivity. These efforts aim to further establish DeepSeek-R1 as a robust solution for reasoning-focused AI applications. By integrating thoughtful training paradigms, DeepSeek-R1 illustrates how AI can advance to address increasingly complex challenges.
Check out the [Paper](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf), [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), and [DeepSeek-R1-Zero](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero). All credit for this research goes to the researchers of this project.