DeepSeek has recently released its latest open-source model of Hugging Facel, DeepSeek-V2-Chat-0628. This release marks a significant advancement in ai-powered text generation and chatbot technology capabilities, positioning DeepSeek at the forefront of the industry.
DeepSeek-V2-Chat-0628 is an upgraded version of the previous DeepSeek-V2-Chat model. This new version has been meticulously refined to deliver superior performance on various benchmarks. According to the LMSYS Chatbot Arena ranking, DeepSeek-V2-Chat-0628 has achieved an impressive overall ranking of #11, outperforming all other open source models. This achievement underscores DeepSeek’s commitment to advancing the field of artificial intelligence and providing top-notch solutions for conversational ai applications.
The improvements in DeepSeek-V2-Chat-0628 are extensive and cover several critical aspects of the model’s functionality. Notably, the model shows substantial improvements in several benchmarks:
- Human evaluation: The score improved from 81.1 to 84.8, reflecting an increase of 3.7 points.
- MATH: A notable jump from 53.9 to 71.0, indicating an improvement of 17.1 points.
- Good luck: The performance score increased from 79.7 to 83.4, an improvement of 3.7 points.
- IFEF Evaluation: A significant increase from 63.8 to 77.6, an improvement of 13.8 points.
- Arena-Hard: The most dramatic improvement was seen, with an increase of 26.7 points, going from 41.6 to 68.3.
- JSON output (internal): Improved from 78 to 85, showing a 7-point improvement.
The DeepSeek-V2-Chat-0628 model also features optimized instruction-following capabilities within the “system” area, significantly improving the user experience. This optimization benefits tasks such as immersive translation and retrieval augmented generation (RAG), providing users with more intuitive and efficient interaction with ai.
For those interested in deploying DeepSeek-V2-Chat-0628, the model requires 80GB*8 GPU for inference in BF16 format. Users can use Huggingface Transformers for model inference, which involves importing the necessary libraries and configuring the model and tokenizer with the appropriate settings. Compared to previous versions, the entire chat template has been updated, improving the response generation and interaction capabilities of the model. The new template includes specific formatting and token settings that ensure more accurate and relevant results based on user input.
vLLM is recommended for model inference as it offers a simplified approach to integrating the model into multiple applications. Setting up vLLM involves merging a pull request into the vLLM codebase and configuring the model and tokenizer to handle the desired tasks efficiently.
The DeepSeek-V2-Chat-0628 model is available under the MIT license for the code repository, and the model itself is subject to the model license. This enables commercial use of the DeepSeek-V2 series, including the Base and Chat models, making it accessible to businesses and developers looking to integrate advanced ai capabilities into their products and services.
In conclusion, the release of DeepSeek-V2-Chat-0628 for DeepSeek demonstrates its ongoing dedication to ai innovation. With impressive performance metrics and an improved user experience, this model is poised to set new standards in conversational ai.
Review the ai/DeepSeek-V2-Chat-0628″ target=”_blank” rel=”noreferrer noopener”>Model card and Assignment of functions. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on twitter.com/Marktechpost”>twitter and join our Telegram Channel and LinkedIn GrAbove!. If you like our work, you will love our Newsletter..
Don't forget to join our Subreddit with over 46 billion users
Find upcoming ai webinars here
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary engineer and entrepreneur, Asif is committed to harnessing the potential of ai for social good. His most recent initiative is the launch of an ai media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has over 2 million monthly views, illustrating its popularity among the public.
<script async src="//platform.twitter.com/widgets.js” charset=”utf-8″>