The field of natural language processing (NLP) has witnessed significant advancements with the emergence of large language models (LLMs) such as GPT and LLaMA. These models have become essential tools for a wide range of tasks, leading to a growing need for proprietary LLMs among individuals and organizations. However, the resource-intensive nature of developing an LLM from scratch remains a challenge for many. Researchers have proposed knowledge fusion of LLMs as an alternative approach to building powerful models while reducing development costs. This method combines multiple LLMs into a unified model to leverage their strengths across different tasks.
Previous attempts to integrate multiple models have relied on ensemble methods or direct merging of neural networks. While effective, these approaches often incur inefficiencies during inference or require uniform network architectures for merging. FUSELLM introduced a different paradigm for knowledge fusion: it uses the probability distribution matrices generated by multiple source LLMs to transfer their collective knowledge into a target LLM through lightweight continual training. This approach allows pre-trained LLMs with different architectures to be fused into a single, more capable model.
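To make the idea concrete, below is a minimal, hedged sketch of this kind of distribution-level fusion. It assumes the source-model outputs have already been token-aligned to the target vocabulary, averages them into a fused teacher distribution, and combines a standard language-modeling loss with a divergence term. The function names, the averaging strategy, and the loss weighting are illustrative assumptions, not the official FUSELLM/FUSECHAT implementation.

```python
# Illustrative sketch of distribution-level knowledge fusion (not the official code).
import torch
import torch.nn.functional as F

def fuse_source_distributions(source_logits: list[torch.Tensor]) -> torch.Tensor:
    """Combine per-token distributions from several source LLMs.

    Assumption: logits are already aligned to the target vocabulary; a simple
    average is used here, while the paper explores more refined fusion functions.
    """
    probs = [F.softmax(logits, dim=-1) for logits in source_logits]
    return torch.stack(probs, dim=0).mean(dim=0)

def fusion_loss(target_logits: torch.Tensor,
                fused_probs: torch.Tensor,
                labels: torch.Tensor,
                lam: float = 0.9) -> torch.Tensor:
    """Continual-training objective: ground-truth language-modeling loss plus a
    KL term pulling the target model toward the fused teacher distribution."""
    ce = F.cross_entropy(target_logits.flatten(0, 1), labels.flatten())
    log_probs = F.log_softmax(target_logits, dim=-1)
    kl = F.kl_div(log_probs, fused_probs, reduction="batchmean")
    return lam * ce + (1.0 - lam) * kl
```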
Expanding on the principles of FUSELLM, the study introduces FUSECHAT, specifically designed to fuse chat LLMs with different architectures and scales. FUSECHAT is developed in two main stages: knowledge fusion of source chat LLMs with different structures and scales, followed by merging within the parameter space to incorporate the collective knowledge of the source models. For the second stage, the method introduces VARM (Variation Ratio Merge), a novel approach that determines the merging weights from the variation ratio of the parameter matrices before and after fine-tuning, enabling fine-grained merging without additional training effort (see the sketch below).
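The following is a hedged sketch of what a VARM-style merge could look like, assuming the fine-tuned models share the same architecture as the base (pivot) model. Each parameter matrix receives a weight proportional to how much it changed during fine-tuning; the mean-squared-change statistic and normalization used here are assumptions for illustration and may differ from the paper's exact formulation.

```python
# Illustrative VARM-style parameter-space merge (assumed statistic, not the official code).
import torch

def varm_merge(base_state: dict[str, torch.Tensor],
               finetuned_states: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    merged = {}
    for name, base in base_state.items():
        updates = [ft[name] for ft in finetuned_states]
        # Variation of each fine-tuned copy of this matrix relative to the base model.
        variations = torch.tensor([(u - base).pow(2).mean() for u in updates])
        # Per-matrix merging weights from the (normalized) variation ratios.
        weights = variations / variations.sum().clamp_min(1e-12)
        merged[name] = sum(w * u for w, u in zip(weights, updates))
    return merged
```

Because the weights are computed per parameter matrix directly from the checkpoints, this kind of merge requires no extra training, which matches the article's description of VARM as a training-free, fine-grained fusion step.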
Empirical evaluation of FUSECHAT using representative open-source chat LLMs demonstrates its effectiveness. Results on MT-Bench, a benchmark that assesses multi-turn dialogue ability, show that FUSECHAT outperforms the individual source LLMs and fine-tuned baselines at different scales. In particular, the proposed VARM merging method achieves superior performance, highlighting the benefit of deriving merging weights from variation ratios. With its scalability and flexibility, FUSECHAT presents a promising solution for integrating chat models amid the rapidly evolving landscape of open-source LLM development.
The development of FUSECHAT represents a significant advance in multi-model LLM integration, particularly for chat-based applications. By leveraging knowledge fusion techniques, FUSECHAT offers a practical and efficient way to combine the capabilities of various chat LLMs while addressing the challenges of resource-intensive model development. Its ability to integrate models with different architectures and scales, together with the effectiveness of the VARM merging method, positions FUSECHAT as a versatile tool for improving the performance of dialogue systems. As demand for sophisticated chat-based AI systems continues to grow, FUSECHAT is poised to drive innovation and progress in this space.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing his Master's degree in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at the fundamental level leads to new discoveries that advance technology, and he is passionate about understanding nature with the help of tools such as mathematical models, machine learning models, and artificial intelligence.