Large language models (LLMs) have shown remarkable performance across a wide variety of tasks. From producing original, creative content and answering questions to translating languages and summarizing lengthy passages, LLMs have come close to imitating humans. Well-known LLMs such as GPT, BERT, and PaLM have made headlines for following instructions precisely and for being trained on large amounts of high-quality data. Models like GPT-4 and PaLM are not open source, preventing anyone from inspecting their architectures and training data. In contrast, the open-source nature of LLMs such as Pythia, LLaMA, and Flan-T5 gives researchers the opportunity to fine-tune and improve models on custom training datasets. This has enabled the development of smaller, more efficient LLMs such as Alpaca, Vicuna, OpenAssistant, and MPT.
No single open-source LLM leads the market, and the best LLM for a given input can differ greatly from example to example. Therefore, to consistently produce better answers for each input, it is essential to dynamically ensemble these LLMs. By integrating the distinctive contributions of various LLMs, biases, errors, and uncertainties can be reduced, yielding outputs that are more closely aligned with human preferences. To this end, researchers from the Allen Institute for Artificial Intelligence, the University of Southern California, and Zhejiang University have proposed LLM-BLENDER, an ensembling framework that consistently attains superior performance by leveraging the diverse strengths of multiple open-source large language models.
LLM-BLENDER consists of two modules: PAIRRANKER and GENFUSER. The framework is motivated by the observation that the optimal LLM can vary significantly from one example to another. PAIRRANKER, the first module, is designed to detect subtle differences between candidate outputs. It employs a specialized pairwise comparison technique in which the input text and two candidate outputs from different LLMs serve as inputs. A cross-attention encoder such as RoBERTa jointly encodes the input together with the candidate pair, and from this encoding PAIRRANKER determines which of the two candidates is of higher quality.
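To make the pairwise ranking concrete, here is a minimal sketch of how a PAIRRANKER-style scorer could be wired up. It is an illustration under assumptions, not the authors' implementation: the `compare` and `rank` helpers, the untrained linear head, and the off-the-shelf `roberta-large` checkpoint are all stand-ins for the trained ranker described in the paper.

```python
# A minimal, hypothetical sketch of PAIRRANKER-style pairwise scoring.
import torch
from transformers import RobertaModel, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
encoder = RobertaModel.from_pretrained("roberta-large")
# Hypothetical (untrained) head mapping the pooled encoding to a preference logit.
head = torch.nn.Linear(encoder.config.hidden_size, 1)

def compare(source: str, cand_a: str, cand_b: str) -> float:
    """Jointly encode the input and both candidates; a positive logit
    means candidate A is preferred over candidate B."""
    text = f"{source} </s> {cand_a} </s> {cand_b}"
    batch = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        pooled = encoder(**batch).last_hidden_state[:, 0]  # first-token encoding
    return head(pooled).item()

def rank(source: str, candidates: list[str]) -> list[str]:
    """Aggregate all pairwise logits into per-candidate scores, best first."""
    scores = [0.0] * len(candidates)
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            logit = compare(source, candidates[i], candidates[j])
            scores[i] += logit
            scores[j] -= logit
    order = sorted(range(len(candidates)), key=lambda k: -scores[k])
    return [candidates[k] for k in order]
```

Summing the pairwise logits into one score per candidate is one simple way to turn the O(n²) comparison matrix into a full ranking; the paper aggregates its comparison matrix in a similar spirit.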
The second module, GENFUSER, focuses on merging the top-ranked candidates to generate an improved output. It capitalizes on the strengths of the selected candidates while mitigating their weaknesses. By fusing the outputs of several LLMs, GENFUSER aims to produce a response superior to that of any single LLM.
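The fusion step can also be pictured with a short, hedged sketch. Here an off-the-shelf Flan-T5 checkpoint stands in for the dedicated fusion model, and the separator scheme used to concatenate the top-ranked candidates is purely illustrative:

```python
# A hypothetical GENFUSER-style fusion step; the paper trains a dedicated
# fusion model, whereas this uses a generic seq2seq checkpoint as a stand-in.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/flan-t5-base")
fuser = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def fuse(source: str, top_candidates: list[str]) -> str:
    """Concatenate the instruction with the top-ranked candidates and let
    the seq2seq model generate a single fused response."""
    prompt = source + " " + " ".join(
        f"<extra_id_{i}> {cand}" for i, cand in enumerate(top_candidates)
    )
    inputs = tok(prompt, truncation=True, max_length=1024, return_tensors="pt")
    out = fuser.generate(**inputs, max_new_tokens=256)
    return tok.decode(out[0], skip_special_tokens=True)
```

Because generation conditions on all top-ranked candidates at once, a trained fuser can copy strong spans from one candidate while correcting weak spans from another.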
For evaluation, the team introduced a benchmark dataset called MixInstruct, which combines multiple instruction datasets and incorporates oracle pairwise comparisons. The dataset uses 11 popular open-source LLMs to generate multiple candidates for each input across various instruction-following tasks. It includes training, validation, and test examples with oracle comparisons for automatic evaluation. These oracle comparisons provide a ground-truth ranking of candidate outputs, allowing the performance of LLM-BLENDER and other baseline techniques to be assessed.
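As a rough illustration of how an oracle ranking can be derived, the sketch below scores candidates against a gold reference with a single automatic metric (BERTScore) and sorts them. This is an assumption-laden simplification: the actual MixInstruct oracle combines several signals, and the paper's evaluation also relies on ChatGPT-based pairwise comparisons (GPT-Rank).

```python
# A hedged sketch of deriving an oracle ranking from a gold reference.
# Assumes the bert-score package; the real MixInstruct oracle is richer
# than a single BERTScore pass.
from bert_score import score

def oracle_rank(candidates: list[str], reference: str) -> list[str]:
    """Score each candidate against the reference and return them best-first."""
    # F1 component of BERTScore, one value per candidate.
    _, _, f1 = score(candidates, [reference] * len(candidates), lang="en")
    order = sorted(range(len(candidates)), key=lambda i: -f1[i].item())
    return [candidates[i] for i in order]
```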
Experimental results show that LLM-BLENDER performs considerably better than individual LLMs and baseline techniques across a range of evaluation metrics. It establishes a substantial performance gap, demonstrating that the LLM-BLENDER ensembling approach yields higher-quality outputs than relying on any single LLM or baseline method. PAIRRANKER's selections outperform every individual LLM, thanks to stronger performance on GPT-Rank and reference-based metrics. Through efficient fusion, GENFUSER further improves response quality by building on PAIRRANKER's top selections.
LLM-BLENDER has also outperformed individual LLMs such as Vicuna, showing great potential to advance LLM deployment and research through ensemble learning.
Check out the Paper, Project, and GitHub.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.