Large language models (LLMs) have ushered in a new era in the field of artificial intelligence (AI) through their exceptional natural language processing capabilities. From mathematical reasoning to code generation and even drafting legal opinions, LLMs find applications in almost every field. To align such models with desirable behavior, they are fine-tuned using techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). However, these methods require a significant volume of human-annotated data, making the process time-consuming and resource-intensive.
In this work, UCLA researchers have attempted to boost a weak LLM's performance without requiring additional human-annotated data. They introduce a novel fine-tuning method called Self-Play Fine-Tuning (SPIN), which allows the model to engage in self-play, that is, to "play" against itself without requiring any direct supervision.
There has been previous work addressing this problem, such as self-training on synthetic data with binary feedback, and employing a weaker model to guide a stronger one. SPIN, however, is a more efficient approach that eliminates the need for human binary feedback and works effectively with a single LLM.
The entire process can be viewed as a two-player game in which the first model generates responses as close as possible to those in the human-annotated dataset, while the second model attempts to distinguish the first model's responses from human-generated ones. The second model is obtained by fine-tuning the first to prefer responses from the target dataset over responses generated by the first model. In the next iteration, the models swap roles (generating responses and discerning them), and the process continues until the LLM can no longer differentiate between responses generated by its previous version and those written by humans.
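To make the self-play idea concrete, here is a minimal sketch of the kind of preference loss such a scheme can use: the new model is rewarded for raising the probability of the human response, and lowering the probability of the synthetic response, relative to its previous iterate. This is an illustrative simplification, not the paper's exact implementation; the function name `spin_loss`, the scalar log-probability inputs, and the regularization parameter `lam` are all assumptions made for this sketch.

```python
import math

def spin_loss(logp_new_human, logp_old_human,
              logp_new_synth, logp_old_synth, lam=0.1):
    """Toy logistic preference loss for one (prompt, response pair).

    The margin grows when the new model assigns relatively more
    probability to the human response, and relatively less to the
    self-generated (synthetic) one, than the previous-iteration model
    did. A larger margin yields a smaller loss.
    """
    margin = lam * ((logp_new_human - logp_old_human)
                    - (logp_new_synth - logp_old_synth))
    return math.log(1.0 + math.exp(-margin))

# When the new model has not moved at all, the margin is zero and the
# loss sits at log(2); preferring the human answer drives it lower.
loss_neutral = spin_loss(-2.0, -2.0, -2.0, -2.0)
loss_better = spin_loss(-1.0, -2.0, -3.0, -2.0)
```

In a full training loop, the previous iterate would generate the synthetic responses for each prompt, the loss above would be minimized over the dataset, and the resulting model would become the "opponent" for the next round.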
The authors demonstrated the effectiveness of SPIN with an example. When an LLM was asked to list the popular modes of transport in Southampton, at iteration zero the model began to hallucinate and provided an incorrect distribution of transport modes. In the next iteration, however, it gave an answer that aligned more closely with the ground truth.
The researchers used zephyr-7b-sft-full to evaluate the framework. The model was derived from the pre-trained Mistral-7B and further refined on an SFT dataset. The base model was used to generate synthetic responses to 50,000 prompts randomly sampled from the dataset. The results show that SPIN improved the model's average score by 2.66% at iteration 0. In the next iteration, the LLM from the previous iteration was used to generate new responses for SPIN, which further improved the average score by 1.32%.
In conclusion, SPIN is a novel framework that converts a weak LLM into a strong one without the need for an expert human annotator. Using a self-play mechanism, it was able to significantly improve the performance of a model fine-tuned on an SFT dataset. The approach does have a limitation: the fixed target data distribution caps the attainable performance of the fine-tuned LLM. This could be addressed by dynamically changing the target data distribution, which the researchers leave for future work.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 35k+ ML SubReddit, 41k+ Facebook community, Discord channel, LinkedIn group, Twitter, and email newsletter, where we share the latest AI research news, cool AI projects, and more.
If you like our work, you'll love our newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable to a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.