Large language models such as PaLM, Chinchilla, and ChatGPT have opened up new possibilities for performing Natural Language Processing (NLP) tasks directly from instructional prompts. Prior work has shown that instruction tuning, which involves fine-tuning language models on a wide variety of NLP tasks organized as instructions, further enhances a model's ability to carry out an unseen task given only an instruction. In this paper, the researchers evaluate the approaches and results of open-source instruction-generalization efforts by comparing their fine-tuning procedures and strategies.
The paper focuses on the details of instruction-tuning methods, isolating individual design factors and comparing them directly. The researchers identify and evaluate the critical methodological improvements in the "Flan 2022 Collection," the term they use for the collection of data and the methods applied to the data and the instruction-tuning process, with a focus on the emergent and state-of-the-art results of combining Flan 2022 with PaLM 540B. The Flan 2022 Collection is the most comprehensive publicly available collection of tasks and techniques for instruction tuning, and it has been augmented with thousands of high-quality templates and improved formatting patterns.
They show that, across all evaluated benchmarks, a model trained on this collection outperforms models trained on other public collections, including the original Flan 2021, T0++, Super-Natural Instructions, and the contemporary OPT-IML. For models of identical size, this includes improvements of more than 4.2% on MMLU and 8.5% on the BIG-Bench Hard evaluation benchmarks. Based on an analysis of the Flan 2022 approach, the strong results are attributed to the larger and more varied collection of tasks and to several simple strategies for mixing and augmenting the training data. In particular, training on a mixture of zero-shot, few-shot, and chain-of-thought prompt templates improves performance in all of these settings.
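To make the prompt-mixing idea concrete, here is a minimal Python sketch of how a single task example could be rendered under a zero-shot, few-shot, or chain-of-thought template when assembling a training mixture. The template strings, field names, and mixing weights are illustrative assumptions for this sketch, not the actual Flan 2022 templates.

```python
import random

# Hypothetical sketch of mixing prompt formats when building an instruction-tuning set.
# Templates and weights below are illustrative, not the Flan 2022 Collection's own.
ZERO_SHOT = "{instruction}\n\n{input}"
FEW_SHOT = "{instruction}\n\n{exemplars}\n\n{input}"
CHAIN_OF_THOUGHT = "{instruction}\nLet's think step by step.\n\n{input}"

def render(example, exemplars):
    """Render one task example under a randomly chosen prompt format."""
    fmt = random.choices(
        [ZERO_SHOT, FEW_SHOT, CHAIN_OF_THOUGHT],
        weights=[0.5, 0.4, 0.1],  # illustrative mix; even ~10% few-shot data helps zero-shot
    )[0]
    shots = "\n\n".join(f"{x['input']}\n{x['target']}" for x in exemplars)
    prompt = fmt.format(instruction=example["instruction"],
                        exemplars=shots,
                        input=example["input"])
    return {"inputs": prompt, "targets": example["target"]}
```

The design point is that the same underlying example can contribute to several prompt formats, so the model is exposed to all of these settings during training rather than to a single fixed style.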
For example, adding just 10% few-shot prompts to the training mix improves zero-shot results by 2% or more. In addition, balancing task sources and increasing task variety by inverting input-output pairs, as illustrated below, prove essential for performance. In single-task fine-tuning, the resulting Flan-T5 model converges faster and performs better than T5 models, indicating that instruction-tuned models provide a more computationally efficient starting point for downstream applications. The authors anticipate that making these results and tools openly available will streamline the resources needed for instruction tuning and accelerate the development of more general-purpose language models.
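As a concrete illustration of input inversion, the following hedged sketch turns a question-answering example into a question-generation example by swapping its input and target. The field names and the inverted instruction text are hypothetical, not taken from the Flan 2022 Collection.

```python
# Hypothetical sketch of "input inversion": creating a new training task by swapping
# the input and output of an existing task to increase task diversity.

def invert(example):
    """Turn a (question -> answer) example into an (answer -> question) example."""
    return {
        "instruction": "Write a question to which the following is the answer.",
        "input": example["target"],
        "target": example["input"],
    }

qa = {"instruction": "Answer the question.",
      "input": "What is the capital of France?",
      "target": "Paris"}
print(invert(qa))  # -> a question-generation example derived from the same data
```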
The main contributions of this study are listed below:
• Methodological: demonstrate that training with a mixture of zero-shot and few-shot prompts produces significantly better results in both settings.
• Measure and demonstrate the key techniques for effective instruction tuning, including scaling (Section 3.3), enriching task diversity through input inversion, adding chain-of-thought training data, and balancing the different data sources.
• Results: these technical choices improve held-out task performance by 3-17% compared to existing open-source instruction-tuning collections.
• Findings: Flan-T5 XL provides a more robust and computationally efficient starting point for single-task fine-tuning (see the sketch after this list).
• Make the new Flan 2022 collection of tasks, templates, and research methodologies publicly available. The source code is available on GitHub.
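As a rough sketch of the last two points (assuming the Hugging Face transformers library and the publicly released google/flan-t5-xl checkpoint), the snippet below loads the instruction-tuned weights in place of plain T5 as the starting point for a downstream task; the actual fine-tuning loop is omitted.

```python
# Minimal sketch: start from the instruction-tuned Flan-T5 XL checkpoint instead of
# plain T5 when fine-tuning on a single downstream task. Fine-tuning details omitted.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")

# Quick sanity check that the checkpoint already follows instructions zero-shot.
inputs = tokenizer("Answer the question: What is the capital of France?",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Per the paper's findings, fine-tuning from this checkpoint typically converges
# faster and reaches higher accuracy than starting from the un-tuned T5 weights.
```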
Check out the Paper and GitHub, and see this article for more information on the comparison. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.