Multi-task prompted fine-tuning, also known as instruction tuning, has recently enabled language models (LMs) to generalize to unseen tasks. It has widely been believed that increasing the total number of training tasks is the most important factor in improving the unseen-task generalization of these multi-task (MT) LMs.
However, a new study by KAIST, LG AI Research, and the University of Illinois finds that, on unseen tasks, an expert LM trained on a single task can outperform an LM trained on more than 300 tasks. The researchers train experts with T5-3B as the underlying LM and propose a simple Retrieval of Experts (RoE) technique that uses an off-the-shelf dense retriever to select a relevant expert for each unseen task.
Instead of multi-task instruction tuning, the researchers train a separate expert LM for each of the 296 training tasks by freezing the underlying LM and updating only adapters. They use the same experimental setup (training and evaluation tasks) as T0-3B, one of the most popular MT LMs. Their findings show that 7 of the 296 experts outperform T0-3B in mean accuracy on unseen-task generalization. Across 11 unseen datasets and 13 BIG-Bench benchmark datasets, using the best-performing expert beats T0-3B by a mean accuracy of 3.20% and 1.29%, respectively.
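For intuition, here is a minimal sketch of training a single-task expert with the base LM frozen and only a small set of added parameters updated. The paper describes adapter-based experts on T5-3B; LoRA via the Hugging Face peft library is used below purely as a stand-in, and the hyperparameters and toy example are assumptions rather than the authors' setup.

```python
# Illustrative sketch only: one "expert" per training task, base LM frozen.
# The authors use adapters on T5-3B; LoRA (peft) is a stand-in here, and all
# hyperparameters and the toy example are assumptions.
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from peft import LoraConfig, TaskType, get_peft_model

base = T5ForConditionalGeneration.from_pretrained("t5-3b")   # underlying LM
tokenizer = AutoTokenizer.from_pretrained("t5-3b")

# peft keeps the base weights frozen and marks only the added weights trainable.
config = LoraConfig(task_type=TaskType.SEQ_2_SEQ_LM, r=8, lora_alpha=16,
                    target_modules=["q", "v"], lora_dropout=0.05)
expert = get_peft_model(base, config)
expert.print_trainable_parameters()      # tiny fraction of the 3B parameters

optimizer = torch.optim.AdamW(
    (p for p in expert.parameters() if p.requires_grad), lr=1e-4)

# One toy training step on a single task's prompted example.
inputs = tokenizer(["summarize: The study trains one expert per training task ..."],
                   return_tensors="pt")
labels = tokenizer(["One expert is trained per task."], return_tensors="pt").input_ids
loss = expert(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```

Because only the adapter-style weights are updated, each expert is cheap to store and can be swapped onto the shared frozen backbone at inference time.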
Furthermore, they demonstrate that T0-3B-level performance can be achieved with a simple approach that retrieves an applicable expert for each unseen task. These findings suggest that choosing the right expert, rather than naively using a single MT LM for all unseen tasks, may be more efficient and effective, given the substantial headroom from retrieving the best-performing expert for each unseen task (+11.94% over T0-3B).
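To illustrate what retrieving an expert could look like, here is a toy sketch that embeds a few examples of an unseen task with an off-the-shelf dense encoder and picks the training-task expert whose examples are most similar. The encoder name, the mean-pooled similarity scheme, and the task data are assumptions; the paper's retriever and indexing may differ.

```python
# Illustrative sketch only: Retrieval of Experts with an off-the-shelf dense encoder.
# Encoder choice, similarity scheme, and toy task data are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# A few stored prompted examples per training-task expert (the paper has 296 experts).
task_examples = {
    "nli_expert": ["premise: A dog runs in the park. hypothesis: An animal is outside. entailment?"],
    "summarization_expert": ["summarize: The committee met on Tuesday to discuss the budget ..."],
}
task_embeddings = {
    name: encoder.encode(examples, normalize_embeddings=True).mean(axis=0)
    for name, examples in task_examples.items()
}

def retrieve_expert(unseen_examples):
    """Pick the expert whose training task looks most similar to the unseen task's examples."""
    query = encoder.encode(unseen_examples, normalize_embeddings=True).mean(axis=0)
    scores = {name: float(np.dot(query, emb)) for name, emb in task_embeddings.items()}
    return max(scores, key=scores.get)

print(retrieve_expert(["premise: The cat sleeps. hypothesis: The cat is awake. entailment?"]))
```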
As noted in the paper, the proposed method offers the following three advantages over instruction tuning:
- RoE is more robust to the negative task transfer that often occurs during instruction tuning, outperforming T0-3B and T0-11B in mean accuracy across the 36 seen tasks by +10.40% and +7.70%, respectively.
- The proposed distributed approach allows continual learning of new tasks without degrading performance on previously seen ones, whereas instruction-tuned LMs must be retrained on previously seen tasks whenever they learn new ones.
- Instruction-tuned LMs struggle with compositional instructions. The distributed approach enables compositional abilities by combining multiple experts (e.g., summarization and translation).
The team hopes their work will encourage the research community to delve deeper into distributed and collaborative training of expert LMs, which could bring future benefits such as improved efficiency, privacy, and personalization.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 14k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast and has a strong interest in the scope of application of artificial intelligence in various fields. She is passionate about exploring new advances in technology and their real-life applications.