Large language models (LLMs) have proven remarkably effective across numerous tasks. To fully realize their potential, supervised fine-tuning (SFT) is needed to align them with human instructions. When the variety of tasks grows or performance on a particular task needs to improve, a simple option is to scale up the fine-tuning data, even though prior work has shown that models can follow human instructions well after tuning on relatively small datasets.
Several studies show that substantially increasing the fine-tuning data introduces new difficulties. In particular, researchers have found that performance drops sharply on the Natural Questions dataset, part of the closed-book question answering (CBQA) benchmark, as the amount of fine-tuning data grows. This notable performance loss may be related to the collapse of the world knowledge stored in pre-trained models. Verifying this proposition involves two steps. First, the CBQA dataset relies on the world knowledge contained in the models. Second, large-scale fine-tuning can substantially alter model parameters and erase that world knowledge (i.e., knowledge forgetting), which accounts for the notable performance decline on the CBQA dataset. Vanilla supervised fine-tuning thus faces a conflict between preserving an LLM's world knowledge and simultaneously improving performance on downstream tasks.
An ideal solution would be to designate a specific region of the model to store world knowledge, much like the hippocampus of the human brain, which is specialized for memory. However, this is difficult to achieve by directly fine-tuning with a single plugin. The Mixture of Experts (MoE) architecture offers a comparable idea: it includes multiple experts, and data with different properties are routed to the appropriate experts for specialized processing. Building on this concept, a group of researchers from Fudan University and Hikvision Inc. propose using multiple plugins as experts, so that some can preserve world knowledge while others handle downstream tasks.
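To make the routing idea concrete, below is a minimal, hedged sketch of a generic MoE feed-forward layer in PyTorch. It illustrates only the general concept described above, not the paper's implementation; all names (`SimpleMoELayer`, `d_model`, `num_experts`, and so on) are chosen here for clarity.

```python
# Minimal sketch of the generic Mixture-of-Experts idea: a router scores experts
# per token and the layer output is a weighted mix of the experts' outputs.
# Illustrative only; not the LoRAMoE paper's code.
import torch
import torch.nn as nn

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])
        # The router scores how relevant each expert is for a given token.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate = torch.softmax(self.router(x), dim=-1)                     # (batch, seq, num_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, seq, d_model, num_experts)
        # Combine expert outputs weighted by the router's scores.
        return torch.einsum("bsde,bse->bsd", expert_out, gate)
```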
Their new study presents LoRAMoE, a plugin version of MoE that improves LLMs' downstream task-solving capabilities while mitigating world knowledge forgetting. LoRAMoE modifies the model architecture by introducing multiple parallel plugins as experts in each feed-forward layer and coupling them with routers. The authors then propose using a localized balancing constraint to form separate groups of experts in each LoRAMoE layer. Specifically, one group works on downstream tasks, while the other is tasked with reducing knowledge forgetting by aligning human instructions with the world knowledge stored in the base model. Furthermore, the localized balancing constraint balances the importance of all experts within the same group, preventing routers from assigning too much weight to only a few of them. This allows multiple experts to work together, improving the ability to handle downstream tasks; a hedged sketch of such a layer follows.
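The sketch below shows, under stated assumptions, what a LoRAMoE-style layer could look like: a frozen base projection, several parallel low-rank (LoRA) experts combined by a router, and a simple balancing penalty applied to the routing weights of the expert group matched to the current data type. The layer structure, shapes, and the exact form of the localized balancing loss are assumptions made here for illustration, not the authors' released code.

```python
# Hedged reconstruction of a LoRAMoE-style layer: frozen backbone projection plus
# parallel LoRA experts mixed by a router. All names and the loss form are assumed.
import torch
import torch.nn as nn

class LoRAMoELayer(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8, num_experts: int = 6):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad_(False)                      # the backbone stays frozen
        d_in, d_out = base_linear.in_features, base_linear.out_features
        # Each expert is a low-rank (LoRA) adapter: B @ A with rank << d.
        self.lora_A = nn.ParameterList([nn.Parameter(torch.randn(rank, d_in) * 0.01)
                                        for _ in range(num_experts)])
        self.lora_B = nn.ParameterList([nn.Parameter(torch.zeros(d_out, rank))
                                        for _ in range(num_experts)])
        self.router = nn.Linear(d_in, num_experts)
        self.num_experts = num_experts

    def forward(self, x: torch.Tensor):
        gate = torch.softmax(self.router(x), dim=-1)     # (batch, seq, num_experts)
        out = self.base(x)
        for i in range(self.num_experts):
            delta = x @ self.lora_A[i].T @ self.lora_B[i].T   # expert i's low-rank update
            out = out + gate[..., i:i+1] * delta
        return out, gate

def localized_balancing_loss(gate: torch.Tensor, group_mask: torch.Tensor) -> torch.Tensor:
    """Assumed form of the constraint: keep routing weights balanced within the
    expert group matched to this batch's data type, so no single expert dominates.
    gate: (batch, seq, num_experts) router weights; group_mask: (num_experts,) bool.
    """
    group_gate = gate[..., group_mask]                   # weights of the relevant group only
    mean_per_expert = group_gate.mean(dim=(0, 1))        # average weight each in-group expert gets
    return mean_per_expert.var()                         # low variance => balanced use of the group
```

In training, each batch would carry a flag indicating whether it comes from world-knowledge-aligned data or downstream-task data; that flag would select the `group_mask` passed to the balancing term, which is added to the task loss with a small coefficient. Again, this is one plausible reading of the localized balancing constraint, not the paper's exact formulation.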
The experimental results demonstrate that LoRAMoE can prevent large-scale fine-tuning from disturbing the world knowledge stored in language models. Additionally, by visualizing expert routing weights across tasks, the team validated that LoRAMoE localizes capabilities in an interpretable way: the router prioritizes the outputs of experts specializing in world knowledge benchmarks, while for other downstream tasks it concentrates on experts from the other group. LoRAMoE thus resolves the conflict by encouraging cooperation among experts. Furthermore, the results indicate that the proposed approach improves learning across several downstream tasks, suggesting its potential for multi-task learning.
Check out the paper for more details. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a Computer Science Engineer with solid experience in FinTech companies covering Finance, Cards & Payments, and Banking, and a keen interest in AI applications. She is excited to explore new technologies and advancements in today's evolving world that make life easier for everyone.