Natural language processing (NLP) has undergone a paradigm shift in recent years with the advent of large language models (LLMs) that outperform earlier, relatively small language models (LMs) such as GPT-2 and T5 (Raffel et al.) on a variety of NLP tasks. Prompting has become the de facto way of using LLMs: natural language instructions are supplied in context to direct the LLM to produce the desired output without any parameter updates, in contrast to the conventional fine-tuning paradigm, in which a separate set of model parameters is updated and maintained for each downstream task.
While this prompting scheme has allowed LLMs to perform quite well on various tasks in zero-shot or few-shot settings, their performance on some specific downstream tasks still needs improvement and requires further refinement, especially when training data is available. However, because most LLMs only offer black-box inference APIs and are expensive to tune, most users and academics cannot optimize these LLMs directly. How to effectively improve the performance of LLMs on certain downstream tasks, sometimes with limited training examples, is therefore a difficult open problem. A new study from the University of California, Santa Barbara, and Microsoft proposes the directional stimulus prompting (DSP) framework, which improves a frozen black-box LLM on downstream tasks using a small tunable LM trained with reinforcement learning (RL).
More precisely, for each input text, a small LM (called the policy LM) learns to generate a sequence of discrete tokens as a directional stimulus, which offers instance-specific information or guidance about the input sample rather than a generic prompt for the task. The generated stimulus is combined with the original input and fed to the LLM to steer its generation toward the desired target, such as higher scores on downstream performance metrics. The policy LM is first initialized from a pretrained LM and trained with supervised fine-tuning (SFT) on a small number of collected training samples. The fine-tuned LM then initializes the policy LM for RL, which further optimizes it to explore better stimuli; training maximizes a reward defined as the downstream performance score of the LLM's generation conditioned on the stimulus produced by the policy LM.
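To make the flow concrete, here is a minimal sketch of DSP at inference time (not the authors' released code). It assumes the policy LM is Flan-T5 loaded via Hugging Face Transformers, and `call_llm` is a hypothetical stand-in for the black-box LLM's inference API; the prompt template is likewise only illustrative.

```python
# Minimal sketch of directional stimulus prompting at inference time.
# Assumes a Flan-T5 policy LM via Hugging Face Transformers; `call_llm`
# is a hypothetical stand-in for a hosted black-box LLM API.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-large")
policy_lm = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around the frozen black-box LLM's completion API."""
    raise NotImplementedError("plug in your hosted LLM client here")

def generate_stimulus(article: str) -> str:
    """The policy LM maps the input text to a short directional stimulus (e.g., keywords)."""
    inputs = tok(article, return_tensors="pt", truncation=True, max_length=512)
    out = policy_lm.generate(**inputs, max_new_tokens=32)
    return tok.decode(out[0], skip_special_tokens=True)

def dsp_summarize(article: str) -> str:
    """Combine the stimulus with the original input and query the frozen LLM."""
    stimulus = generate_stimulus(article)
    prompt = (  # illustrative template, not necessarily the paper's exact one
        f"Article: {article}\n"
        f"Keywords: {stimulus}\n"
        "Summarize the article in a few sentences, covering the keywords above:"
    )
    return call_llm(prompt)
```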
Figure 1 shows an example for the summarization task: keywords serve as the stimulus (hints) that help the LLM produce a summary covering them. The policy LM can be optimized by using scores from evaluation metrics such as ROUGE as the reward, which encourages it to generate keywords that direct the LLM toward better summaries. The basis of the proposed approach is that while LLMs have excellent generation abilities, they frequently exhibit undesirable behaviors and therefore need detailed guidance on the intended attributes and direction of the generation for certain downstream tasks. Although the small policy LM cannot itself produce fluent, human-like text, it can generate a sequence of tokens that serves as a directional stimulus, giving the LLM sample-wise guidance toward the intended target.
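As a concrete illustration of that reward, the sketch below scores an LLM-generated summary against the reference using the `rouge-score` package; averaging the ROUGE-1/2/L F1 scores is an assumption made here for illustration, not necessarily the paper's exact reward definition.

```python
# Sketch of a ROUGE-based reward for RL training of the policy LM.
# Averaging ROUGE-1/2/L F1 is an illustrative choice, not the paper's exact reward.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def reward(reference_summary: str, llm_summary: str) -> float:
    """Score the LLM summary that was generated conditioned on the policy LM's keywords."""
    scores = scorer.score(reference_summary, llm_summary)
    return sum(s.fmeasure for s in scores.values()) / len(scores)
```

An RL algorithm such as PPO would then update the policy LM so that its generated keywords maximize this reward.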
RL offers a natural solution to bridge the gap between the optimized object (i.e., the small policy LM that generates the stimulus) and the optimization goal defined by the LLM's generation. This differs from previous studies that search for optimal prompts via prompt engineering or prompt optimization, which essentially try to state the “question” more clearly; this approach instead tries to provide “clues” or “hints” for each individual “question.” It also differs from chain-of-thought prompting, which encourages the LLM to generate intermediate reasoning steps when solving reasoning tasks: here, a small tunable model controls and guides the LLM, and the targets are generation tasks where there is not just one correct “answer.” They test their framework on summarization and dialogue response generation tasks.
In the experiments, the 750M-parameter Flan-T5-large serves as the policy LM and the 175B-parameter Codex as the LLM. According to the results, Codex's performance on downstream tasks increases significantly when it is conditioned on the stimuli produced by the tuned T5. For summarization, keywords that the summary should contain are used as the directional stimulus; with a T5 policy LM trained on only 2,000 samples from the CNN/Daily Mail dataset, Codex's performance already improves by 7.2%.
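A rough sketch of what that supervised fine-tuning stage might look like is given below, assuming the keyword targets are extracted heuristically from the reference summaries; both this assumption and the hyperparameters are illustrative rather than taken from the paper.

```python
# Sketch of the SFT stage: teach the policy LM to map an article to keywords.
# Keyword targets are assumed to come from the reference summaries (illustrative).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/flan-t5-large")
policy_lm = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
optimizer = torch.optim.AdamW(policy_lm.parameters(), lr=1e-5)  # illustrative learning rate

def sft_step(article: str, target_keywords: str) -> float:
    """One supervised step with the standard seq2seq cross-entropy loss."""
    enc = tok(article, return_tensors="pt", truncation=True, max_length=512)
    labels = tok(target_keywords, return_tensors="pt", truncation=True).input_ids
    loss = policy_lm(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```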
For dialogue response generation, they train the policy LM to produce dialogue acts that specify the intended meaning behind the target responses, using 500 dialogues from the MultiWOZ dataset. The dialogue acts generated by the policy LM increase Codex's combined score by 52.5%, and the resulting system performs on par with or better than previous systems trained on the full training data (8,438 dialogues).
Check out the Paper. All credit for this research goes to the researchers on this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.