Large language models (LLMs) have in-context learning abilities that let them perform a given task from only a handful of examples, without any change to the model's parameters. Because this approach is task-agnostic, a single model can serve many tasks. By contrast, conventional task-adaptation techniques, such as fine-tuning, modify the model's parameters for each task. Yet task-agnostic in-context learning is rarely the practitioner's method of choice, since it routinely underperforms task-specific adaptation. Most previous studies attribute this performance disparity to the restricted context window of LLMs, which can accommodate only a small number of task examples.
However, the researchers behind this work demonstrate that the gap between in-context learning and fine-tuning persists even when both are given identical task examples. This finding raises the question of whether the performance difference is a general limitation of task-agnostic adaptation strategies, or whether it is specific to in-context learning. Can one design adaptation strategies that meet all of the following requirements?
• Task-agnostic: the same model applies universally to many tasks.
• Quality: accuracy across these tasks is competitive with task-specific approaches.
• Data-scalable: learning improves as the number of task examples increases.

They begin by analyzing the causes of the quality gap.
They decompose an LLM's in-context learning ability into two components: the acquisition of effective task representations, and the performance of probabilistic inference, or reasoning, over those representations. Is the gap due to a lack of information in the representations, or to the LLM's inability to reason over them? They test this question empirically across several LLM families on a range of binary classification tasks, measuring both representation and reasoning gaps. They conclude that LLMs possess strong representations, and that most of the quality gap stems from weak reasoning.
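One common way to probe the representation side of this decomposition is a linear probe: fit a simple classifier on frozen embeddings, and if it achieves high accuracy, the representations carry enough task signal, so any remaining gap is attributable to reasoning. The sketch below illustrates the idea with synthetic stand-in "embeddings" (two Gaussian clusters); it is not the paper's evaluation code, and the probe is a plain logistic regression trained by gradient descent.

```python
import numpy as np

def fit_linear_probe(E, y, lr=0.1, steps=500):
    # Logistic-regression probe on frozen embeddings E (n, d) with
    # binary labels y in {0, 1}, trained by full-batch gradient descent.
    w = np.zeros(E.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(E @ w + b)))  # predicted P(y=1)
        w -= lr * (E.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Toy stand-in for LLM embeddings of a binary task: two shifted clusters.
rng = np.random.default_rng(1)
n, d = 200, 16
y = rng.integers(0, 2, size=n)
E = rng.normal(size=(n, d)) + np.outer(2 * y - 1, np.ones(d)) * 0.8

w, b = fit_linear_probe(E, y.astype(float))
acc = np.mean(((E @ w + b) > 0).astype(int) == y)
print(acc)  # high probe accuracy => representations are linearly informative
```

In the paper's setting, `E` would be actual LLM embeddings of task inputs; the probe's accuracy then upper-bounds how much of the in-context gap can be blamed on representations.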
They also find that fine-tuning improves the base model on both axes, but predominantly improves task-specific reasoning, which accounts for 72% of the performance gain. Surprisingly, most existing methods for narrowing the gap, such as prompt engineering and active example selection, target only the LLM's learned representations. Their work instead examines the alternative strategy of improving the LLM's reasoning skills. As a first step, they fine-tune LLMs on synthetically generated probabilistic inference problems. While this approach improves the model's in-context learning performance, it requires fine-tuning each LLM individually.
They go a step further and ask whether reasoning skills can be acquired in a way that is independent of both task and model, and they show that a fully agnostic approach is possible. The researchers, from Stanford University and Cornell University, propose Tart, which enhances an LLM's reasoning abilities with a synthetically trained reasoning module. Tart trains a Transformer-based reasoning module using only synthetically generated logistic regression problems, independent of the downstream task or the base LLM. Without any further training, this reasoning module can be composed with the embeddings of an LLM to enhance its reasoning capabilities.
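The flavor of Tart's synthetic training data can be illustrated as follows: each training problem is a random logistic regression task, serialized as an alternating stream of input and label tokens. This is an illustrative sketch, not the authors' implementation; the sampling parameters and the one-hot label encoding are assumptions for demonstration.

```python
import numpy as np

def sample_logistic_regression_task(n_examples=16, dim=8, rng=None):
    # Draw a random weight vector w, then sample pairs (x, y) with
    # y ~ Bernoulli(sigmoid(w . x)). Each sampled task is one synthetic
    # training problem for the reasoning module.
    rng = rng or np.random.default_rng()
    w = rng.normal(size=dim)
    X = rng.normal(size=(n_examples, dim))
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    y = (rng.random(n_examples) < p).astype(np.int64)
    return X, y

def to_token_stream(X, y):
    # Interleave inputs and labels as [x_1, y_1, x_2, y_2, ...], i.e.
    # two tokens per example. Labels become one-hot vectors padded to
    # the input dimension (an assumed encoding for this sketch).
    dim = X.shape[1]
    tokens = []
    for x_i, y_i in zip(X, y):
        label_tok = np.zeros(dim)
        label_tok[int(y_i)] = 1.0
        tokens.append(x_i)
        tokens.append(label_tok)
    return np.stack(tokens)  # shape: (2 * n_examples, dim)

X, y = sample_logistic_regression_task(rng=np.random.default_rng(0))
stream = to_token_stream(X, y)
print(stream.shape)  # (32, 8)
```

A Transformer trained to predict each label token from the preceding stream learns, in effect, a generic probabilistic inference procedure; at deployment time, the random inputs `X` would be replaced by LLM embeddings of real task inputs.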
In particular, Tart achieves all of the stated goals:
• Task-agnostic: Tart's reasoning module is trained only once, on synthetic data.
• Quality: outperforms the base LLM across the board and closes the gap with task-specific fine-tuning techniques.
• Data-scalable: handles 10x more examples than in-context learning.
Tart is task-, model-, and domain-independent. Using a single reasoning module trained on synthetic data, they show that Tart generalizes across three model families on more than 14 NLP classification tasks, and even across domains. On quality, Tart outperforms in-context learning by 18.4%, task-specific adapters by 3.4%, and full task-specific fine-tuning by 3.1% across these NLP tasks.
On the RAFT benchmark, Tart lifts GPT-Neo's performance to the point where it matches GPT-3 and Bloom, outperforming the latter by 4%. Tart is also data-scalable, overcoming the short-context barrier of in-context learning: whereas each example can occupy hundreds of tokens in an LLM's prompt, Tart's reasoning module uses only two tokens per example, one for the input embedding and one for the label. The benefits of this data scalability can reach 6.8%. Theoretically, they show that Tart's generalization ability depends primarily on the distribution shift between the synthetic training distribution and the natural-text embedding distribution, as measured by the Wasserstein-1 metric.
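For intuition on the Wasserstein-1 metric mentioned above: between two equal-size one-dimensional empirical samples, it reduces to the mean absolute difference of the sorted samples, since the optimal transport plan matches points in sorted order. The sketch below estimates it for a synthetic distribution versus a shifted stand-in for "natural" embeddings; the distributions are invented for illustration.

```python
import numpy as np

def wasserstein_1(a, b):
    # Empirical Wasserstein-1 distance between two equal-size 1-D
    # samples: sort both and average the pointwise gaps.
    a, b = np.sort(np.asarray(a)), np.sort(np.asarray(b))
    assert a.shape == b.shape, "sketch assumes equal sample sizes"
    return float(np.mean(np.abs(a - b)))

rng = np.random.default_rng(0)
synthetic = rng.normal(0.0, 1.0, size=1000)  # synthetic training distribution
natural = rng.normal(0.5, 1.0, size=1000)    # shifted stand-in for embeddings
print(wasserstein_1(synthetic, natural))     # close to 0.5, the mean shift
```

The larger this distance between the synthetic training distribution and real LLM embeddings, the weaker the theoretical generalization guarantee for the reasoning module.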
The paper's main contributions can be summarized as follows:
• Investigating, via a representation-reasoning decomposition, why task-specific fine-tuning outperforms in-context learning even with access to the same information.
• Introducing Tart, a novel task-agnostic approach that is competitive with task-specific approaches and requires no real data for training.
• Demonstrating that Tart is effective across model families on NLP tasks, with the same reasoning module also carrying over to the vision and speech domains.
Check out the Paper and the GitHub link for more details.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.