Recent language models such as GPT-3 and its successors have shown notable performance gains simply by predicting the next word in a sequence, using larger training datasets and greater model capacity. A key capability of these transformer-based models is in-context learning, which allows a model to perform tasks by conditioning on a series of examples without explicit training. However, the working mechanism of in-context learning is still only partially understood. Researchers have explored the factors that affect in-context learning and found that exactly correct demonstrations are not always necessary for it to be effective, while the structure of the prompts, the size of the model, and the order of the examples significantly impact the results.
This article explores three existing lines of work on in-context learning in transformers and large language models (LLMs), evaluated through a series of binary classification tasks (BCTs) under different conditions. The first is the theoretical understanding of in-context learning, which aims to link it with gradient descent (GD). The second is the practical understanding of how in-context learning works in LLMs, considering factors such as the label space, the distribution of the input text, and the overall format of the sequence. The third is learning to learn in context: MetaICL, a meta-training framework that fine-tunes pre-trained LLMs on a large and diverse collection of tasks, is used to enable in-context learning.
Researchers from the Department of Computer Science at the University of California, Los Angeles (UCLA) have introduced a new perspective that treats in-context learning in LLMs as a unique machine learning algorithm. This conceptual framework makes it possible to analyze the decision boundaries that LLMs produce on binary classification tasks with traditional machine learning tools. By visualizing these decision boundaries in linear and nonlinear settings, the authors gain valuable insights into the performance and behavior of in-context learning. The approach also probes the generalization capabilities of LLMs, providing a different perspective on the robustness of their in-context learning performance.
The researchers' experiments mainly addressed the following questions:
- How do pre-trained LLMs perform in BCT?
- How do different factors influence the decision boundaries of these models?
- How can we improve the smoothness of decision boundaries?
The decision boundaries of LLMs on classification tasks were explored by prompting the models with n in-context examples from a BCT, with an equal number of examples for each class. Using scikit-learn, three types of datasets were created to represent different shapes of decision boundaries: linear, circular, and moon-shaped. In addition, several LLMs ranging from 1.3B to 13B parameters were examined, including the open-source models Llama2-7B, Llama3-8B, Llama2-13B, Mistral-7B-v0.1, and Sheared-LLaMA-1.3B, to understand their decision boundaries.
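The dataset setup described above can be sketched with scikit-learn's standard generators. The prompt serialization below is a hypothetical format for illustration only; the paper's exact prompt template is not given here, and the function names are our own:

```python
import numpy as np
from sklearn.datasets import make_classification, make_circles, make_moons

def make_task(kind, n=32, seed=0):
    # Generate a 2-D binary classification task with one of the three
    # decision-boundary shapes used in the experiments.
    if kind == "linear":
        X, y = make_classification(n_samples=n, n_features=2, n_redundant=0,
                                   n_informative=2, n_clusters_per_class=1,
                                   random_state=seed)
    elif kind == "circle":
        X, y = make_circles(n_samples=n, noise=0.05, random_state=seed)
    else:  # "moon"
        X, y = make_moons(n_samples=n, noise=0.1, random_state=seed)
    return X, y

def build_prompt(X, y, x_query, k=8):
    # Select k/2 demonstrations per class so the context is balanced,
    # then serialize them as text followed by the query point.
    idx0 = np.where(y == 0)[0][: k // 2]
    idx1 = np.where(y == 1)[0][: k // 2]
    lines = []
    for i in np.concatenate([idx0, idx1]):
        lines.append(f"Input: {X[i, 0]:.2f} {X[i, 1]:.2f} Label: {y[i]}")
    lines.append(f"Input: {x_query[0]:.2f} {x_query[1]:.2f} Label:")
    return "\n".join(lines)
```

Querying the model with such prompts over a dense grid of `x_query` points, and recording the predicted label at each point, traces out the model's decision boundary for visualization.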
Experimental results showed that fine-tuning LLMs on in-context examples does not result in smoother decision boundaries. For example, when Llama3-8B was fine-tuned on 128 in-context learning examples, the resulting decision boundaries were still not smooth. So, to improve the smoothness of LLM decision boundaries on classification tasks, a pre-trained Llama model was fine-tuned on a set of 1,000 binary classification tasks generated from scikit-learn, featuring decision boundaries that were linear, circular, or moon-shaped with equal probability.
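A minimal sketch of generating such a fine-tuning task collection, assuming scikit-learn's standard generators and an equal sampling probability for each boundary shape (function and field names are illustrative, not the paper's):

```python
import random
from sklearn.datasets import make_classification, make_circles, make_moons

def sample_tasks(n_tasks=1000, n_points=64, seed=0):
    # Draw each task's decision-boundary shape with equal probability,
    # mirroring the fine-tuning data setup described above.
    rng = random.Random(seed)
    generators = {
        "linear": lambda s: make_classification(
            n_samples=n_points, n_features=2, n_redundant=0,
            n_informative=2, n_clusters_per_class=1, random_state=s),
        "circle": lambda s: make_circles(
            n_samples=n_points, noise=0.05, random_state=s),
        "moon": lambda s: make_moons(
            n_samples=n_points, noise=0.1, random_state=s),
    }
    tasks = []
    for t in range(n_tasks):
        kind = rng.choice(sorted(generators))
        X, y = generators[kind](t)  # per-task seed for variety
        tasks.append({"shape": kind, "X": X, "y": y})
    return tasks
```

Each sampled task would then be serialized into demonstration/query text pairs and used as supervised fine-tuning data for the pre-trained model.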
In conclusion, the research team proposed a novel method for understanding in-context learning in LLMs by examining the decision boundaries they produce on BCTs. It was found that, despite achieving high test accuracy, the decision boundaries of LLMs are often not smooth. Experiments then identified the factors that affect these boundaries. Adaptive sampling and fine-tuning methods were also explored and found to be effective at improving boundary smoothness. These findings provide new insights into the mechanics of in-context learning and suggest avenues for future research and optimization.
Check out the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final-year undergraduate at IIT Kharagpur. As a technology enthusiast, he delves into the practical applications of AI, with a focus on understanding AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible way.