Language models (LMs) have gained significant attention in recent years due to their remarkable capabilities. These models are typically pre-trained as neural sequence models on large, minimally curated web text and then fine-tuned with task-specific examples and human feedback. However, they often acquire undesirable skills or knowledge that their creators wish to remove before deployment. The challenge lies in effectively "unlearning" or forgetting specific capabilities without degrading overall model performance. While recent research has focused on developing techniques to remove specific skills and knowledge from LMs, there has been limited evaluation of how this forgetting generalizes to other inputs.
Existing attempts at machine "unlearning" have evolved from earlier methods for removing unwanted data from training sets to more advanced techniques, including optimization-based methods, model editing that estimates parameter importance, and gradient ascent on unwanted responses. Some methods provide frameworks for comparing unlearned networks with fully retrained ones, while others are specific to large language models (LLMs), such as misinformation warnings or manipulation of model representations. However, most of these approaches have limitations in feasibility, generalizability, or applicability to complex models such as LLMs.
MIT researchers have proposed a new approach to studying how forgetting of skills generalizes within LMs. The method induces forgetting by fine-tuning a model on randomly labeled data for the target task, a simple yet effective technique. Experiments characterize how this forgetting generalizes and uncover several key findings. The work highlights the nature of forgetting in LMs and the difficulty of cleanly removing undesired capabilities from these systems, revealing complex patterns of cross-task variability in forgetting and motivating further study of how the training data used for forgetting affects model predictions in other areas.
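To make the core idea concrete, below is a minimal sketch of fine-tuning on randomly labeled data to induce forgetting. It assumes the Hugging Face transformers library, a binarized multiple-choice setup where each example carries a prompt and a list of answer options, and hypothetical data fields (`prompt`, `options`); it is an illustration of the general technique, not the authors' actual training code.

```python
# Sketch: induce forgetting by fine-tuning on randomly labeled target-task data.
# Assumes examples are dicts with "prompt" and "options"; labels are replaced
# with uniformly random options before training.

import random

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-hf"  # base model used in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.train()


def randomize_labels(examples):
    """Replace each example's gold answer with a uniformly random option."""
    return [
        {"prompt": ex["prompt"], "answer": random.choice(ex["options"])}
        for ex in examples
    ]


def forgetting_step(batch, optimizer):
    """One gradient step toward the random answers (standard LM loss, simplified)."""
    texts = [ex["prompt"] + " " + ex["answer"] for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()


# Hypothetical usage: `train_examples` is a list of {"prompt": ..., "options": [...]}.
# optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
# loader = DataLoader(randomize_labels(train_examples), batch_size=8, collate_fn=list)
# for batch in loader:
#     forgetting_step(batch, optimizer)
```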
The evaluation uses a comprehensive framework of 21 multiple-choice tasks spanning domains such as common-sense reasoning, reading comprehension, mathematics, toxicity, and language understanding. These tasks were chosen to cover a broad range of abilities while maintaining a consistent multiple-choice format. Evaluation follows the Language Model Evaluation Harness (LMEH) standard for zero-shot assessment, using its predetermined prompts and scoring the probabilities of the answer options. Tasks are binarized, datasets are cleaned by removing overlaps between training and test data, and sample sizes are capped for consistency. The experiments primarily use the Llama 2 7B base model, which provides a solid foundation for analyzing forgetting behavior.
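The sketch below shows one common way to score a zero-shot multiple-choice example by comparing the log-probabilities the model assigns to each answer option, in the spirit of the LM Evaluation Harness. It is not the harness's actual implementation, and the tokenization boundary handling is simplified.

```python
# Sketch: zero-shot multiple-choice prediction by comparing option log-probabilities.

import torch
import torch.nn.functional as F


@torch.no_grad()
def option_logprob(model, tokenizer, prompt, option):
    """Sum of log-probabilities assigned to the option tokens, conditioned on the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + option, return_tensors="pt").input_ids
    logits = model(full_ids).logits

    # Approximate the option as the tokens appended after the prompt
    # (tokenizer boundaries can shift slightly; ignored here for brevity).
    option_len = full_ids.shape[1] - prompt_ids.shape[1]
    logprobs = F.log_softmax(logits[0, :-1].float(), dim=-1)  # predicts tokens 1..L-1
    targets = full_ids[0, 1:]
    option_scores = logprobs[-option_len:].gather(1, targets[-option_len:, None])
    return option_scores.sum().item()


def predict(model, tokenizer, prompt, options):
    """Return the index of the option with the highest total log-probability."""
    scores = [option_logprob(model, tokenizer, prompt, o) for o in options]
    return max(range(len(options)), key=lambda i: scores[i])


# Hypothetical usage on a binarized task:
# choice = predict(model, tokenizer, "Question: ... Answer:", [" yes", " no"])
```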
The results demonstrate diverse forgetting behavior across tasks. After fine-tuning, test accuracy increases (small drops are possible because the validation set used for model selection is not identical to the test set). The forgetting phase then produces three distinct categories of behavior:
- Accuracy after forgetting remains close to the fine-tuned accuracy (little or no forgetting).
- Accuracy after forgetting decreases but stays above the pre-trained accuracy (partial forgetting).
- Accuracy after forgetting drops below the pre-trained accuracy, sometimes falling back toward chance (50%).
These results highlight the complex, task-dependent nature of forgetting generalization in LMs.
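An illustrative helper (not from the paper) for bucketing a task into one of the three categories above, given its pre-trained, fine-tuned, and post-forgetting accuracies; the tolerance value is a hypothetical choice.

```python
# Illustrative categorization of forgetting behavior; `tol` is an assumed tolerance.

def forgetting_category(pretrained_acc, finetuned_acc, forgotten_acc, tol=0.02):
    if forgotten_acc >= finetuned_acc - tol:
        return "little-to-no forgetting"   # stays near fine-tuned accuracy
    if forgotten_acc > pretrained_acc + tol:
        return "partial forgetting"        # drops, but stays above pre-trained accuracy
    return "strong forgetting"             # at or below pre-trained, possibly near chance


# Example: forgetting_category(0.62, 0.85, 0.55) -> "strong forgetting"
```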
In conclusion, MIT researchers have presented an approach to studying how forgetting of skills generalizes within LMs, showing that fine-tuning LMs on random responses effectively induces forgetting of targeted capabilities. The evaluation tasks quantify the degree of forgetting, and factors such as dataset difficulty and model confidence do not predict how well forgetting generalizes; however, the total variance of the model's hidden states does correlate with forgetting success. Future research should aim to understand why certain examples within a task are forgotten and to uncover the mechanisms behind the forgetting process.
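One plausible way to compute a "total variance of hidden states" for a task is the trace of the covariance of final-layer representations across that task's prompts; this is an assumption about the metric, not necessarily the paper's exact definition.

```python
# Sketch: total variance of hidden states on a task, measured as the trace of the
# covariance matrix of mean-pooled final-layer representations (assumed definition).

import torch


@torch.no_grad()
def hidden_state_total_variance(model, tokenizer, prompts, layer=-1):
    feats = []
    for prompt in prompts:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model(ids, output_hidden_states=True)
        # Mean-pool the chosen layer's hidden states over the sequence dimension.
        feats.append(out.hidden_states[layer][0].mean(dim=0).float())
    feats = torch.stack(feats)                           # (num_prompts, hidden_dim)
    centered = feats - feats.mean(dim=0, keepdim=True)
    return ((centered ** 2).sum() / (feats.shape[0] - 1)).item()  # trace of covariance
```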
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Sajjad Ansari is a final year student from IIT Kharagpur. As a technology enthusiast, he delves into practical applications of AI, focusing on understanding the impact of AI technologies and their real-world implications. He aims to articulate complex AI concepts in a clear and accessible manner.