In autoregressive transformer language models, researchers have identified a neural mechanism that represents an input-output function as a compact vector, termed a function vector (FV). Applying causal mediation analysis across a range of in-context learning (ICL) tasks reveals that a small number of attention heads carry FVs, which remain robust across diverse contexts and can trigger task execution zero-shot in natural text settings. FVs contain information about a function's output space and can be summed to trigger new, complex tasks, suggesting that LLMs contain internal abstractions of general-purpose functions.
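To make the core claim concrete, here is a minimal sketch (not the authors' code) of injecting a precomputed FV into a model's hidden state so that it performs a task zero-shot. It uses GPT-2 via Hugging Face transformers for illustration; the layer index, prompt, and the assumption that `fv` has already been extracted for some task (e.g., antonyms) are all hypothetical choices, not details from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6                                  # hypothetical intermediate layer
fv = torch.zeros(model.config.n_embd)      # placeholder: a real FV comes from extraction

def add_fv_hook(module, inputs, output):
    # GPT-2 blocks return a tuple; output[0] is the hidden state of shape
    # (batch, seq_len, hidden_dim). Add the FV at the final token position.
    hidden = output[0]
    hidden[:, -1, :] = hidden[:, -1, :] + fv
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_fv_hook)
prompt = "hot:"                            # zero-shot: no in-context examples
ids = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**ids).logits
handle.remove()

# With a real antonym FV, the top prediction should shift toward "cold".
print(tok.decode(logits[0, -1].argmax()))
```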
Northeastern University researchers expand the study of in-context learning (ICL) in LLMs and probe deeper into transformers to establish the existence of FVs. The work references numerous related studies, including those on ICL prompt forms, meta-learning models, and Bayesian task inference, while drawing insights from research on decoding transformer vocabularies. It also builds on analyses of in-context copying behavior and employs the causal mediation analysis methods developed by Pearl and others to isolate FVs.
The study investigates the existence of FVs in large autoregressive transformer language models trained on large natural text corpora. It expands the concept of ICL and explores the underlying mechanisms in transformers that give rise to FVs. Previous research on ICL, including work on prompt formats and scaling, informs this study. FVs are presented as compact vector representations of input-output tasks. Causal mediation analysis is used to identify FVs and to characterize them, including their robustness to context changes and their potential for semantic composition.
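A simplified extraction sketch follows. In the paper, an FV is built from the task-conditioned mean outputs of a small set of causally important attention heads; as an approximation here, we average the full hidden state at the final token of several ICL prompts for one task. The model, layer, and toy antonym prompts are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6  # hypothetical intermediate layer
# Toy antonym ICL prompts; a real study would sample many such prompts.
prompts = [
    "big:small, fast:slow, tall:",
    "up:down, wet:dry, young:",
    "hard:soft, rich:poor, light:",
]

states = []
for p in prompts:
    ids = tok(p, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)
    # hidden_states[LAYER + 1] is the output of block LAYER
    # (index 0 holds the embedding-layer output).
    states.append(out.hidden_states[LAYER + 1][0, -1])

fv = torch.stack(states).mean(dim=0)  # task-conditioned average state
print(fv.shape)  # (n_embd,)
```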
The method employs causal mediation analysis to locate FVs in autoregressive transformer language models. It runs tests to evaluate whether hidden states encode tasks and assesses portability to natural text settings by measuring accuracy on generated outputs. More than 40 tasks are constructed to test FV extraction in varied settings, with a focus on six representative tasks. The article references previous research on ICL and on feature representations in language models.
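The sketch below illustrates the causal-mediation (activation-patching) idea, simplified to a whole layer; the paper measures indirect effects per attention head. The prompts, layer, and target token are illustrative, not drawn from the paper's task set: a clean ICL run is cached, then patched into a corrupted run to see how much of the correct answer it restores.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6
clean = "big:small, fast:slow, hot:"       # correct ICL prompt
corrupt = "big:slow, fast:small, hot:"     # shuffled labels destroy the task
answer_id = tok(" cold")["input_ids"][0]   # target token for the clean task

# 1) Cache the clean hidden state at the last token of LAYER.
cache = {}
def save_hook(module, inputs, output):
    cache["h"] = output[0][:, -1, :].detach().clone()

h = model.transformer.h[LAYER].register_forward_hook(save_hook)
with torch.no_grad():
    model(**tok(clean, return_tensors="pt"))
h.remove()

# 2) Re-run the corrupted prompt, patching in the cached clean state.
def patch_hook(module, inputs, output):
    hidden = output[0]
    hidden[:, -1, :] = cache["h"]
    return (hidden,) + output[1:]

with torch.no_grad():
    base = model(**tok(corrupt, return_tensors="pt")).logits[0, -1].softmax(-1)
h = model.transformer.h[LAYER].register_forward_hook(patch_hook)
with torch.no_grad():
    patched = model(**tok(corrupt, return_tensors="pt")).logits[0, -1].softmax(-1)
h.remove()

# The indirect effect: how much patching restores the correct answer.
print(f"p(cold) corrupted: {base[answer_id].item():.4f}, "
      f"patched: {patched[answer_id].item():.4f}")
```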
The current research identifies FVs in autoregressive transformer language models using causal mediation analysis. FVs serve as compact task representations that are robust to context and can trigger specific procedures in diverse settings. They exert strong causal effects at intermediate layers and are amenable to semantic vector composition for complex tasks. The approach outperforms alternative methods and underscores that LLMs possess versatile internal function abstractions applicable across contexts.
The proposed approach successfully establishes the presence of FVs within autoregressive transformer language models through causal mediation analysis. These compact representations of input-output tasks demonstrate robustness across different contexts and exhibit strong causal effects in the middle layers of language models. Although FVs often contain information that encodes a function's output space, reconstructing an FV's full effect from that information alone is not straightforward. Furthermore, FVs can be summed to trigger new, complex tasks, showing potential for semantic vector composition. The findings suggest the existence of internal abstractions of general-purpose functions that apply across contexts.
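A final sketch of semantic vector composition: since FVs live in the model's hidden space, composing tasks reduces to vector addition. The two FVs and the task pairing below are hypothetical placeholders, used only to show the operation.

```python
import torch

# Placeholders; real FVs come from the extraction step above.
fv_antonym = torch.randn(768)    # hypothetical FV for an "antonym" task
fv_uppercase = torch.randn(768)  # hypothetical FV for an "uppercase" task

fv_composed = fv_antonym + fv_uppercase  # simple vector addition

# Injecting fv_composed with the hook from the first sketch would then test
# whether the model performs the composed task (uppercased antonyms) zero-shot.
```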
Future research directions include probing the internal structure of FVs to discern the information they encode and how they contribute to execution, their usefulness in complex tasks, and their potential for composability. It is also important to explore how FVs generalize across models, tasks, and layers. Comparative studies with other FV construction methods, and research on how FVs relate to other task-representation techniques, are needed. Furthermore, the application of FVs to natural language processing tasks, such as text generation and question answering, deserves further exploration.
Check out the Paper, GitHub, and Project page. All credit for this research goes to the researchers of this project.
Hello, my name is Adnan Hassan. I'm a consulting intern at Marktechpost and soon to be a management trainee at American Express. I am currently pursuing a dual degree at the Indian Institute of Technology, Kharagpur. I am passionate about technology and want to create new products that make a difference.