Large language models, such as those that power popular AI chatbots like ChatGPT, are incredibly complex. Although these models are used as tools in many areas, such as customer support, code generation, and language translation, scientists still do not fully understand how they work.
In an effort to better understand what's going on under the hood, researchers at MIT and elsewhere studied the mechanisms at work when these huge machine learning models retrieve stored knowledge.
They found a surprising result: large language models (LLMs) often use a very simple linear function to retrieve and decode stored facts. Moreover, the model uses the same decoding function for similar types of facts. Linear functions, equations with only two variables and no exponents, capture the direct, straightforward relationship between two variables.
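In this setting, the two "variables" are high-dimensional vectors inside the network rather than single numbers. Below is a purely illustrative numpy sketch of the form such a decoding function takes; the matrix, bias, and dimensions are placeholders, not values from the study.

```python
# Purely illustrative sketch of the affine form described above: a fixed
# matrix W and bias b (one pair per relation type, e.g. "plays instrument")
# map a subject's hidden representation to an object representation.
# All values and sizes here are placeholders, not taken from the paper.
import numpy as np

d = 4096                              # hidden-state size of a hypothetical model
W = np.random.randn(d, d)             # decoding matrix for one relation type
b = np.random.randn(d)                # bias for that relation type

subject_state = np.random.randn(d)    # stand-in for the model's "Miles Davis" state
object_state = W @ subject_state + b  # approximate representation of the object
```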
The researchers showed that by identifying linear functions for different facts, they can test the model to see what it knows about new topics and where in the model that knowledge is stored.
Using a technique they developed to estimate these simple functions, the researchers found that even when a model answers a question incorrectly, it has often stored the correct information. In the future, scientists could use this approach to find and correct falsehoods within the model, which could reduce the tendency of a model to sometimes give incorrect or nonsensical answers.
“Although these models are really complicated nonlinear functions that are trained with a lot of data and are very difficult to understand, sometimes there are really simple mechanisms at work within them. This is an example of that,” says Evan Hernandez, a graduate student in electrical engineering and computer science (EECS) and co-lead author of a paper detailing these findings.
Hernandez wrote the paper with co-lead author Arnab Sharma, a computer science graduate student at Northeastern University; his advisor, Jacob Andreas, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); senior author David Bau, an assistant professor of computer science at Northeastern; and others at MIT, Harvard University, and the Israel Institute of Technology. The research will be presented at the International Conference on Learning Representations.
Finding facts
Most large language models, also known as transformer models, are neural networks. Loosely based on the human brain, neural networks contain billions of interconnected nodes, or neurons, that are grouped into many layers and that encode and process data.
Much of the knowledge stored in a transformer can be represented as relationships connecting subjects and objects. For example, “Miles Davis plays the trumpet” is a relationship that connects the subject, Miles Davis, to the object, the trumpet.
As a transformer gains more knowledge, it stores additional data on a given topic in multiple layers. If a user asks about that topic, the model must decode the most relevant fact to answer the query.
If someone prompts a transformer with “Miles Davis plays the . . . ,” the model should respond with “trumpet” and not “Illinois” (the state where Miles Davis was born).
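This completion behavior is easy to observe directly. The snippet below is a small sketch using the Hugging Face transformers library; "gpt2" is only a convenient, publicly available stand-in, not necessarily one of the models analyzed in the paper.

```python
# A small sketch of the prompting behavior described above, using the
# Hugging Face transformers library. "gpt2" is a convenient stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Miles Davis plays the", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Inspect the model's top candidates for the next token.
top = torch.topk(logits[0, -1], k=5)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))
```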
“Somewhere in the network computation, there has to be a mechanism that looks for the fact that Miles Davis plays the trumpet, and then extracts that information and helps generate the next word. We wanted to understand what that mechanism was,” says Hernandez.
The researchers set up a series of experiments to test LLMs and found that, although extremely complex, the models decode relational information using a simple linear function. Each function is specific to the type of fact being retrieved.
For example, the transformer would use one decoding function each time it wants to generate the instrument a person plays and a different function each time it wants to generate the state in which a person was born.
The researchers developed a method to estimate these simple functions and then calculated functions for 47 different relationships, such as “capital of a country” and “lead singer of a band.”
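The paper describes the precise estimation procedure; as a loose illustration of the general idea, one could fit a single affine map per relation from subject representations to object representations, as in the sketch below. The function name and the least-squares fit are this article's own illustrative stand-ins, not the authors' method. A separate (W, b) pair would be fit for each relation, e.g. one for “capital of a country” and another for “lead singer of a band.”

```python
# Illustrative-only sketch: fit one affine function per relation that maps
# subject hidden states to object hidden states. Least squares is used here
# as a simple stand-in; it is not the estimation method from the paper.
import numpy as np

def fit_relation_function(subject_states, object_states):
    """subject_states, object_states: arrays of shape (n_examples, d)."""
    n, d = subject_states.shape
    # Append a column of ones so the bias b is fit jointly with W.
    X = np.hstack([subject_states, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(X, object_states, rcond=None)
    W, b = coef[:d].T, coef[d]        # object ≈ W @ subject + b
    return W, b
```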
While there could be an infinite number of possible relationships, the researchers chose to study this specific subset because they are representative of the types of facts that can be written in this way.
They tested each function by changing the subject to see if it could recover the correct object information. For example, the function for “capital of a country” should retrieve Oslo if the subject is Norway and London if the subject is England.
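Continuing the sketch above, such a test might decode the predicted object representation through the model's output (vocabulary) projection and check the top-ranked token. In the code below, `unembed` and `subject_state_for` are assumed helpers introduced for illustration, not real APIs.

```python
# Continuation of the earlier illustrative sketch. `unembed` stands for the
# model's output (vocabulary) projection matrix, and `subject_state_for` is
# an assumed helper returning a subject's hidden state; both are hypothetical.
import numpy as np

def decode_object(W, b, subject_state, unembed, tokenizer):
    predicted = W @ subject_state + b              # apply the relation function
    token_id = int(np.argmax(unembed @ predicted)) # most likely vocabulary token
    return tokenizer.decode([token_id])

# decode_object(W_capital, b_capital, subject_state_for("Norway"), unembed, tokenizer)
#   is expected to come out close to "Oslo" when the relation is linearly decodable,
#   and close to "London" when the subject state is the one for "England".
```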
The functions retrieved the correct information more than 60 percent of the time, demonstrating that some of the information in a transformer is encoded and retrieved in this way.
“But not everything is linearly encoded. For some facts, although the model knows them and will predict text that is consistent with these facts, we cannot find linear functions for them. This suggests that the model is doing something more complex to store that information,” Hernandez says.
Visualizing the knowledge of a model
They also used the functions to determine what a model believes to be true about different subjects.
In one experiment, they started with the prompt “Bill Bradley was” and used the decoding functions for “plays sports” and “attended college” to see whether the model knew that Senator Bradley was a basketball player who attended Princeton.
“We can show that although the model may choose to focus on different information when producing text, it encodes all of that information,” Hernandez says.
They used this probing technique to produce what they call an “attribute lens,” a grid that visualizes where specific information about a particular relationship is stored within the many layers of the transformer.
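A rough sketch of how such a grid could be computed is shown below: apply a relation's decoding function to the hidden state at every layer and token position, score how strongly the correct object is predicted there, and display the scores as a heatmap. The shapes and helper names are assumptions made for illustration, not the authors' implementation.

```python
# Rough, illustrative sketch of an "attribute lens"-style grid: score every
# (layer, token) hidden state by how strongly the relation function decodes
# the correct object there. Shapes and names are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

def attribute_lens_grid(hidden_states, W, b, unembed, object_token_id):
    """hidden_states: (n_layers, n_tokens, d) activations for one prompt."""
    n_layers, n_tokens, _ = hidden_states.shape
    scores = np.zeros((n_layers, n_tokens))
    for layer in range(n_layers):
        for tok in range(n_tokens):
            logits = unembed @ (W @ hidden_states[layer, tok] + b)
            # Gap between the correct object's logit and the best logit;
            # 0 means the object is the top prediction at this position.
            scores[layer, tok] = logits[object_token_id] - logits.max()
    return scores

# Example display:
# plt.imshow(scores, aspect="auto", origin="lower")
# plt.xlabel("token position"); plt.ylabel("layer")
# plt.title("attribute lens: 'plays sport'"); plt.show()
```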
Attribute lenses can be generated automatically, providing a streamlined method to help researchers understand more about a model. This visualization tool could allow scientists and engineers to correct stored knowledge and help prevent an AI chatbot from providing false information.
In the future, Hernandez and his collaborators want to better understand what happens in cases where facts are not stored linearly. They would also like to conduct experiments with larger models, as well as study the accuracy of linear decoding functions.
“This is exciting work that reveals a missing piece in our understanding of how large language models remember factual knowledge during inference. Previous work has shown that LLMs build information-rich representations of given subjects, from which specific attributes are extracted during inference. This work shows that the complex nonlinear computation LLMs perform for attribute extraction can be well approximated by a simple linear function,” says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work.
This research was supported, in part, by Open Philanthropy, the Israel Science Foundation, and an Azrieli Foundation Early Career Faculty Grant.