If you ask a large language model (LLM) like GPT-4 to smell a rain-soaked campsite, it will politely decline. If you ask the same system to describe that scent to you, it will wax poetic about “an air heavy with anticipation” and “a scent that is fresh and earthy,” despite having no prior experience with rain or a nose to help it make such observations. One possible explanation for this phenomenon is that the LLM is simply mimicking the text present in its vast training data, rather than working with any actual knowledge of rain or smell.
But does the lack of eyes mean that language models can never “understand” that a lion is “bigger” than a house cat? Philosophers and scientists alike have long considered the ability to assign meaning to language to be a hallmark of human intelligence, and have wondered what essential ingredients enable us to do so.
In analyzing this puzzle, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have uncovered intriguing results suggesting that language models can develop their own understanding of reality as a way to enhance their generative capabilities. The team first developed a set of small Karel puzzles, which involved coming up with instructions to control a robot in a simulated environment. They then trained an LLM on the solutions, but without ever demonstrating how the solutions actually worked. Finally, using a machine learning technique called “probing,” they peered inside the model’s “thought process” as it generated new solutions.
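To make the setup concrete, here is a minimal sketch (not the paper’s actual code) of what such a simulated grid world might look like: a robot executes primitive instructions, and the resulting states form the ground-truth “reality” that the model is never shown during training. The primitives `move`, `turnLeft`, and `turnRight` and the grid layout are illustrative assumptions.

```python
# A minimal sketch (not the paper's code) of a Karel-style simulated world:
# a robot on a grid executes primitive instructions, and the resulting states
# are the "reality" that the language model never observes during training.

from dataclasses import dataclass

# Assumed primitive set; the Karel dialect used in the paper may differ.
HEADINGS = ["north", "east", "south", "west"]
STEP = {"north": (0, 1), "east": (1, 0), "south": (0, -1), "west": (-1, 0)}

@dataclass(frozen=True)
class RobotState:
    x: int = 0
    y: int = 0
    heading: str = "north"

def execute(program: list[str], state: RobotState) -> list[RobotState]:
    """Run a program and record the robot's state after each instruction."""
    trace = []
    for op in program:
        if op == "move":
            dx, dy = STEP[state.heading]
            state = RobotState(state.x + dx, state.y + dy, state.heading)
        elif op == "turnLeft":
            state = RobotState(state.x, state.y,
                               HEADINGS[(HEADINGS.index(state.heading) - 1) % 4])
        elif op == "turnRight":
            state = RobotState(state.x, state.y,
                               HEADINGS[(HEADINGS.index(state.heading) + 1) % 4])
        trace.append(state)
    return trace

# The trace is what a probe would later try to recover from the LLM's hidden
# states, even though the LLM itself only ever sees program text.
print(execute(["move", "turnRight", "move", "move"], RobotState()))
```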
After training on more than a million random puzzles, they found that the model spontaneously developed its own conception of the underlying simulation, despite never having been exposed to this reality during training. These findings call into question our intuitions about what types of information are necessary to learn linguistic meaning, and whether LLMs will one day be able to understand language at a deeper level than they do today.
“At the beginning of these experiments, the language model generated random instructions that did not work. By the time we completed training, our language model generated correct instructions at a rate of 92.4 percent,” says Charles Jin, a PhD student in electrical engineering and computer science (EECS) at MIT and a CSAIL affiliate who is the lead author of a new paper on the work. “It was a very exciting moment for us because we thought that if the language model could complete a task with that level of accuracy, we could expect it to also understand meanings within language. This gave us a starting point to explore whether LLMs do in fact understand text, and we now see that they are capable of much more than just blindly matching words.”
Inside the mind of an LLM
The probe helped Jin witness this progress firsthand. Its role was to interpret what the LLM thought the instructions meant, revealing that the LLM developed its own internal simulation of how the robot moves in response to each instruction. As the model’s ability to solve puzzles improved, these conceptions also became more accurate, indicating that the LLM was beginning to understand the instructions. Before long, the model was correctly putting the pieces together to form working instructions.
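In broad strokes, probing of this kind amounts to fitting a small classifier that tries to read a property of interest, here the robot’s state, directly out of the model’s hidden activations. The sketch below is a hedged illustration of that recipe using stand-in data; the array shapes, the `LogisticRegression` probe, and all variable names are assumptions rather than the paper’s setup.

```python
# A hedged sketch of "probing": fit a small classifier that maps the language
# model's hidden activations to the robot's ground-truth state. The shapes,
# the stand-in data, and the choice of classifier are all assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data: in the real experiment, X would be hidden activations taken
# from the trained LLM as it generates a program, and y would be the
# simulator's record of the robot's state (e.g. its heading) at each step.
n_examples, hidden_dim, n_states = 2000, 64, 4
X = rng.normal(size=(n_examples, hidden_dim))
y = rng.integers(0, n_states, size=n_examples)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A deliberately simple probe: if even a linear model can recover the robot's
# state from the activations well above chance, the hidden states plausibly
# encode an internal simulation of the world.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))

# With this random stand-in data the accuracy sits near chance (about 0.25);
# the point here is the shape of the experiment, not the result.
```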
Jin notes that the LLM's understanding of language develops in phases, much like how a child learns speech in several steps. At first, it's like a baby's babbling: repetitive and mostly unintelligible. Then the language model acquires syntax, or the rules of language. This allows it to generate instructions that might seem like genuine solutions, but still don't work.
However, the LLM's instructions gradually improve. Once the model acquires meaning, it begins to generate instructions that correctly implement the requested specifications, like a child forming coherent sentences.
Separating the method from the model: A “bizarre world”
The probe was only intended to “get inside the brain of an LLM,” as Jin describes it, but there was a slim chance that it was also doing some of the thinking for the model. The researchers wanted to make sure that their model understood the instructions independently of the probe, rather than the probe inferring the robot’s movements from the LLM’s grasp of syntax alone.
“Imagine you have a pile of data that encodes the LLM’s thought process,” Jin suggests. “The probe is like a forensic analyst: you hand this pile of data to the analyst and say, ‘Here is how the robot moves; now try to find the robot’s movements in the pile of data.’ The analyst later tells you that they know what’s going on with the robot in the pile of data. But what if the pile of data actually just encodes the raw instructions, and the analyst has figured out a clever way to extract the instructions and follow them accordingly? Then the language model hasn’t really learned what the instructions mean at all.”
To disentangle their roles, the researchers flipped the meanings of the instructions for a new probe. In this “bizarre world,” as Jin calls it, directions like “up” now meant “down” within the instructions that moved the robot around the grid.
“If the probe translates instructions into robot positions, it should be able to translate the instructions according to the flipped meanings just as accurately,” Jin says. “But if the probe is actually finding encodings of the robot’s original movements in the language model’s thought process, then it should struggle to extract the flipped robot movements from that original thought process.”
As it turned out, the new probe made translation errors: it was unable to interpret a language model that encoded different meanings for the instructions. This meant that the original semantics were embedded in the language model itself, indicating that the LLM understood the instructions independently of the original probing classifier.
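As a rough illustration of how such a control could be set up, the sketch below builds on the grid-world code above: the same program is interpreted once under the original semantics and once under flipped instruction meanings, producing two different ground-truth traces for two separate probes to recover. The particular flip chosen here (swapping the turn instructions) is an illustrative stand-in, not the paper’s exact construction.

```python
# Building on the grid-world sketch above: the same program is run under the
# original semantics and under flipped instruction meanings, giving two
# different ground-truth traces. The flip chosen here (swapping the turn
# instructions) is an illustrative stand-in for the paper's construction.

FLIP = {"turnLeft": "turnRight", "turnRight": "turnLeft"}

def flipped_execute(program: list[str], state: RobotState) -> list[RobotState]:
    """Trace of the same program under the flipped instruction meanings."""
    return execute([FLIP.get(op, op) for op in program], state)

program = ["move", "turnLeft", "move", "move", "turnRight", "move"]
original_targets = execute(program, RobotState())         # targets for probe A
flipped_targets = flipped_execute(program, RobotState())  # targets for probe B
print(original_targets[-1], flipped_targets[-1])          # the traces diverge

# Probe A is trained to read the original trace off the LLM's hidden states,
# and probe B to read the flipped trace off the same hidden states. If probe B
# does markedly worse, the original semantics must already be encoded in the
# language model rather than being computed by the probe itself.
```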
“This research directly addresses a central question of modern artificial intelligence: Are the astonishing capabilities of large language models simply due to statistical correlations at scale, or do large language models develop a meaningful understanding of the reality they are asked to work with? This research indicates that the LLM develops an internal model of the simulated reality, even though it was never trained to develop such a model,” says Martin Rinard, an MIT professor in EECS, member of CSAIL, and senior author of the paper.
This experiment further supported the team's analysis that language models can develop a deeper understanding of language. Still, Jin acknowledges some limitations of the paper: they used a very simple programming language and a relatively small model to derive their insights. Upcoming work will look at using a more general setting. While Jin's latest research doesn't describe how to make language models learn meaning faster, he believes future work can build on these insights to improve the way language models are trained.
“An interesting question that remains to be answered is whether the LLM is actually using its internal model of reality to reason about that reality while solving the robot navigation problem,” Rinard says. “While our results are consistent with the LLM using the model in this way, our experiments are not designed to answer this next question.”
“There is currently a lot of debate about whether LLMs are actually ‘understanding’ language, or rather whether their success can be attributed to what are essentially tricks and heuristics that arise from reading large volumes of text,” says Ellie Pavlick, an assistant professor of computer science and linguistics at Brown University who was not involved in the paper. “These questions are at the heart of how we build AI and what we expect the inherent possibilities or limitations of our technology to be. This is a good paper that looks at this question in a controlled way: the authors take advantage of the fact that computer code, like natural language, has syntax and semantics, but unlike natural language, semantics can be directly observed and manipulated for experimental purposes. The experimental design is elegant and their findings are optimistic, suggesting that perhaps LLMs can learn something deeper about what language ‘means’.”
Jin and Rinard's paper was funded in part by grants from the U.S. Defense Advanced Research Projects Agency (DARPA).