To make our way in the world, our brain must develop an intuitive understanding of the physical world around us, which we then use to interpret incoming sensory information.
How does the brain develop that intuitive understanding? Many scientists believe it can use a process similar to what is known as “self-supervised learning.” This type of machine learning, originally developed as a way to create more efficient models for computer vision, allows computational models to learn about visual scenes based solely on the similarities and differences between them, without labels or other information.
A pair of studies by researchers at MIT’s K. Lisa Yang Center for Integrative Computational Neuroscience (ICoN) offer new evidence supporting this hypothesis. The researchers found that when they trained models known as neural networks using a particular type of self-supervised learning, the resulting models generated patterns of activity very similar to those observed in the brains of animals performing the same tasks as the models.
The findings suggest that these models are capable of learning representations of the physical world that they can use to make accurate predictions about what will happen in that world, and that the mammalian brain may be using the same strategy, the researchers say.
“The theme of our work is that AI designed to help build better robots ends up also being a framework for better understanding the brain in general,” says Aran Nayebi, a postdoc at the ICoN Center. “We can’t yet say whether it’s whole-brain, but across disparate scales and brain areas, our results seem to suggest an organizing principle.”
Nayebi is the lead author of one of the studies, co-authored with Rishi Rajalingham, a former MIT postdoc now at Meta Reality Labs, and senior authors Mehrdad Jazayeri, an associate professor of brain and cognitive sciences and a member of the McGovern Institute for Brain Research; and Robert Yang, an assistant professor of brain and cognitive sciences and an associate member of the McGovern Institute. Ila Fiete, director of the ICoN Center, a professor of brain and cognitive sciences, and an associate member of the McGovern Institute, is the senior author of the other study, which was co-led by Mikail Khona, an MIT graduate student, and Rylan Schaeffer, a former senior research associate at MIT.
Both studies will be presented at the 2023 Conference on Neural Information Processing Systems (NeurIPS) in December.
Modeling the physical world
Early computer vision models were primarily based on supervised learning. With this approach, models are trained to classify images, each of which is labeled with a name: cat, car, etc. The resulting models perform well, but this type of training requires a large amount of human-labeled data.
To create a more efficient alternative, in recent years researchers have turned to models built using a technique known as contrastive self-supervised learning. This type of learning allows an algorithm to learn to classify objects based on their similarity to each other, without providing external labels.
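To make the idea concrete, here is a minimal NumPy sketch of an InfoNCE-style objective, the kind of loss commonly used in contrastive self-supervised learning. The encoder outputs, batch size, and embedding dimension below are illustrative assumptions, not details from either study.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    # L2-normalize so similarity is the cosine between codes
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    positives = positives / np.linalg.norm(positives, axis=1, keepdims=True)

    # Pairwise similarities between every anchor and every candidate code
    logits = anchors @ positives.T / temperature        # shape (batch, batch)
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability

    # The matching pair sits on the diagonal: score it against all others,
    # so similar views attract and everything else in the batch repels
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(anchors))
    return -log_probs[idx, idx].mean()

# Two augmented "views" of the same batch, already passed through an encoder
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(32, 128)), rng.normal(size=(32, 128))
print(info_nce_loss(z1, z2))
```

No labels appear anywhere in this loss: the only supervision signal is which pairs of embeddings came from the same underlying input.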
“This is a very powerful method because you can now take advantage of very large modern data sets, especially videos, and really unlock their potential,” says Nayebi. “Much of the modern AI you see now, especially in recent years with ChatGPT and GPT-4, is the result of training a self-supervised objective function on a large-scale data set to obtain a very flexible representation.”
These types of models, also called neural networks, consist of thousands or millions of processing units connected to each other. Each node has connections of varying strength to other nodes in the network. As the network analyzes enormous amounts of data, the strengths of those connections change as the network learns to perform the desired task.
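As a toy illustration of how connection strengths change during learning (our own sketch, not the architecture used in either paper), here is a single gradient-descent step on a tiny two-layer network:

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(4, 8))   # input-to-hidden connection strengths
W2 = rng.normal(scale=0.1, size=(8, 1))   # hidden-to-output connection strengths

x = rng.normal(size=(16, 4))              # a batch of inputs
target = rng.normal(size=(16, 1))         # desired outputs for this toy task

h = np.tanh(x @ W1)                       # "firing pattern" of the hidden units
y = h @ W2                                # network output
err = y - target                          # prediction error

# Backpropagate the error and nudge every connection strength slightly
grad_W2 = h.T @ err / len(x)
grad_W1 = x.T @ ((err @ W2.T) * (1 - h**2)) / len(x)
W2 -= 0.1 * grad_W2
W1 -= 0.1 * grad_W1
```

Repeating this update over many batches is what "training" means: the weights drift toward values that make the network's outputs useful for the task.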
As the model performs a particular task, the activity patterns of different units within the network can be measured. The activity of each unit can be represented as a firing pattern, similar to the firing patterns of neurons in the brain. Previous work by Nayebi and others has shown that self-supervised vision models generate activity similar to that observed in the visual processing system of mammalian brains.
In the two new NeurIPS studies, the researchers set out to explore whether self-supervised computational models of other cognitive functions could also show similarities to the mammalian brain. In the study led by Nayebi, researchers trained self-supervised models to predict the future state of their environment using hundreds of thousands of naturalistic videos depicting everyday scenarios.
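The study trains deep networks on video; as a much-simplified sketch of the same objective shape (a linear one-step predictor over stand-in frame embeddings, entirely our assumption), the model is asked to minimize the error between its prediction and the next state of the world:

```python
import numpy as np

rng = np.random.default_rng(3)
frames = rng.normal(size=(1000, 64))   # stand-in for per-frame embeddings
past, future = frames[:-1], frames[1:]

# Fit a linear one-step predictor in closed form; the paper uses deep
# networks, but the objective has the same shape:
#   minimize || predict(current_frame) - next_frame ||^2
W, *_ = np.linalg.lstsq(past, future, rcond=None)
rel_error = np.linalg.norm(past @ W - future) / np.linalg.norm(future)
print(f"relative next-frame prediction error: {rel_error:.3f}")
```

The key point is that the training signal comes from the video itself: the "label" for each frame is simply the frame that follows it.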
“Over the last decade, the dominant method for building neural network models in cognitive neuroscience is to train these networks on individual cognitive tasks. But models trained this way rarely generalize to other tasks,” says Yang. “Here we test whether we can build models for some aspect of cognition by first training on naturalistic data using self-supervised learning and then evaluating in laboratory settings.”
Once the model was trained, the researchers evaluated it on a task they called “Mental-Pong.” This is similar to the video game Pong, where a player moves a paddle to hit a ball that travels across the screen. In the Mental-Pong version, the ball disappears shortly before reaching the paddle, so the player has to estimate its trajectory in order to hit it.
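A toy version of the setup (our own construction; the ball here flies in a straight line, with no wall bounces) shows what the task demands: extrapolating the occluded trajectory from the visible samples.

```python
import numpy as np

pos = np.array([0.0, 0.3])       # ball's starting position
vel = np.array([1.0, 0.4])       # ball's velocity
occluder_x, paddle_x, dt = 0.6, 1.0, 0.01

visible = []                     # ball positions observed before occlusion
while pos[0] < paddle_x:
    pos = pos + vel * dt
    if pos[0] < occluder_x:
        visible.append(pos.copy())

# Estimate velocity from the last visible samples, then mentally
# extrapolate to where the ball will cross the paddle line
v_est = (visible[-1] - visible[-10]) / (9 * dt)
t_hidden = (paddle_x - visible[-1][0]) / v_est[0]
predicted_y = visible[-1][1] + v_est[1] * t_hidden
print(f"move the paddle to y = {predicted_y:.3f}")
```

Solving the hidden stretch requires an internal model of the ball's dynamics, which is exactly the capacity the researchers probed in their trained network.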
The researchers found that the model was able to track the hidden ball’s trajectory with accuracy similar to that of neurons in the mammalian brain, which a previous study by Rajalingham and Jazayeri had shown to simulate the ball’s trajectory, a cognitive phenomenon known as “mental simulation.” Additionally, the neural activation patterns observed in the model were similar to those observed in the brains of animals while they played the game, specifically in a part of the brain called the dorsomedial frontal cortex. The researchers say that no other kind of computational model has been able to match the biological data as closely.
“There are a lot of efforts in the machine learning community to create artificial intelligence,” says Jazayeri. “The relevance of these models to neurobiology depends on their ability to additionally capture the inner workings of the brain. The fact that Aran’s model predicts neural data is really important, as it suggests that we may be getting closer to building artificial systems that emulate natural intelligence.”
Navigating the world
The study led by Khona, Schaeffer and Fiete focused on a type of specialized neurons known as grid cells. These cells, located in the entorhinal cortex, help animals navigate, working together with place cells located in the hippocampus.
While place cells are activated whenever an animal is in a specific location, grid cells are activated only when the animal is at one of the vertices of a triangular lattice. Groups of grid cells create overlapping lattices of different sizes, allowing them to encode a large number of positions using a relatively small number of cells.
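A simplified one-dimensional sketch (our own; real grid cells tile two-dimensional space with triangular lattices) shows why multiple lattice sizes go a long way: each module reports position only modulo its period, yet the combined code stays unique over a range equal to the least common multiple of the periods.

```python
import numpy as np

periods = np.array([3.0, 4.0, 5.0])     # hypothetical module periods

def grid_code(x):
    """Phase of position x within each module, in [0, 1)."""
    return (x % periods) / periods

# Each module alone repeats every `period` units, but the combined
# code is unique over lcm(3, 4, 5) = 60 units of space.
print(grid_code(1.0))    # [0.333 0.25  0.2 ]
print(grid_code(7.0))    # [0.333 0.75  0.4 ]  (only module 1 is ambiguous)
```

Three modules of a few cells each can thus distinguish far more positions than the same cells could if they all shared one period.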
In recent studies, researchers have trained supervised neural networks to mimic the function of grid cells by predicting an animal’s next location based on its starting point and its velocity, a task known as path integration. However, these models depended on access to privileged information about absolute space at all times, information that the animal does not have.
Inspired by the surprising coding properties of the multiperiodic grid cell code for space, the MIT team trained a contrastive self-supervised model to perform this same path integration task and represent space efficiently while doing so. For the training data, they used sequences of velocity inputs. The model learned to distinguish positions based on whether they were similar or different: nearby positions generated similar codes, while more distant positions generated increasingly different codes.
“It’s similar to training models with images, where if two images are cat heads, their codes should be similar, but if one is a cat head and the other is a truck, then you want their codes to repel each other,” Khona says. “We’re taking the same idea but applying it to spatial trajectories.”
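Here is a hedged sketch of how such training pairs might be formed (our reading of the setup, not the authors’ code): the network only ever sees velocity sequences, and the positive/negative structure comes from whether trajectories end near each other, never from handing the network absolute coordinates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random velocity sequences: the only input the model receives
vels = rng.normal(scale=0.1, size=(64, 50, 2))      # (batch, time, xy)

# Trajectory endpoints, used solely to decide which pairs count as
# "similar"; the network itself never receives these coordinates
endpoints = vels.sum(axis=1)

dists = np.linalg.norm(endpoints[:, None] - endpoints[None, :], axis=-1)
positive = (dists < 0.5) & ~np.eye(len(dists), dtype=bool)

# An encoder mapping velocity sequences to codes would then be trained
# with a contrastive loss (like the InfoNCE sketch above) so that
# positive pairs attract and all other pairs repel.
print(f"{positive.sum()} positive pairs in this batch")
```

Under this objective, a code that represents space efficiently is exactly what minimizes the loss, which is why the emergence of grid-like structure is such a telling result.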
Once the model was trained, the researchers discovered that the activation patterns of the nodes within the model formed several lattice patterns with different periods, very similar to those formed by the grid cells of the brain.
“What excites me about this work is that it draws connections between mathematical work on the surprising information-theoretic properties of the grid cell code and the computation of path integration,” Fiete says. “While the mathematical work was analytic (what properties does the grid cell code possess?), the approach of optimizing coding efficiency through self-supervised learning and obtaining grid-like tuning is synthetic: it shows what properties might be necessary and sufficient to explain why the brain has grid cells.”
The research was funded by the K. Lisa Yang ICoN Center, the National Institutes of Health, the Simons Foundation, the McKnight Foundation, the McGovern Institute, and the Helen Hay Whitney Foundation.