We ask a lot of ourselves when we are babies. Somehow we must go from masses of sensation to mobile, rational, attentive communicators in just a few years. Here you are, a baby with no vocabulary, in a room full of toys and stuffed animals. You pick up a Lincoln Log and your caretaker says, "This is a 'log.'" Over time, you come to understand that "log" does not refer strictly to this particular brown plastic cylinder, nor to brown plastic cylinders in general, but rather to brown plastic cylinders that embody the characteristics of felled and stripped tree parts, which are also, of course, "logs."
There has been much research and heated debate about how babies achieve this. Some scientists have argued that most of our language acquisition can be explained by associative learning: we associate sounds with sensations, in the same way that dogs associate the sound of a bell with food. Others claim that there are features built into the human mind that have shaped the forms of all languages and are crucial to our learning. Still others maintain that young children build their understanding of new words on top of their understanding of other words.
This discourse advanced on a recent Sunday morning, as Tammy Kwan and Brenden Lake delivered blackberries from a bowl to the mouth of their twenty-one-month-old daughter, Luna. Luna was dressed in pink tights and a pink tutu, with a silicone bib around her neck and a soft pink hat on her head. A lightweight GoPro-style camera was attached to the front of it.
"Babooga," she said, pointing at the berries with a round finger. Dr. Kwan gave her the rest, and Dr. Lake looked at the empty bowl with amusement. "That's like $10," he said. A light on the camera blinked.
For an hour each week over the past 11 months, Dr. Lake, a psychologist at New York University whose research focuses on human and artificial intelligence, has been attaching a camera to Luna and recording things from her point of view as she plays. His goal is to use the videos to train a language model on the same sensory input that a young child is exposed to: a LunaBot, if you will. In doing so, he hopes to create better tools for understanding both A.I. and ourselves. "We think this research finally establishes that link between those two areas of study," Dr. Lake said. "We can finally put them in dialogue with each other."
There are many obstacles to using A.I. models to understand the human mind. After all, the two are profoundly different. Modern language and multimodal models, such as OpenAI's GPT-4 and Google's Gemini, are built on neural networks with little built-in structure, and they have improved mostly as a result of greater computing power and larger training data sets. Meta's latest large language model, Llama 3, is trained on more than ten trillion words; an average five-year-old child is exposed to more than 300,000.
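To get a rough sense of that gap, here is a back-of-the-envelope calculation in Python. The word counts are the article's round figures, not precise measurements, so the ratio is only an order-of-magnitude estimate.

```python
# Rough comparison of training data volume, using the round figures quoted above.
llama3_words = 10_000_000_000_000  # words in Llama 3's training corpus (order of magnitude)
child_words = 300_000              # words an average five-year-old is exposed to, per the figure above

ratio = llama3_words // child_words
print(f"The model sees roughly {ratio:,} times more words than the child.")
# Prints: The model sees roughly 33,333,333 times more words than the child.
```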
These models can analyze pixels in images, but they can't taste cheese or berries or feel hunger, which are important kinds of learning experiences for children. Researchers can do their best to capture a child's entire sensory stream, but crucial aspects of the child's phenomenology will inevitably be missed. "What we're seeing is just the residue of an active learner," said Michael Frank, a psychologist at Stanford who has been trying for years to capture the human experience on camera. His lab currently works with more than 25 children across the country, including Luna, to record their experiences at home and in social settings.
Humans are not mere data receptacles, as neural networks are, but intentional animals. Everything we see, every object we touch, every word we hear is bound up with the beliefs and desires we have in the moment. "There's a deep relationship between what you're trying to learn and the data that's coming in," said Linda Smith, a psychologist at Indiana University. "These models just predict. They take everything they are given and make the best next step." While it is possible to emulate human intentionality by structuring the training data, something Dr. Smith's lab has recently been trying to do, the most proficient A.I. models, and the companies that make them, have long been geared toward processing more data efficiently, not toward making more sense of less.
Furthermore, there is a more conceptual issue, arising from the fact that the capabilities of A.I. systems can appear quite human even though they arise in non-human ways. Recently, dubious claims of consciousness, general intelligence and sentience have emerged from the industrial laboratories of Google and Microsoft after the launch of new models. In March, Claude 3, the newest model from an A.I. research startup called Anthropic, stirred debate when, after picking out a random sentence about pizza toppings hidden in a long list of unrelated documents, it expressed the suspicion that it was being tested. Such reports often smack of marketing strategies rather than objective scientific projects, but they highlight our eagerness to attribute humanlike understanding to A.I.
But human minds are converging with virtual ones in other ways. Tom Griffiths, a cognitive scientist at Princeton, has suggested that by describing the limitations of human intelligence, and by building models that share those limitations, we could end up with a better understanding of ourselves and with A.I. that is more interpretable and efficient. "Human intelligence helps us better understand and model computers, and we can use these models to understand human intelligence," Dr. Griffiths said. "This is all very new. We are exploring the space of possibilities."
In February, Dr. Lake and his collaborators created the first A.I. model trained on a child's experiences, using videos captured in Dr. Frank's lab more than a decade ago. The model, published in the journal Science, was trained on 60 hours of footage and was able to associate different moments with words. Type "sand" and the model recalls the moment, 11 years ago, when the boy whose experiences the model was trained on visited the beach with his mother. Type "car" and the model brings up a first-person video of the child sitting in his booster seat.
The training videos are old and grainy, and the data is quite sparse, but the model's ability to form some kind of conceptual mapping of the world suggests that it might be possible for language to be acquired primarily through association. “We had a reviewer of the paper say, 'Before reading this, I would have thought this was impossible,'” said Wai Keen Vong, a researcher at New York University who helped lead the work.
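For readers curious about the mechanics, the sketch below illustrates the general idea behind this kind of model: a contrastive setup that pulls the embedding of a video frame toward the embedding of a word heard at the same moment, so that typing a word later retrieves the closest frame. It is an illustrative toy in PyTorch, not the authors' published code; the tiny encoders, the random stand-in frames and the four-word vocabulary are assumptions made for brevity.

```python
# Illustrative sketch (not the published model): contrastive learning that pairs
# head-camera frames with the words heard alongside them, then uses the shared
# embedding space to retrieve a frame from a typed word.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameEncoder(nn.Module):
    """Maps a small image tensor to a shared embedding space."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

class WordEncoder(nn.Module):
    """Maps a word index (from transcribed caregiver speech) into the same space."""
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, w):
        return F.normalize(self.embed(w), dim=-1)

def contrastive_loss(frame_emb, word_emb, temperature=0.07):
    """InfoNCE: frames should match the words heard with them, not the others."""
    logits = frame_emb @ word_emb.t() / temperature
    targets = torch.arange(len(frame_emb))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy training step on random stand-in frames and word indices.
vocab = ["sand", "car", "ball", "dog"]
frames = torch.rand(4, 3, 32, 32)   # stand-ins for head-camera frames
words = torch.tensor([0, 1, 2, 3])  # the word heard alongside each frame

f_enc, w_enc = FrameEncoder(), WordEncoder(len(vocab))
optimizer = torch.optim.Adam(list(f_enc.parameters()) + list(w_enc.parameters()), lr=1e-3)

for _ in range(100):
    loss = contrastive_loss(f_enc(frames), w_enc(words))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Retrieval: type "sand" and get back the frame whose embedding sits closest.
with torch.no_grad():
    query = w_enc(torch.tensor([vocab.index("sand")]))
    best_frame = (f_enc(frames) @ query.t()).argmax().item()
print(f"'sand' retrieves frame {best_frame}")
```

The design choice worth noting is that nothing in this setup tells the model what "sand" means; the association emerges only from words and images co-occurring in time, which is why results like this bear on the associative-learning debate described earlier.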
For Dr. Lake and other researchers like him, two intertwined questions present the most interesting research on the horizon: How humanlike can we make A.I.? And what makes us human? To chip away at the first, by modeling social interactions, intentions and biases, and by collecting entire video streams from a front-facing camera mounted on a one-year-old, is to come closer to answering the second.
“If the field can get to the point where models are trained solely on data that a single child saw, and perform well on a large set of tasks, that would be a major scientific achievement,” Dr. Lake said.
At their apartment, Dr. Lake and Dr. Kwan were getting Luna and her older brother, Logan, ready for a birthday party. The children crowded at the door, putting on their socks and shoes. Dr. Lake stopped the recording on Luna's camera and handed her a pair of furry white gloves with sheep faces. "What are those, Luna?" he asked.
“Baa baa,” Luna said.
Dr. Kwan said: "There was a time when she didn't know the word 'no,' and it was just 'yes' to everything." She turned to Luna: "Kisses, do you want kisses?"
“No,” Luna said.
“Oh,” Dr. Lake said, laughing. “I miss the 'yes' phase.”
Audio produced by Sara Diamond.