A pair of Meta's glasses takes a photo when you say, “Hey, Meta, take a photo.” The Ai Pin, a miniature computer that clips to your shirt, translates foreign languages into your native language. A screen-equipped artificial intelligence device has a virtual assistant you speak to through a microphone.
Last year, OpenAI updated its ChatGPT chatbot to respond with spoken words, and Google recently introduced Gemini, a replacement for its voice assistant on Android phones.
Technology companies are betting on a renaissance of voice assistants, many years after most people decided that talking to computers was uncool.
Will it work this time? Maybe, but it could take a while.
Large swaths of people have still never used voice assistants like Amazon's Alexa, Apple's Siri and Google Assistant, and the overwhelming majority of those who do say they would never want to be seen speaking to them in public, according to studies conducted over the past decade.
I, too, rarely use voice assistants, and in my recent experiment with Meta's glasses, which include a camera and speakers to provide information about your surroundings, I concluded that talking to a computer in front of parents and their children at a zoo was still astonishingly uncomfortable.
It made me wonder if this would ever feel normal. Not long ago, talking on the phone with Bluetooth headphones made people look crazy, but now everyone does it. Will we one day see many people walking and talking to their computers like in science fiction movies?
I posed this question to researchers and design experts, and the consensus was clear: because new AI systems improve voice assistants' ability to understand what we say and to actually help us, we are likely to talk to our devices more frequently in the coming years, but it will probably be many more years before we do so in public.
This is what you should know.
Why voice assistants are getting smarter
The new voice assistants are powered by generative artificial intelligence, which uses complex statistics and algorithms to guess which words go together, similar to the autocomplete feature on your phone. That makes them better able to use context to understand requests and follow-up questions than virtual assistants like Siri and Alexa, which can only answer a finite list of questions.
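To make the autocomplete comparison concrete, here is a toy sketch (an assumption for illustration only, not how production chatbots actually work): a simple bigram model that counts which words follow which in a small sample of text and then "guesses" the most likely next word, much like a phone's autocomplete.

```python
from collections import Counter, defaultdict

# A tiny sample of text; real systems train on vastly larger corpora.
corpus = (
    "what is the weather like in new york "
    "what should i pack for a trip to new york "
    "the weather in new york is cold"
).split()

# Count how often each word follows another (bigram frequencies).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("new"))  # "york" follows "new" every time in the sample
```

Generative AI systems replace these raw word counts with statistical models over far longer stretches of context, which is what lets them follow a conversation rather than match isolated commands.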
For example, if you say to ChatGPT: “What are some flights from San Francisco to New York next week?” – and continue with “What's the weather like there?” and “What should I pack?” – the chatbot can answer those questions because it makes connections between words to understand the context of the conversation. (The New York Times sued OpenAI and its partner, Microsoft, last year for using copyrighted news articles without permission to train chatbots.)
An older voice assistant like Siri, which reacts to a database of commands and questions that it was programmed to understand, would fail unless specific words were used, such as “What's the weather like in New York?” and “What should I pack for a trip to New York?”
The first conversation sounds more fluid, like the way people talk to each other.
One of the main reasons people abandoned voice assistants like Siri and Alexa was that computers couldn't understand much of what they were asked and it was difficult to know which questions worked.
Dimitra Vergyri, director of speech technology at SRI, the research lab behind the initial version of Siri before it was acquired by Apple, said generative AI addressed many of the problems that researchers had struggled with for years. The technology makes voice assistants capable of understanding spontaneous speech and responding with useful answers, she said.
John Burkey, a former Apple engineer who worked on Siri in 2014 and has been an outspoken critic of the assistant, said he believed that because generative AI made it easier for people to get help from computers, more of us were likely to be talking to assistants soon, and that when many of us started doing so, it could become the norm.
“Siri was limited in scope: it knew only a limited number of words,” he said. “Now you have better tools.”
But it could be years before the new wave of AI assistants is widely adopted, because they introduce new problems. Chatbots, including ChatGPT, Google's Gemini and Meta AI, are prone to “hallucinations,” which is when they make things up because they can't find the right answers. They have botched basic tasks such as counting and summarizing information from the web.
When voice assistants help and when they don't
Even as speech technology improves, speaking is unlikely to replace or surpass traditional computer interactions with a keyboard, experts say.
Today, people have compelling reasons to talk to computers in some situations when they are alone, such as setting a destination on a map while driving a car. In public, however, talking to an assistant not only can make you look weird; much of the time, it's also impractical. When I was wearing the Meta glasses at a grocery store and asked them to identify a product, an eavesdropping shopper cheekily responded, “That's a turnip.”
You also don't want to dictate a confidential work email to other people on a train. Likewise, it would be inconsiderate to ask a voice assistant to read text messages out loud in a bar.
“Technology solves a problem,” said Ted Selker, a veteran product designer who worked at IBM and Xerox PARC. “When are we solving problems and when are we creating problems?”
But it's easy to find times when talking to a computer helps you so much that you don't care how strange it seems to others, said Carolina Milanesi, an analyst at Creative Strategies, a research firm.
As you walk to your next office meeting, it would be helpful to ask a voice assistant to brief you on the people you are about to meet. While hiking on a trail, asking a voice assistant where to turn would be faster than stopping to pull up a map. While visiting a museum, it would be great if a voice assistant could give you a history lesson about the painting you are looking at. Some of these applications are already being developed with new AI technology.
When I was testing some of the latest voice-controlled products, I caught a glimpse of that future. While wearing the Meta glasses and recording a video of myself making a loaf of bread, for example, it was helpful to be able to say, “Hey, Meta, take a video,” because my hands were full. And asking Humane's Ai Pin to dictate my to-do list was more convenient than stopping to stare at my phone screen.
“While you're walking, that's the sweet spot,” said Chris Schmandt, who worked on voice interfaces for decades at the Massachusetts Institute of Technology's Media Lab.
When he became an early adopter of one of the first cellphones about 35 years ago, he said, people stared at him as he wandered around the MIT campus talking on the phone. Now that is normal.
I am convinced that the day will come when people will occasionally talk to computers when they are away from home, but it will come very slowly.