First, OpenAI offered a tool that allowed people to create digital images simply by describing what they wanted to see. It then built similar technology that generated full-motion video like something out of a Hollywood movie.
Now it has introduced technology that can recreate someone's voice.
The high-profile artificial intelligence startup said Friday that a small group of companies was testing a new OpenAI system, Voice Engine, that can recreate a person's voice from a 15-second recording. If you upload a recording of yourself and a paragraph of text, it can read the text using a synthetic voice that sounds like your own.
The text does not have to be in your native language. If you speak English, for example, you can recreate your voice in Spanish, French, Chinese, or many other languages.
OpenAI is not sharing the technology more widely because it is still trying to understand its potential dangers. Like image and video generators, a speech generator could help spread misinformation on social media. It could also allow criminals to impersonate people online or during phone calls.
The company said it was particularly concerned that this type of technology could be used to break voice authenticators that control access to online banking accounts and other personal applications.
“This is a delicate thing and it's important to get it right,” Jeff Harris, product manager at OpenAI, said in an interview.
The company is exploring ways to watermark synthetic voices or add controls that prevent people from using the technology with the voices of politicians or other prominent figures.
Last month, OpenAI took a similar approach when it introduced its video generator, Sora. It demonstrated the technology but did not release it publicly.
OpenAI is among many companies that have developed a new generation of AI technology that can generate synthetic voices quickly and easily. These include tech giants like Google and startups like New York-based ElevenLabs. (The New York Times has sued OpenAI and its partner, Microsoft, over allegations of copyright infringement involving AI systems that generate text.)
Companies can use these technologies to generate audiobooks, voice online chatbots, or even create an automated radio station DJ. Since last year, OpenAI has used its technology to power a talking version of ChatGPT. And it has long offered businesses a variety of voices that can be used for similar applications. All of them were constructed from clips provided by voice actors.
But the company has not yet offered a public tool that allows individuals and companies to recreate voices from a short clip, as Voice Engine does. The ability to recreate any voice in this way, Harris said, is what makes the technology dangerous. It could be particularly dangerous in an election year, he said.
In January, New Hampshire residents received robocall messages discouraging them from voting in the state primary in a voice that was likely artificially generated to sound like President Biden. The Federal Communications Commission later banned these types of calls.
Harris said OpenAI had no immediate plans to make money from the technology. He said the tool could be particularly useful for people who have lost their voice through illness or accident.
He demonstrated how the technology had been used to recreate a woman's voice after it was damaged by brain cancer. She could now speak, he said, after providing a brief recording of a presentation she had given when she was a high school student.