Cade Metz has been writing about advances in artificial intelligence for more than a decade.
Ian Sansavera, a software architect at a New York startup called Runway AI, typed a short description of what he wanted to see in a video. “A calm river in the forest,” he wrote.
Less than two minutes later, an experimental Internet service generated a short video of a calm river in a forest. The rushing river water glistened in the sun as it cut through trees and ferns, turned a corner, and splashed gently over the rocks.
Runway, which plans to open its service to a small group of testers this week, is one of several companies developing artificial intelligence technology that will soon allow people to generate videos simply by typing a few words into a box on a computer screen.
They represent the next stage in an industry race, one that includes giants like Microsoft and Google as well as much smaller startups, to create new kinds of artificial intelligence systems that some believe could be the next big thing in technology, as important as web browsers or the iPhone.
New video generation systems could speed up the work of filmmakers and other digital artists, while also becoming a fast new way to create hard-to-detect misinformation, making it even harder to know what is real on the internet.
The systems are examples of what is known as generative AI, which can instantly create text, images and sounds. Another example is ChatGPT, the online chatbot created by a San Francisco startup, OpenAI, which surprised the tech industry with its abilities late last year.
Google and Meta, the parent company of Facebook, unveiled the first video generation systems last year, but did not share them with the public because they were concerned that the systems could eventually be used to spread disinformation with new speed and efficiency.
But Runway CEO Cris Valenzuela said he believed the technology was too important to keep in a research lab, despite its risks. “This is one of the most impressive technologies that we have built in the last hundred years,” he said. “You need people to actually use it.”
The ability to edit and manipulate film and video is nothing new, of course. Filmmakers have been doing it for more than a century. In recent years, researchers and digital artists have used various artificial intelligence technologies and software programs to create and edit videos that are often called deepfakes.
But systems like the one Runway has created could eventually replace editing skills with the push of a button.
Runway’s technology generates videos from any short description. To get started, simply write a description the same way you would write a quick note.
That works best if the scene has some action, but not too much, something like “a rainy day in the big city” or “a dog with a cell phone in the park.” You hit enter and the system generates a video in a minute or two.
The technology can reproduce common images, such as a cat sleeping on a rug. Or it can combine disparate concepts to generate weirdly funny videos, like a cow at a birthday party.
The videos are only four seconds long, and they are choppy and blurry if you look closely. Sometimes the images are strange, distorted and disturbing. The system has a way of merging animals like dogs and cats with inanimate objects like balls and cell phones. But given the right prompt, it produces videos that show where the technology is headed.
“At this point, if I see a high-resolution video, I’m probably going to trust it,” said Phillip Isola, a professor at the Massachusetts Institute of Technology who specializes in AI. “But that will change pretty quickly.”
Like other generative AI technologies, Runway’s system learns by analyzing digital data: in this case, photos, videos and the captions describing what those images contain. By training this kind of technology on ever larger amounts of data, researchers are confident they can rapidly improve and expand its abilities. Experts believe such systems will soon generate professional-looking mini-movies, complete with music and dialogue.
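To make that training setup concrete, here is a minimal sketch in Python of what a single text-video training example might look like. The names and the random pixel data are hypothetical stand-ins, not Runway’s actual data format.

    # A toy sketch of the paired text-video examples such systems learn from.
    # All names here are hypothetical; random pixels stand in for real footage.
    from dataclasses import dataclass

    import numpy as np

    @dataclass
    class TrainingExample:
        caption: str        # text describing the clip
        frames: np.ndarray  # video as an array: (time, height, width, channels)

    dataset = [
        TrainingExample("a calm river in the forest",
                        np.random.rand(16, 64, 64, 3).astype(np.float32)),
        TrainingExample("a dog with a cell phone in the park",
                        np.random.rand(16, 64, 64, 3).astype(np.float32)),
    ]

    for example in dataset:
        # A model sees millions of pairs like these and learns which
        # pixel patterns tend to go with which words.
        print(example.caption, example.frames.shape)  # (16, 64, 64, 3)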
It is difficult to define what the system currently creates. It is not a photo. It is not a cartoon. It is a collection of many pixels combined to create a realistic video. The company plans to offer its technology with other tools that it believes will speed up the work of professional artists.
Last month, social media services were awash with images of Pope Francis in a white Balenciaga puffer coat, a surprisingly modern outfit for an 86-year-old pontiff. But the images were not real. A 31-year-old construction worker from Chicago had created the viral sensation using a popular AI tool called Midjourney.
Dr. Isola has spent years building and testing this kind of technology, first as a researcher at the University of California, Berkeley, and at OpenAI, and then as a professor at MIT. Still, he was fooled by the sharp, high-resolution but completely false images of Pope Francis.
“There was a time when people would post deepfakes and I wasn’t fooled, because they were either too outlandish or not very realistic,” he said. “Now, we can’t take any of the images we see on the internet at face value.”
Midjourney is one of many services that can generate realistic still images from a short prompt. Others include Stable Diffusion and DALL-E, an OpenAI technology that started this wave of photo generators when it was unveiled a year ago.
Midjourney relies on a neural network, which learns its skills by analyzing huge amounts of data. It looks for patterns as it reviews millions of digital images, as well as the text captions that describe what each image depicts.
When someone describes an image for the system, it generates a list of features the image might include. One feature might be the curve at the top of a dog’s ear. Another might be the edge of a cell phone. A second neural network, called a diffusion model, then creates the image, generating the pixels needed for those features. It ultimately transforms the pixels into a coherent image.
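To make the diffusion step concrete, here is a minimal numeric sketch in Python. It follows the standard denoising-diffusion formulation from the research literature, not Midjourney’s actual code, and the schedule values are illustrative. What it demonstrates: an image blended with a known amount of noise can be read back out exactly if that noise is predicted correctly, which is precisely the prediction the second network is trained to make.

    # A minimal denoising-diffusion sketch (standard textbook formulation,
    # not Midjourney's actual code; schedule values are illustrative).
    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000                              # number of noise steps
    betas = np.linspace(1e-4, 0.02, T)    # noise added at each step
    alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal left by step t

    x0 = rng.standard_normal((4, 4))      # stand-in for an image's pixels

    # Forward process: by step t, the image is a known blend of signal and noise.
    t = 600
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

    # A trained network predicts eps from xt, the step t and the prompt's
    # features. Given a correct prediction, the clean image falls right out:
    x0_hat = (xt - np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alpha_bars[t])
    print(np.allclose(x0, x0_hat))        # True

In practice the network’s prediction is imperfect, so samplers repeat an estimate-and-renoise step hundreds of times, starting from pure noise and ending with a coherent image.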
Companies like Runway, which has about 40 employees and has raised $95.5 million, use this technique to generate moving images. By analyzing thousands of videos, its technology can learn to string together many still images in a similarly coherent way.
“A video is just a series of frames, still images, that are combined in a way that gives the illusion of movement,” Valenzuela said. “The trick is to train a model that understands the relationship and consistency between each frame.”
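Here is a toy illustration of that point in Python, with hypothetical numbers rather than anything from Runway’s system: a video is a time-ordered stack of frames, and coherence shows up as small frame-to-frame differences.

    # A toy "video": each frame is a small nudge away from the previous one,
    # which is what frame-to-frame consistency looks like numerically.
    # Hypothetical example, not Runway's code.
    import numpy as np

    rng = np.random.default_rng(1)
    frames = [rng.random((64, 64, 3))]            # first frame: random pixels
    for _ in range(15):
        nudged = frames[-1] + 0.02 * rng.standard_normal((64, 64, 3))
        frames.append(np.clip(nudged, 0.0, 1.0))  # keep pixel values valid

    video = np.stack(frames)                      # shape (16, 64, 64, 3)
    diffs = np.abs(np.diff(video, axis=0)).mean(axis=(1, 2, 3))
    print(video.shape, float(diffs.max()))        # small diffs = smooth "motion"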
Like early versions of tools such as DALL-E and Midjourney, the technology sometimes combines concepts and images in curious ways. If you ask for a teddy bear playing basketball, it might give you a sort of mutant stuffed animal with a basketball for a hand. If you ask for a dog with a cell phone in the park, you might get a cell phone-toting puppy with a strangely human body.
But experts believe they can iron out the flaws as they train their systems on more and more data. They believe the technology will eventually make creating a video as easy as writing a sentence.
“In the old days, to do anything remotely like this, you had to have a camera. You had to have props. You had to have a location. You had to have permission. You had to have money,” said Susan Bonser, an author and publisher in Pennsylvania who has been experimenting with early incarnations of generative video technology. “You don’t have to have any of that now. You can just sit down and imagine it.”