Google's new video generation AI model, Lumiere, uses a new diffusion model called Space-Time-U-Net, or STUNet, that figures out where things are in a video (space) and how they move and change at the same time (time). Ars Technica reports that this method lets Lumiere create the video in a single process instead of stitching together smaller still frames.
Lumiere starts by creating a base frame from the prompt. It then uses the STUNet framework to begin approximating where objects within that frame will move, creating additional frames that flow into one another for the appearance of fluid motion. Lumiere also generates 80 frames, compared to Stable Video Diffusion's 25.
Admittedly, I'm more of a writer than a video person, but the sizzle reel Google published, along with a preprint scientific paper, shows that AI video generation and editing tools have gone from uncanny valley to nearly realistic in just a few years. It also plants Google's technology in a space already occupied by competitors such as Runway, Stable Video Diffusion, and Meta's Emu. Runway, one of the first mass-market text-to-video platforms, released Runway Gen-2 in March last year and has begun offering more realistic-looking videos, though Runway videos still have trouble portraying movement.
Google was kind enough to put clips and prompts on the Lumiere site, which allowed me to run the same prompts through Runway for comparison. Here are the results:
Yes, some of the clips have a touch of artificiality, especially if you look closely at skin texture or when the scene is more atmospheric. But look at that turtle! It moves like a turtle would in water! It looks like a real turtle! I sent the Lumiere intro video to a friend who is a professional video editor. While she noted that "you can clearly tell it's not entirely real," she found it impressive that, had I not told her it was AI, she would have thought it was CGI. (She also said, "It's going to take my job, isn't it?")
Other models stitch videos together from generated keyframes where the movement has already happened (think of the drawings in a flip book), while STUNet lets Lumiere focus on the movement itself based on where the generated content should be at a given time in the video.
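To make that distinction concrete, here is a minimal toy sketch in Python. It is not Google's code, and the function names, shapes, and the linear interpolation stand-in are illustrative assumptions rather than anything from the paper; it only contrasts a keyframe-then-interpolate pipeline with a denoiser that sees the whole clip at once in space and time.

```python
# Toy sketch contrasting the two approaches described above.
# A "video" here is just a (frames, height, width, channels) NumPy array.
import numpy as np

def denoise(x: np.ndarray) -> np.ndarray:
    """Stand-in for one pass of a learned diffusion denoiser; here it just scales noise."""
    return x * 0.9

def keyframe_pipeline(num_frames: int = 80) -> np.ndarray:
    """Keyframe-style approach: generate a few stills, then fill in the frames between them."""
    keyframes = denoise(np.random.randn(8, 64, 64, 3))   # sparse still frames
    # Linear interpolation stands in for a temporal super-resolution model.
    idx = np.linspace(0, len(keyframes) - 1, num_frames)
    lo, hi = np.floor(idx).astype(int), np.ceil(idx).astype(int)
    t = (idx - lo)[:, None, None, None]
    return (1 - t) * keyframes[lo] + t * keyframes[hi]

def space_time_pipeline(num_frames: int = 80) -> np.ndarray:
    """Lumiere-style idea: denoise the entire clip jointly, so motion is modeled globally."""
    clip = np.random.randn(num_frames, 64, 64, 3)         # the full clip at once
    return denoise(clip)

print(keyframe_pipeline().shape, space_time_pipeline().shape)  # (80, 64, 64, 3) for both
```

The point of the sketch is only that the second function operates on every frame at once rather than interpolating between stills generated separately, which is the property the Lumiere paper attributes to STUNet.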
Google hasn't been a major player in the text-to-video category, but it has been gradually releasing more advanced AI models and leaning toward a more multimodal approach. Its Gemini large language model will eventually bring image generation to Bard. Lumiere isn't available for testing yet, but it shows Google's ability to develop an AI video platform that is comparable to (and arguably a little better than) generally available AI video generators like Runway and Pika. And just a reminder, this was where Google was with AI video two years ago.
Beyond text-to-video generation, Lumiere will also support image-to-video generation; stylized generation, which lets users create videos in a specific style; cinemagraphs, which animate only a portion of a video; and inpainting, which masks out an area of the video to change its color or pattern.
However, Google's Lumiere paper noted that "there is a risk of misuse when creating false or harmful content with our technology, and we believe it is crucial to develop and apply tools to detect bias and malicious use cases to ensure a safe and fair use." The paper's authors did not explain how this could be achieved.