World models (AI models capable of generating a simulated environment in real time) are among the most impressive applications of machine learning. The field has seen rapid progress over the past year, and on Wednesday Google DeepMind announced Genie 2. While its predecessor was limited to generating 2D worlds, the new model can create 3D worlds and maintain them for far longer.
Genie 2 is not a game engine; rather, it is a diffusion model that generates images as the player (whether a human or another AI agent) moves through the world the software is simulating. As it generates frames, Genie 2 can infer properties of the environment, allowing it to model water, smoke, and physics effects, though some of those interactions can be amusing to watch. The model is also not limited to rendering scenes from a third-person perspective; it can handle isometric and first-person viewpoints as well. All you need to get started is a single image, either generated by Google's Imagen 3 model or taken from the real world.
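DeepMind has not published Genie 2's architecture or code, but the loop described above (a diffusion model producing each new frame from the frames generated so far plus the player's latest action) can be sketched in miniature. Everything below, from the toy denoiser to the action set, is a made-up stand-in for illustration only:

```python
# Illustrative sketch only: DeepMind has released no code or architecture
# details for Genie 2 beyond calling it a diffusion model trained on video.
# This toy loop shows the general shape of action-conditioned,
# frame-by-frame world generation described in the article.

from dataclasses import dataclass
import random

@dataclass
class Frame:
    pixels: list  # flattened grayscale values in [0, 1]

def denoise_step(noisy, context, action):
    # Stand-in for a learned diffusion denoiser: nudge the noise toward the
    # previous frame, shifted slightly by the chosen action.
    shift = {"left": -0.05, "right": 0.05, "forward": 0.0}[action]
    return [max(0.0, min(1.0, 0.5 * n + 0.5 * c + shift))
            for n, c in zip(noisy, context)]

def generate_next_frame(history, action, steps=4, size=16, rng=None):
    rng = rng or random.Random(0)
    x = [rng.random() for _ in range(size)]  # start from pure noise
    context = history[-1].pixels             # condition on the last frame
    for _ in range(steps):                   # iterative denoising
        x = denoise_step(x, context, action)
    return Frame(pixels=x)

# Roll out a short "playable" sequence from a single starting image.
world = [Frame(pixels=[0.5] * 16)]
for action in ["forward", "left", "right"]:
    world.append(generate_next_frame(world, action))
print(len(world))  # the seed image plus three generated frames
```

A real system would use learned neural networks and genuine player input rather than this hard-coded three-action vocabulary, but the structure is the same: each frame is sampled conditioned on the history and the action, which is what lets the world respond to the player in real time.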
<div class="twitter-tweet-wrapper" data-embed-anchor="ead345c7-330c-5b89-bb23-634be381076a"><blockquote class="twitter-tweet" data-theme="light">
<p>Introducing Genie 2: our AI model that can create an infinite variety of playable 3D worlds, all from a single image.</p>
<p>These types of large-scale foundation world models could allow future agents to be trained and evaluated in a multitude of virtual environments. … <a href="https://t.co/qHCT6jqb1W" rel="nofollow noopener" target="_blank">pic.twitter.com/qHCT6jqb1W</a></p>
<p>- Google DeepMind (@GoogleDeepMind) <a href="https://twitter.com/GoogleDeepMind/status/1864367798132039836?ref_src=twsrc%5Etfw" rel="nofollow noopener" target="_blank">December 4, 2024</a></p>
</blockquote></div>
Notably, Genie 2 can remember parts of a simulated scene even after they leave the player's field of view and can accurately reconstruct those elements once they become visible again. This sets it apart from other world models such as <a href="https://www.decart.ai/" rel="nofollow noopener" target="_blank">Oasis</a>.
Still, there are limits to what Genie 2 can do in this regard. DeepMind says the model can generate "consistent" worlds for up to 60 seconds, and most of the examples the company shared on Wednesday are much shorter, typically between 10 and 20 seconds. Moreover, the longer Genie 2 has to maintain the illusion of a consistent world, the more artifacts creep in and the softer the image quality becomes.
DeepMind did not detail how it trained Genie 2, saying only that the model was trained "on a large-scale video dataset." Don't expect DeepMind to release Genie 2 to the public anytime soon, either. For now, the company sees the model primarily as a tool for training and evaluating other AI agents, including its own SIMA agent, and as something artists and designers could use to quickly prototype and test ideas. Looking further ahead, DeepMind suggests that world models like Genie 2 will likely play an important role on the path to artificial general intelligence.
“Training more general embodied agents has traditionally been hampered by the availability of sufficiently rich and diverse training environments,” DeepMind said. “As we show, Genie 2 could allow future agents to be trained and tested on an unlimited curriculum of novel worlds.”
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>