The creative industries have witnessed a new era of possibilities with the advent of generative models: computational tools capable of generating text or images based on training data. Drawing inspiration from these advances, researchers from Stanford University, UC Berkeley, and Adobe Research have unveiled a novel model that can seamlessly insert specific humans into different scenes with stunning realism.
The researchers used a self-supervised approach to train a diffusion model: a generative model that turns random "noise" into images by first progressively "destroying" the training data with noise and then learning to reverse that process. The model was trained on videos of humans moving through various scenes, with two frames selected at random from each video. The person in the first frame was masked out, and the model used the unmasked person in the second frame as a conditioning signal to realistically reconstruct the person in the masked frame.
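The training setup described above can be sketched in a few lines. This is an illustrative, simplified rendition, not the authors' code: the function names, the zero-fill masking, and the toy cosine noise schedule are all assumptions made for clarity.

```python
import numpy as np

def make_training_pair(frame_a, frame_b, person_mask):
    """Build one self-supervised example from two frames of the same video.

    frame_a, frame_b: (H, W, 3) float arrays sampled at random from one video.
    person_mask: boolean (H, W) mask covering the person in frame_a.
    Illustrative sketch only; names and masking scheme are assumptions.
    """
    # Mask out the person in the first frame; the model must inpaint them.
    masked_scene = frame_a.copy()
    masked_scene[person_mask] = 0.0

    # The unmasked person in the second frame is the conditioning cue,
    # and the original first frame is the reconstruction target.
    return masked_scene, frame_b, frame_a

def add_diffusion_noise(target, t, num_steps=1000, rng=None):
    """Forward ("destroying") step of a toy DDPM-style diffusion process:
    mix the target with Gaussian noise according to timestep t."""
    rng = rng or np.random.default_rng(0)
    alpha_bar = np.cos(0.5 * np.pi * t / num_steps) ** 2  # toy cosine schedule
    noise = rng.standard_normal(target.shape)
    noisy = np.sqrt(alpha_bar) * target + np.sqrt(1.0 - alpha_bar) * noise
    # During training, the denoiser would predict `noise` from `noisy`,
    # conditioned on the masked scene and the reference person.
    return noisy, noise
```

At inference time the same model starts from pure noise and, guided by an empty scene plus a reference photo of a person, iteratively denoises toward an image with that person plausibly inserted.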
Through this training process, the model learned to infer plausible poses from the scene context, re-pose the person, and seamlessly integrate them into the scene. The researchers found that their generative model worked exceptionally well at placing individuals in scenes, producing edited images that appeared highly realistic. Its affordance predictions (the perceived possibilities for action or interaction within an environment) outperformed those of previously presented non-generative models.
The findings hold significant potential for future research on affordance perception and related areas. They could contribute to advances in robotics by helping systems identify potential interaction opportunities. The model's practical applications also extend to creating realistic media, including images and videos. Integrating it into creative software tools could improve image-editing functionality, supporting artists and media creators. The model could even be incorporated into smartphone photo-editing apps, allowing users to easily and realistically insert people into their photos.
The researchers have identified several avenues for future exploration. They aim to add more controllability over the generated poses and to explore generating realistic human motion within scenes rather than static images. They also plan to improve the model's efficiency and to expand its scope beyond humans to all objects.
In conclusion, the researchers have introduced a new model that enables the realistic insertion of humans into scenes. By leveraging generative diffusion models and self-supervised training, it demonstrates impressive performance in affordance perception and holds promise for a range of applications in the creative industries and robotics research. Future work will focus on refining and expanding the model's capabilities.
Check out the Paper. Don't forget to join our 22k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
Niharika is a technical consulting intern at Marktechpost. She is a third-year student pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. A highly enthusiastic individual with a strong interest in machine learning, data science, and artificial intelligence, she is an avid reader of the latest developments in these fields.