With the introduction of large generative models and their increasing popularity, a number of tasks can now be carried out conveniently. Models like DALL-E, developed by OpenAI, are already being used by more than a million users. DALL-E is a text-to-image model that generates high-quality images from a textual description entered by the user. The diffusion models behind these generative systems produce an image from text by starting from random noise and iteratively refining the variables that represent the image, removing a little predicted noise at each step. In addition to this functionality, some models are also being used for image-to-image generation: they edit an existing image to produce the required target image while preserving many fine details.
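For intuition, the iterative refinement loop can be pictured with the minimal Python sketch below. Everything here is a toy illustration, not any model's actual sampler: `denoise_step` is a stand-in for a trained noise-prediction network, and the step count and update rule are simplified assumptions.

```python
import numpy as np

def denoise_step(x, t):
    # Placeholder: a trained network would predict the noise present in x
    # at step t, conditioned on the text prompt.
    return np.zeros_like(x)

num_steps = 50
x = np.random.randn(64, 64, 3)            # start from pure noise
for t in reversed(range(num_steps)):
    predicted_noise = denoise_step(x, t)
    x = x - predicted_noise / num_steps    # remove a little noise each step
# After the loop, x would approximate an image matching the prompt.
```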
Generating an image from an image has become possible, but it is still difficult to reconstruct a three-dimensional object from a two-dimensional image. This is because a single image does not contain enough information to recover the full 3D shape. A research team from the University of Oxford has presented a new diffusion-based model capable of generating 360-degree reconstructions of different objects from a single image. Called RealFusion, this model overcomes the challenge of 360-degree photographic reconstruction, a setting that traditional approaches sidestep by assuming reconstruction is impossible without access to multiple views.
The team has used a neural radiance field to represent the 3D geometry and appearance of the object, extracting 3D information with the help of an existing 2D model. They have optimized the radiance field taking into account two main objectives, combined as in the sketch that follows this list:
- Reconstruction objective: this ensures that the radiance field, when rendered from the viewpoint of the input image, reproduces that image.
- Score Distillation Sampling (SDS): a prior-based objective adopted from earlier SDS work; it ensures that novel views rendered from the radiance field look like plausible samples to the diffusion model.
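Put together, one optimization step looks roughly like the hedged PyTorch sketch below. The renderer, the noise schedule, and the frozen noise predictor are all illustrative stubs under assumed names, not the authors' implementation.

```python
import torch

def render(field, camera):
    # Stub differentiable renderer: real code would volume-render a NeRF
    # (e.g. an InstantNGP-backed field) from the given camera.
    return field

def frozen_eps(x_noisy, t):
    # Stub for the frozen, pretrained noise predictor (e.g. Stable Diffusion).
    return torch.zeros_like(x_noisy)

field = torch.randn(3, 64, 64, requires_grad=True)    # toy "radiance field"
input_image = torch.rand(3, 64, 64)

# 1) Reconstruction objective: the input view must reproduce the photo.
loss_rec = ((render(field, "input_view") - input_image) ** 2).mean()

# 2) SDS objective on a random novel view: noise the rendering, ask the
#    frozen prior to predict that noise, and nudge the rendering so the
#    prior finds it more plausible.
novel = render(field, "random_view")
t = torch.randint(1, 1000, ())
alpha = 1.0 - t.float() / 1000.0                      # toy noise schedule
eps = torch.randn_like(novel)
noisy = alpha.sqrt() * novel + (1.0 - alpha).sqrt() * eps
grad = (frozen_eps(noisy, t) - eps).detach()          # SDS gradient direction
loss_sds = (grad * novel).sum()                       # surrogate loss with that gradient

(loss_rec + loss_sds).backward()                      # gradients for one step
```

A key property of SDS is that it never backpropagates through the diffusion model itself; the model's output is used only as a gradient direction for the rendered view.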
The researchers have built on the idea of creating 3D reconstructions and synthesizing different views using prior knowledge from pre-trained diffusion models such as Stable Diffusion.
Some of the main contributions of the team are the following:
- RealFusion can extract a 360-degree photographic 3D reconstruction from a single image, without assumptions such as 3D supervision or knowledge of the type of object being photographed.
- RealFusion works by leveraging a 2D diffusion image generator through a new single-image variant of textual inversion (sketched below, after this list).
- The team has also introduced some new regularizers, implemented efficiently on top of InstantNGP.
- RealFusion outperforms prior methods, displaying state-of-the-art reconstruction results on multiple images from existing datasets and on in-the-wild images.
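To picture the single-image textual inversion idea, here is a hypothetical sketch: a new token embedding is the only trainable parameter, tuned so that a frozen diffusion model explains the one input photo. All names and the stub predictor are assumptions, not the paper's code.

```python
import torch

def frozen_eps(x_noisy, t, token):
    # Stub frozen noise predictor; the real one is a U-Net conditioned on
    # a prompt such as "an image of <e>" containing the learned token.
    return x_noisy * 0 + token.mean()

image = torch.rand(3, 64, 64)                    # the single input photo
token = torch.zeros(768, requires_grad=True)     # the only trainable parameter
opt = torch.optim.Adam([token], lr=5e-3)

for step in range(100):
    t = torch.randint(1, 1000, ())
    alpha = 1.0 - t.float() / 1000.0             # toy noise schedule
    eps = torch.randn_like(image)
    noisy = alpha.sqrt() * image + (1.0 - alpha).sqrt() * eps
    loss = ((frozen_eps(noisy, t, token) - eps) ** 2).mean()  # diffusion loss
    opt.zero_grad()
    loss.backward()
    opt.step()
# The optimized token now encodes the pictured object and can be reused in
# the SDS prompts above.
```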
RealFusion is a breakthrough in imaging because it bridges the gap between 2D and 3D. Compared with currently existing approaches, it produces images of better quality, with better shape, appearance, and extrapolation of unseen regions. It is certainly a great addition to the family of diffusion models.
Check out the Paper, GitHub, and Project page. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.