Mesh representations of 3D scenes are essential for many applications, from AR/VR asset creation to computer graphics. However, creating these 3D assets remains laborious and demands considerable skill. In the 2D domain, recent work has used generative models, such as diffusion models, to produce high-quality images from text. These techniques have helped democratize content creation by greatly lowering the barrier to producing images with user-specified content. A new line of research attempts to apply comparable techniques to generate 3D models from text. However, current methods have drawbacks and lack the generality of 2D text-to-image models.
Dealing with the scarcity of 3D training data is one of the main difficulties in creating 3D models, since 3D datasets are much smaller than those used in many other applications, such as 2D image synthesis. For example, methods that use 3D supervision directly are often restricted to datasets of basic shapes, such as ShapeNet. Recent techniques overcome these data limitations by formulating 3D generation as an iterative optimization problem in the image domain, harnessing the expressive power of 2D text-to-image models for 3D synthesis. These methods construct 3D objects stored in a radiance field representation, demonstrating the ability to produce arbitrary (neural) shapes from text. Unfortunately, scaling these techniques up to produce room-sized 3D structures and textures is challenging.
When creating large scenes, it is difficult to ensure that the output is dense and coherent across viewpoints, and that these views include all necessary features, such as walls, floors, and furniture. A mesh remains the preferred representation for many end-user applications, including rendering on commodity hardware. To address these drawbacks, researchers at TU Munich and the University of Michigan propose a technique that extracts scene-scale 3D meshes from off-the-shelf 2D text-to-image models. Their technique employs inpainting and monocular depth estimation to iteratively create a scene. They build the initial mesh by generating an image from text and backprojecting it into 3D using a depth estimation model. The mesh is then repeatedly rendered from new viewpoints.
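The backprojection step described above can be sketched with a standard pinhole-camera unprojection. The function below is a minimal illustration, not the authors' implementation, and assumes known camera intrinsics (the names `fx`, `fy`, `cx`, `cy` and the helper itself are hypothetical):

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Unproject a depth map into a 3D point cloud in camera coordinates.

    depth: (H, W) array of per-pixel depth values.
    fx, fy, cx, cy: pinhole-camera intrinsics (focal lengths, principal point).
    Returns an (H*W, 3) array of 3D points.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx   # X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy   # Y = (v - cy) * Z / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Toy example: a flat 2x2 depth map one unit in front of the camera.
pts = backproject_depth(np.ones((2, 2)), fx=1.0, fy=1.0, cx=0.5, cy=0.5)
```

In the full pipeline, points like these would be connected into triangles and textured with the generated image; here only the geometric lifting is shown.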
For each viewpoint, they inpaint any gaps in the rendered images before fusing the generated content into the mesh (Fig. 1a). Two key design factors for their iterative generation approach are how they select views and how they integrate the generated scene content with the existing geometry. Initially, they sample viewpoints from predetermined trajectories that cover a large portion of the scene, and then adaptively select viewpoints to fill in the remaining gaps. To produce seamless transitions when merging generated content into the mesh, they align the two depth maps and remove regions of the mesh with distorted textures.
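The depth-alignment idea can be illustrated with a simple global scale-and-shift fit between the newly predicted depth and the depth rendered from the existing mesh. This is only a sketch of the concept; the paper's actual alignment procedure may differ, and all names here are hypothetical:

```python
import numpy as np

def align_depth(pred, target, mask):
    """Align a predicted depth map to a rendered target depth map with a
    global scale s and shift b, minimizing ||s * pred + b - target||^2
    over pixels where both maps are valid (mask == True).
    """
    p = pred[mask].ravel()
    t = target[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)      # design matrix [pred, 1]
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)  # least-squares fit
    return s * pred + b

# Toy example: target depth is exactly 2 * pred + 1, so the fit recovers it.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
target = 2.0 * pred + 1.0
mask = np.ones_like(pred, dtype=bool)
aligned = align_depth(pred, target, mask)
```

After such an alignment, the newly generated content sits at a depth consistent with the existing geometry, which is what allows smooth fusion at the seams.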
Combined, these choices yield sizable scene-scale 3D models (Fig. 1b) that can represent a variety of rooms with attractive textures and consistent geometry. Their contributions are the following:
• A technique that iteratively creates scenes by lifting frames to 3D using 2D text-to-image models and monocular depth estimation.
• A method that generates 3D meshes of room-scale indoor scenes with compelling textures and geometry from any text input. The proposed depth alignment and mesh fusion methods produce seamless, distortion-free geometry and textures.
• A custom two-stage viewpoint selection that samples camera poses from suitable angles: first to lay out the furniture and room structure, then to fill in remaining gaps and produce a watertight mesh.
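The second, gap-filling stage of viewpoint selection can be sketched as scoring candidate camera poses by how much uncovered scene they reveal. The helper below is hypothetical and not the paper's actual heuristic; it assumes each candidate pose comes with a boolean mask of pixels the current mesh fails to cover when rendered from that pose:

```python
import numpy as np

def pick_fill_in_view(candidate_hole_masks):
    """Pick the candidate camera pose that sees the most uncovered pixels.

    candidate_hole_masks: list of (H, W) boolean arrays, one per candidate
    pose; True marks a pixel where the rendered mesh has a hole.
    Returns the index of the pose with the largest hole area.
    """
    hole_counts = [mask.sum() for mask in candidate_hole_masks]
    return int(np.argmax(hole_counts))

# Three candidate poses: the second one reveals the largest uncovered area.
masks = [np.zeros((4, 4), dtype=bool),   # fully covered view
         np.ones((4, 4), dtype=bool),    # entirely uncovered view
         np.eye(4, dtype=bool)]          # partially uncovered view
best = pick_fill_in_view(masks)          # → 1
```

Iterating this selection, then inpainting and fusing the chosen view, is what drives the mesh toward a watertight, fully textured result.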
Check out the Paper, Project, and GitHub. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.