In recent months, generative AI has become increasingly popular. From organizations to individual AI researchers, many are discovering the potential of generative AI to produce unique and original content. With the introduction of Large Language Models (LLMs), many tasks have become easier to carry out. More than a million users already use models like DALL-E, developed by OpenAI, which lets users create realistic images from a text prompt. This text-to-image model generates high-quality images based on the textual description entered.
OpenAI has recently released a new project for generating three-dimensional assets. Called Shap·E, it is a conditional generative model for 3D assets. Unlike earlier 3D generative models that produce a single output representation, Shap·E generates the parameters of implicit functions. These functions can be rendered as textured meshes or as neural radiance fields (NeRFs), allowing for versatile and realistic 3D asset generation.
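To make "generating the parameters of an implicit function" concrete, here is a minimal sketch in which a flat parameter vector is unpacked into the weights of a tiny MLP that maps 3D coordinates to density and color. The layer sizes, the names `unpack` and `implicit_function`, and the random parameter vector are all illustrative assumptions, not Shap·E's actual architecture.

```python
import numpy as np

HIDDEN = 16  # hypothetical hidden width, chosen only for illustration
# Two layers: (3 -> HIDDEN) and (HIDDEN -> 4), each with a bias.
NUM_PARAMS = 3 * HIDDEN + HIDDEN + HIDDEN * 4 + 4


def unpack(params: np.ndarray):
    """Split a flat parameter vector into the weights and biases of the MLP."""
    i = 0
    w1 = params[i:i + 3 * HIDDEN].reshape(3, HIDDEN)
    i += 3 * HIDDEN
    b1 = params[i:i + HIDDEN]
    i += HIDDEN
    w2 = params[i:i + HIDDEN * 4].reshape(HIDDEN, 4)
    i += HIDDEN * 4
    b2 = params[i:i + 4]
    return w1, b1, w2, b2


def implicit_function(params: np.ndarray, coords: np.ndarray):
    """Evaluate the implicit function at (N, 3) coordinates.

    Returns a non-negative density of shape (N,) and RGB values in [0, 1]
    of shape (N, 3).
    """
    w1, b1, w2, b2 = unpack(params)
    hidden = np.tanh(coords @ w1 + b1)
    out = hidden @ w2 + b2
    density = np.log1p(np.exp(out[:, 0]))    # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))  # sigmoid keeps colors in [0, 1]
    return density, rgb


# A generative model in this style would output `params`; here they are random.
rng = np.random.default_rng(0)
params = rng.standard_normal(NUM_PARAMS)
coords = rng.uniform(-1.0, 1.0, size=(5, 3))
density, rgb = implicit_function(params, coords)
```

In this toy setup, a different parameter vector yields a different 3D asset, and the same function can be queried at any coordinate, which is what lets one representation feed more than one renderer.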
To train Shap·E, the researchers first trained an encoder. The encoder takes 3D assets as input and maps them to the parameters of an implicit function. This mapping allows the model to learn the underlying representation of the 3D assets. Following that, a conditional diffusion model was trained on the encoder outputs. The conditional diffusion model learns the distribution of the implicit function parameters given the conditioning input, and can therefore generate diverse and complex 3D assets by sampling from the learned distribution. The diffusion model was trained on a large dataset of 3D assets paired with their corresponding textual descriptions.
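As a rough illustration of the second stage, the sketch below applies the standard DDPM-style forward noising step to a vector standing in for one encoder output. The linear noise schedule, the step count, the latent size, and the names `make_alpha_bar` and `noise_latent` are assumptions made for illustration; the article does not specify Shap·E's actual schedule or parameterization.

```python
import numpy as np


def make_alpha_bar(num_steps: int, beta_start: float = 1e-4,
                   beta_end: float = 0.02) -> np.ndarray:
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def noise_latent(latent: np.ndarray, t: int, alpha_bar: np.ndarray,
                 rng: np.random.Generator):
    """Sample x_t from q(x_t | x_0) and return it with the noise that was added."""
    eps = rng.standard_normal(latent.shape)
    x_t = np.sqrt(alpha_bar[t]) * latent + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps


rng = np.random.default_rng(0)
# Stand-in for one encoder output: a flat vector of implicit-function parameters.
latent = rng.standard_normal(1024)
alpha_bar = make_alpha_bar(num_steps=1000)

x_early, eps_early = noise_latent(latent, 10, alpha_bar, rng)   # mostly signal
x_late, eps_late = noise_latent(latent, 999, alpha_bar, rng)    # mostly noise
```

During training, a denoising network would receive the noised latent, the timestep, and the conditioning signal (for example a text embedding) and learn to undo the noise. Sampling then runs that process in reverse, starting from pure noise.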
Shap·E builds on implicit neural representations (INRs) of 3D assets. An INR encodes a 3D asset by mapping 3D coordinates to location-specific information, such as density and color, which is then used to render the asset. INRs provide a versatile and flexible framework for capturing the detailed geometric properties of 3D assets. The two types of INR that the team discusses are:
- Neural Radiance Field (NeRF): NeRF represents a 3D scene as a function that maps coordinates and viewing directions to RGB colors and densities. A NeRF can be rendered from arbitrary viewpoints, allowing for high-fidelity, realistic rendering of the scene, and can be trained to match ground-truth renderings.
- DMTet and its extension GET3D: These INRs represent a textured 3D mesh as a function that maps coordinates to colors, signed distances, and vertex offsets. Using these functions, 3D triangle meshes can be constructed in a differentiable way.
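The NeRF-style rendering described above can be sketched with the standard volume rendering quadrature: sample points along a camera ray, query the implicit function for density and color, and alpha-composite the results. The field here is a hand-written red sphere rather than a learned network, it ignores viewing direction for simplicity, and `toy_field` and `render_ray` are hypothetical names used only for this example.

```python
import numpy as np


def toy_field(points: np.ndarray):
    """Hypothetical implicit function: a dense red sphere of radius 1 at the origin.

    Takes (N, 3) points and returns density (N,) and RGB (N, 3).
    """
    inside = np.linalg.norm(points, axis=-1) < 1.0
    density = np.where(inside, 50.0, 0.0)
    rgb = np.tile(np.array([1.0, 0.0, 0.0]), (points.shape[0], 1))
    return density, rgb


def render_ray(origin, direction, near=0.0, far=6.0, num_samples=128):
    """Composite color along one ray; returns (rgb, accumulated opacity)."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)

    t = np.linspace(near, far, num_samples)           # sample depths along the ray
    points = origin + t[:, None] * direction          # (num_samples, 3)
    density, rgb = toy_field(points)

    delta = np.full(num_samples, (far - near) / (num_samples - 1))
    alpha = 1.0 - np.exp(-density * delta)            # opacity of each segment
    # Transmittance: fraction of light that survives all earlier segments.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = transmittance * alpha
    color = (weights[:, None] * rgb).sum(axis=0)
    return color, weights.sum()


hit_color, hit_opacity = render_ray([0.0, 0.0, -3.0], [0.0, 0.0, 1.0])
miss_color, miss_opacity = render_ray([0.0, 3.0, -3.0], [0.0, 0.0, 1.0])
```

Every step in `render_ray` is differentiable with respect to the densities and colors, which is what allows a NeRF to be trained by comparing rendered pixels against ground-truth images. A mesh-based INR would instead query signed distances and extract the surface where they cross zero.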
The team has shared some examples of Shap·E output, including 3D results for text prompts such as a plate of food, a penguin, a voxelized dog, a campfire, and a chair that looks like an avocado. The examples show that Shap·E can produce high-quality results in just seconds. For its evaluation, Shap·E was compared with Point·E, another generative model, which produces an explicit representation in the form of point clouds. Despite modeling a higher-dimensional, multi-representation output space, Shap·E converged faster and achieved comparable or better sample quality.
In conclusion, Shap·E is an effective and efficient generative model for 3D assets. It looks promising and is a notable addition to recent work in generative AI.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.