Meet TextDeformer: An AI Framework for Text-Driven 3D Mesh Deformation

Three-dimensional (3D) meshes are a major component of computer graphics and 3D modeling and have various fields of application, including architecture, automotive design, game development, and film production. A mesh is a digital representation of a three-dimensional object that comprises a collection of vertices, edges, and faces that define its shape and structure. The vertices represent the points in space where the edges meet, while the faces define the surface of the object.
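To make the vertex, edge, and face description concrete, here is a minimal sketch of one common way to store a mesh: a list of vertex positions plus a list of faces that index into it. The names `vertices`, `faces`, and `unique_edges` are illustrative choices, not part of any particular library.

```python
# A unit square in the z = 0 plane, split into two triangles.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Each face lists the indices of its corner vertices.
faces = [(0, 1, 2), (0, 2, 3)]


def unique_edges(faces):
    """Collect the undirected edges implied by the face list."""
    edges = set()
    for face in faces:
        n = len(face)
        for i in range(n):
            a, b = face[i], face[(i + 1) % n]
            # Store each edge with the smaller index first so (a, b) == (b, a).
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)


edges = unique_edges(faces)
print(len(vertices), len(edges), len(faces))  # prints: 4 5 2
```

The two triangles share the diagonal edge (0, 2), which is why the square has five edges rather than six.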

Because creating 3D meshes is challenging, it is generally reserved for experts with specialized artistic skills, and it is difficult for someone without this expertise to create a mesh from scratch. Diverse datasets of 3D objects created by digital artists can be found online. However, when customization (even minimal) is required, editing an existing mesh is just as arduous as creating one from scratch.

For this reason, mesh deformation has received much attention in computer graphics and geometry processing. In many existing techniques, a user manipulates the deformation through control handles, which yields coarse, low-frequency deformations that preserve detail. These are commonly known as detail-preserving deformations. However, 3D modeling often requires adding fine geometric detail as well, which can be time-consuming and complicated, even for skilled artists.
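As a rough illustration of the control-handle idea, the sketch below drags one handle and moves nearby vertices with a smooth Gaussian falloff. This is a deliberately simplified stand-in: it produces a coarse, low-frequency warp but does not implement an actual detail-preserving method, and the function name `deform_with_handle` and its parameters are hypothetical.

```python
import math


def deform_with_handle(vertices, handle, displacement, radius):
    """Move vertices near `handle` by `displacement`, weighted by a Gaussian falloff.

    Vertices at the handle move by the full displacement; the influence
    fades smoothly with distance, controlled by `radius`.
    """
    deformed = []
    for v in vertices:
        dist_sq = sum((vi - hi) ** 2 for vi, hi in zip(v, handle))
        weight = math.exp(-dist_sq / (2.0 * radius ** 2))
        deformed.append(tuple(vi + weight * di for vi, di in zip(v, displacement)))
    return deformed


# Five points along the x axis; pull the middle one upward.
line = [(float(x), 0.0, 0.0) for x in range(5)]
bent = deform_with_handle(
    line, handle=(2.0, 0.0, 0.0), displacement=(0.0, 1.0, 0.0), radius=1.0
)
for point in bent:
    print(point)
```

The middle point rises by the full displacement while its neighbors rise progressively less, which is the kind of broad shape change a single handle can express. Fine detail such as scales or wrinkles cannot be produced this way without many more handles.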

To address this, a novel AI approach called TextDeformer has been proposed to automate the deformation of 3D meshes. Guided by a text prompt, TextDeformer aims to transform a given source shape into a desired target shape while maintaining semantic consistency between the two. The following is an overview of the workflow and system architecture.

This approach builds on the success of recent text-guided generative techniques and NeRF (Neural Radiance Fields), but does not require 3D training data. Instead, the authors use differentiable rendering together with pre-trained image encoders such as CLIP to optimize the geometry of the mesh based on its rendered images.
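The overall pattern described here (geometry parameters, a scalar score, and gradient steps that improve the score) can be sketched with a toy example. Everything in it is an assumption made for illustration: `stand_in_loss_and_grad` is a hypothetical analytic objective that replaces the real "render the mesh, encode the image with CLIP, compare with the text" step, and the sketch moves vertex heights directly. It is not the authors' implementation.

```python
def spread_y(vertices):
    """Mean squared deviation of the y coordinates (a crude proxy for 'tallness')."""
    ys = [v[1] for v in vertices]
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys) / len(ys)


def stand_in_loss_and_grad(vertices, target_spread):
    """Hypothetical objective standing in for the render-and-compare step.

    Returns the loss (spread - target)^2 and its analytic gradient
    with respect to each vertex's y coordinate.
    """
    ys = [v[1] for v in vertices]
    n = len(ys)
    mean_y = sum(ys) / n
    residual = spread_y(vertices) - target_spread
    # d(spread)/d(y_i) = 2 * (y_i - mean_y) / n, then apply the chain rule.
    grad_y = [2.0 * residual * 2.0 * (y - mean_y) / n for y in ys]
    return residual ** 2, grad_y


def optimize(vertices, target_spread, lr=0.1, steps=200):
    """Plain gradient descent on the y coordinates of the vertices."""
    verts = [list(v) for v in vertices]
    for _ in range(steps):
        _, grad_y = stand_in_loss_and_grad(verts, target_spread)
        for v, g in zip(verts, grad_y):
            v[1] -= lr * g
    return [tuple(v) for v in verts]


# A unit square; ask the toy objective for a taller shape.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
stretched = optimize(square, target_spread=1.0)
print(spread_y(square), spread_y(stretched))
```

In the real setting the gradient would come from backpropagating through a differentiable renderer and the image encoder rather than from a hand-written formula, but the loop has the same shape: evaluate the score, compute gradients with respect to the geometry, and update.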

After deformation, the structure and properties of the source mesh are preserved, and the resulting geometry adheres to the text specification. This work differs from previous ones in the type of task the model performs. Unlike previous text-driven works that generate geometry from scratch or add detail while preserving the geometry of the input mesh, TextDeformer focuses on the deformation task.

In detail, this framework is designed to modify an existing input shape to create high-quality geometry that remains faithful to the source mesh. It can produce both low-frequency shape changes and high-frequency detail, such as elongating a cow's neck when deforming it into a giraffe, or adding scales when deforming it into an alligator. The authors show that the resulting mappings from source to target shape are continuous and semantically meaningful (e.g., "leg deforms into leg") by coloring the source mesh; this coloring is visible in all of their visualizations.

Some of the results reported by the authors are illustrated in the figure below. The figure also includes a comparison between TextDeformer and the state-of-the-art method DreamFusion.

This was a brief summary of TextDeformer, a new AI framework that enables precise text-driven 3D mesh deformation. If you are interested, you can learn more about this technique at the links below.

Check out the Paper.

Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He currently works at the Christian Doppler ATHENA Laboratory and his research interests include adaptive video streaming, immersive media, machine learning and QoS / QoE evaluation.