Despite enormous advances over the last decade, 3D facial reconstruction from a single unconstrained image remains a major research topic with a vibrant computer vision community. Its applications are now numerous and diverse, including, but not limited to, human digitization for virtual and augmented reality, social media and gaming, synthetic dataset generation, and health applications. Recent methods, however, often fail to produce assets that can be used for photorealistic rendering, and fail to accurately reconstruct the identities of different people.
3D morphable models (3DMMs) are a popular method of recovering the shape and appearance of a face from a single “in the wild” image. The difficulty of the task can be attributed to several factors, including the need for comprehensive scanned datasets of human geometry and reflectance, the limited and confounding information available in a single facial image, and the limitations of current statistical and machine learning methods. The original 3DMM work used principal component analysis (PCA) to model facial shape and appearance with varying identity and expression, learned from scans of more than 200 participants.
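To make the linear 3DMM formulation concrete, the sketch below shows how a PCA face model generates a shape as the mean plus identity and expression offsets. All array sizes and the random bases here are illustrative placeholders, not FitMe's actual model.

```python
import numpy as np

# Minimal PCA-based 3DMM sketch (illustrative shapes, not FitMe's model).
# A face mesh with V vertices is flattened to a 3V vector; identity and
# expression are modeled as linear offsets from a mean shape.
V = 5000                                  # number of mesh vertices (hypothetical)
mean_shape = np.zeros(3 * V)              # learned mean face, flattened (x, y, z per vertex)
id_basis = np.random.randn(3 * V, 80)     # identity principal components (placeholder)
expr_basis = np.random.randn(3 * V, 29)   # expression principal components (placeholder)

def reconstruct_shape(id_coeffs, expr_coeffs):
    """Linear 3DMM: shape = mean + U_id @ a + U_expr @ b."""
    s = mean_shape + id_basis @ id_coeffs + expr_basis @ expr_coeffs
    return s.reshape(V, 3)                # back to (V, 3) vertex positions

# Sampling plausible faces: coefficients drawn from the PCA Gaussian prior.
face = reconstruct_shape(np.random.randn(80), np.random.randn(29))
```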
Since then, larger models built from thousands of subjects have been developed, such as LSFM, the Basel Face Model, and FaceScape. In addition, 3DMMs of complete human heads and of other facial parts, including ears and the tongue, have recently been introduced. Subsequent publications have added extensions ranging from direct regression of 3DMM parameters to nonlinear models. Such models, however, cannot generate photorealistic textures. Meanwhile, deep generative models have advanced significantly over the last decade: generative adversarial networks (GANs), and progressive architectures in particular, have produced outstanding results in learning distributions of high-resolution 2D photographs of human faces.
More recently, meaningful latent spaces have been learned for style-based progressive generative networks, which can be traversed to reconstruct and control various attributes of generated samples. Other techniques have successfully learned 2D representations of 3D facial assets, such as UV maps. Rendering functions can turn 3DMM-generated 3D facial models into 2D facial images, and iterative optimization requires that this rendering process be differentiable. Recent advances in differentiable rasterization, photorealistic facial shading, and rendering libraries have made photorealistic differentiable rendering of such assets possible.
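The role of differentiability can be illustrated with a toy analysis-by-synthesis loop: shape, texture, and camera parameters are updated by backpropagating a photometric loss through the renderer. The `differentiable_render` function below is a stand-in (a real pipeline would use a library such as PyTorch3D or nvdiffrast); this is a hypothetical sketch, not FitMe's implementation.

```python
import torch

# Stand-in for a differentiable renderer: a real implementation would
# rasterize the mesh and shade it; here we only need something that maps
# parameters to an image while remaining differentiable.
def differentiable_render(shape_params, texture_params, camera):
    return torch.sigmoid(shape_params.sum() + texture_params.sum()
                         + camera.sum()).expand(3, 256, 256)

target = torch.rand(3, 256, 256)                  # the input photograph
shape_p = torch.zeros(80, requires_grad=True)     # shape coefficients
tex_p = torch.zeros(512, requires_grad=True)      # texture latent code
cam_p = torch.zeros(6, requires_grad=True)        # pose / camera parameters

opt = torch.optim.Adam([shape_p, tex_p, cam_p], lr=1e-2)
for step in range(200):
    rendered = differentiable_render(shape_p, tex_p, cam_p)
    loss = (rendered - target).abs().mean()       # photometric L1 loss
    opt.zero_grad()
    loss.backward()                               # gradients flow through the renderer
    opt.step()
```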
Unfortunately, the Lambertian shading model used in earlier 3DMM work fails to capture the complexity of facial reflectance: realistic facial rendering requires multiple reflectance components rather than a single RGB texture. Although recent work has simplified the required capture setups, such datasets remain few, small, and difficult to acquire. Various modern methods, some using infrared capture, have achieved high-fidelity, relightable reconstructions of facial reflectance, but they depend on controlled capture rather than single unconstrained images. Furthermore, powerful deep models have been shown to capture facial appearance with strong priors, but they cannot perform single- or multi-image reconstruction.
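The gap between Lambertian shading and a multi-component reflectance model can be seen in a small worked example. Below, a Lambertian pixel uses only a diffuse albedo, while a more realistic pixel adds a Blinn-Phong-style specular term driven by a separate specular albedo and the surface normal; the values and the Blinn-Phong lobe are illustrative assumptions, not FitMe's shading model.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Per-pixel shading comparison. Lambertian uses a single RGB albedo; a more
# realistic model separates diffuse albedo, specular albedo, and normals.
n = normalize(np.array([0.0, 0.0, 1.0]))       # surface normal (e.g., from a normal map)
l = normalize(np.array([0.3, 0.3, 1.0]))       # light direction
v = normalize(np.array([0.0, 0.0, 1.0]))       # view direction
h = normalize(l + v)                           # half vector for the specular lobe

diffuse_albedo = np.array([0.75, 0.55, 0.45])  # skin-like RGB (illustrative)
specular_albedo = 0.04                         # scalar specular intensity (illustrative)
shininess = 32.0                               # Blinn-Phong specular exponent

lambertian = diffuse_albedo * max(n @ l, 0.0)              # diffuse-only shading
specular = specular_albedo * max(n @ h, 0.0) ** shininess  # specular highlight
realistic = lambertian + specular                          # diffuse + specular shading
```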
A contemporary alternative paradigm relies on implicit representations, which capture the appearance and shape of an avatar through learned neural rendering. Despite their excellent performance, such implicit representations cannot be used by standard renderers and usually cannot be relit. The recent Albedo Morphable Model (AlbedoMM) also uses a linear PCA model to capture facial shape and reflectance, but its per-vertex color and normal reconstruction is too low-resolution for photorealistic rendering. AvatarMe++ can reconstruct high-resolution facial reflectance texture maps from a single “in the wild” photograph; however, its three-step process (reconstruction, upsampling, and reflectance prediction) cannot be optimized directly against the input image.
Researchers at Imperial College London present FitMe, a fully renderable 3DMM with high-resolution facial reflectance texture maps that can be fitted to unconstrained facial images via accurate differentiable rendering. FitMe achieves strong identity similarity and produces fully renderable, highly realistic reconstructions that can be used immediately in off-the-shelf rendering engines. The texture model is built as a progressive style-based multimodal generator that simultaneously produces the face's diffuse albedo, specular albedo, and surface normals, while a carefully designed branched discriminator enables stable training across modalities with different statistics.
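The branching idea can be sketched as a generator with one shared trunk and separate shallow heads, so the diffuse albedo, specular albedo, and normals stay spatially aligned in UV space. The toy network below is a hypothetical stand-in for FitMe's progressive style-based generator.

```python
import torch
import torch.nn as nn

# Toy branched generator: one shared trunk, three output branches for
# diffuse albedo (3 ch), specular albedo (1 ch), and normals (3 ch).
class BranchedTextureGenerator(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.trunk = nn.Sequential(                     # shared feature synthesis
            nn.Linear(latent_dim, 8 * 8 * 64), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.Upsample(scale_factor=4),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        def branch(out_ch):                             # one shallow head per modality
            return nn.Sequential(nn.Conv2d(64, out_ch, 3, padding=1), nn.Tanh())
        self.diffuse = branch(3)                        # diffuse albedo UV map
        self.specular = branch(1)                       # specular albedo UV map
        self.normals = branch(3)                        # surface-normal UV map

    def forward(self, w):
        feats = self.trunk(w)
        return self.diffuse(feats), self.specular(feats), self.normals(feats)

g = BranchedTextureGenerator()
d_alb, s_alb, nrm = g(torch.randn(1, 512))              # three aligned UV maps
```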
To build the model, they apply AvatarMe++ to the publicly available MimicMe dataset, creating a capture-quality facial reflectance dataset of 5,000 subjects, which they further augment to balance skin-tone representation. For the shape, they use interchangeable face and head PCA models trained on sizable geometric datasets. They design a single- and multi-image fitting approach based on style-based generator projection and 3DMM fitting. For effective iterative fitting (in under a minute), the rendering function must be fast as well as differentiable, which rules out approaches such as ray tracing; previous research has therefore relied on slower optimization or simpler shading models (such as Lambertian).
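Generator projection itself is conceptually simple: a pretrained generator is frozen and only its latent code is optimized so the output matches a target. The sketch below shows this with a toy generator and a plain photometric loss; FitMe's actual projection operates in the generator's extended latent space with differentiable rendering in the loop.

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained texture generator; only the latent is fitted.
generator = nn.Sequential(nn.Linear(512, 3 * 32 * 32), nn.Tanh())
for p in generator.parameters():
    p.requires_grad_(False)                    # generator weights stay frozen

target = torch.rand(1, 3 * 32 * 32) * 2 - 1    # target texture, flattened
w = torch.zeros(1, 512, requires_grad=True)    # latent code (W / W+ in StyleGAN terms)

opt = torch.optim.Adam([w], lr=5e-3)
for step in range(100):                        # fast enough for sub-minute fitting
    loss = (generator(w) - target).pow(2).mean()   # photometric reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```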
They improve on previous work by introducing shading with a more realistic appearance and convincing diffuse and specular rendering, whose shape and reflectance outputs can be used for photorealistic rendering in common rendering engines (Fig. 1). Thanks to the flexibility of the generator's extended latent space and the photorealistic fitting, FitMe can reconstruct high-fidelity facial reflectance and achieve remarkable identity similarity, while accurately capturing detail in the diffuse albedo, specular albedo, and normals.
Figure 1: FitMe uses a reflectance model and differentiable rendering to reconstruct relightable shape and reflectance maps for facial avatars from a single (left) or multiple (right) unconstrained facial images. The results can be rendered photorealistically in standard engines.
In summary, this work presents the following contributions:
• The first 3DMM capable of generating high-resolution facial shape and reflectance, with increasing levels of detail, that can be rendered photorealistically.
• A technique to acquire and augment a large, capture-quality facial reflectance dataset.
• The first branched multimodal style-based progressive generator of high-resolution 3D facial assets (diffuse albedo, specular albedo, and normals), along with a suitable multimodal branched discriminator.
Check out the Paper and project page.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.