Posture, appearance, facial expressions, hand gestures, and similar cues, collectively referred to as "body language", have been the subject of much academic research. Accurately recording, interpreting, and synthesizing such non-verbal cues can greatly improve the realism of avatars in telepresence, augmented reality (AR), and virtual reality (VR) environments.
Existing state-of-the-art avatar models, such as those from the SMPL family, can represent different human body shapes in realistic poses. However, they are constrained by their mesh-based representation and the resolution of the underlying 3D mesh. Moreover, these models typically represent only unclothed bodies, with no clothing or hair, which reduces the realism of the results.
Researchers at ETH Zurich and Microsoft present X-Avatar, an expressive implicit human avatar model that can capture the full range of human expression in digital avatars, enabling realistic telepresence, augmented reality, and virtual reality environments. X-Avatar captures high-fidelity body and hand movements, facial expressions, and other appearance features. The model can be learned from full 3D scans or from RGB-D data, producing comprehensive models of body, hands, facial expressions, and appearance.
The researchers propose a part-aware learned skinning module driven by the SMPL-X parameter space, allowing expressive animation of X-Avatars. To train the neural shape and deformation fields effectively, they introduce part-aware initialization and sampling strategies. To capture the avatar's appearance in high-frequency detail, they augment the geometry and deformation fields with a texture network conditioned on pose, facial expression, geometry, and the normals of the deformed surface. This yields higher-fidelity results, particularly for smaller body parts, while keeping training efficient despite the increased number of articulated bones. The researchers empirically demonstrate that the approach achieves superior quantitative and qualitative results on the animation task compared to strong baselines in both data domains.
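To make the part-aware sampling idea concrete, here is a minimal sketch of how training samples could be biased toward smaller body parts such as the hands and face. The part labels, boost ratios, and function names are hypothetical illustrations, not the paper's implementation:

```python
import numpy as np

# Hypothetical boost ratios: small parts get a larger share of samples.
PART_BOOST = {"body": 1.0, "face": 4.0, "left_hand": 8.0, "right_hand": 8.0}

def sample_points(points, part_labels, n_samples, rng=None):
    """points: (N, 3) surface points; part_labels: (N,) part name per point."""
    rng = rng or np.random.default_rng()
    weights = np.array([PART_BOOST[p] for p in part_labels], dtype=np.float64)
    weights /= weights.sum()
    idx = rng.choice(len(points), size=n_samples, p=weights)
    return points[idx]

# Example: hands are a tiny fraction of the scan but end up oversampled.
pts = np.random.default_rng(0).normal(size=(10_000, 3))
labels = (["body"] * 9_000 + ["face"] * 600
          + ["left_hand"] * 200 + ["right_hand"] * 200)
batch = sample_points(pts, labels, n_samples=2_048)
```

Without such a bias, uniform sampling would devote almost all of the training signal to the torso and limbs, which is why fine hand and face geometry tends to degrade in uniformly trained models.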
To aid future research on expressive avatars, the researchers also present a new dataset, called X-Humans, with 233 high-quality textured scan sequences from 20 subjects, totaling 35,500 data frames. X-Avatar adopts a human model defined by articulated neural implicit surfaces, which accommodates the diverse topology of clothed people and achieves improved geometric resolution and higher overall appearance fidelity. The authors define three distinct neural fields: one modeling geometry with an implicit occupancy network, one modeling deformation via linear blend skinning (LBS) with learned continuous skinning weights, and one modeling appearance by predicting RGB color values.
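As a rough illustration of this three-field design, the following PyTorch sketch defines an occupancy field, a skinning-weight field, and an appearance field. The layer sizes, the bone count, and the conditioning inputs are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

def mlp(d_in, d_out, width=128, depth=4):
    layers, d = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(d, width), nn.Softplus(beta=100)]
        d = width
    return nn.Sequential(*layers, nn.Linear(d, d_out))

N_BONES = 55  # illustrative count for an SMPL-X-style articulated skeleton

occupancy_net  = mlp(3, 1)           # canonical point -> occupancy logit
skinning_net   = mlp(3, N_BONES)     # canonical point -> per-bone LBS weights
appearance_net = mlp(3 + 3 + 32, 3)  # point + normal + condition code -> RGB

x_c = torch.randn(1024, 3)                       # canonical query points
occ = torch.sigmoid(occupancy_net(x_c))          # inside/outside probability
w   = torch.softmax(skinning_net(x_c), dim=-1)   # continuous skinning weights
```

The smooth Softplus activation is a common choice for implicit surface networks, since the surface is extracted from the occupancy field and benefits from smooth gradients.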
The X-Avatar model can take a posed 3D scan or an RGB-D image as input. Its design incorporates a shape network that models geometry in canonical space and a deformation network that uses learned linear blend skinning (LBS) to establish correspondences between canonical and deformed space.
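The forward direction of this correspondence is standard linear blend skinning: each canonical point is warped by a weighted blend of per-bone rigid transforms. A minimal sketch, assuming the bone transforms already fold in the inverse of the canonical bind pose:

```python
import torch

def lbs_warp(x_c, weights, bone_transforms):
    """x_c: (N, 3) canonical points; weights: (N, B) skinning weights
    summing to 1; bone_transforms: (B, 4, 4) rigid bone transforms."""
    x_h = torch.cat([x_c, torch.ones_like(x_c[:, :1])], dim=-1)  # homogeneous
    # Blend the transforms per point, then apply: x_d = (sum_b w_b T_b) x_c
    T = torch.einsum("nb,bij->nij", weights, bone_transforms)    # (N, 4, 4)
    return torch.einsum("nij,nj->ni", T, x_h)[:, :3]

# Sanity check: identity transforms leave the points unchanged.
x = torch.randn(100, 3)
w = torch.softmax(torch.randn(100, 55), dim=-1)
T = torch.eye(4).expand(55, 4, 4)
assert torch.allclose(lbs_warp(x, w, T), x, atol=1e-5)
```

At inference time this mapping must be inverted (deformed point back to canonical point); implicit avatar methods typically solve this by iterative root finding rather than in closed form.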
To generate expressive and controllable human avatars, the researchers start from the parameter space of SMPL-X, an extension of SMPL that captures the shape, appearance, and deformations of full-body humans, with particular attention to hand poses and facial expressions. A human model defined by articulated neural implicit surfaces represents the varied topology of clothed people, while the part-aware initialization scheme markedly improves the realism of the results by increasing the sampling rate for smaller body parts.
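For context, the SMPL-X parameter space that drives the avatar can be summarized roughly as follows. The dimensions shown are common SMPL-X defaults and may differ per configuration, so treat this as an illustration rather than the paper's exact settings:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SMPLXParams:
    betas:      np.ndarray = field(default_factory=lambda: np.zeros(10))      # body shape coefficients
    body_pose:  np.ndarray = field(default_factory=lambda: np.zeros(21 * 3))  # axis-angle, 21 body joints
    left_hand:  np.ndarray = field(default_factory=lambda: np.zeros(15 * 3))  # 15 joints per hand
    right_hand: np.ndarray = field(default_factory=lambda: np.zeros(15 * 3))
    jaw_pose:   np.ndarray = field(default_factory=lambda: np.zeros(3))
    expression: np.ndarray = field(default_factory=lambda: np.zeros(10))      # facial expression coefficients

params = SMPLXParams()  # neutral shape, pose, and expression
```

The hand and face parameters are exactly the components that plain SMPL lacks, which is why SMPL-X is the natural control space for an expressive avatar.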
The results show that X-Avatar can accurately capture human body and hand poses as well as facial expressions and appearance, making it possible to create more expressive and realistic avatars. The team behind the project hopes their method will inspire further studies that give AI avatars more personality.
Dataset used
- High-quality textured scans with SMPL[-X] registrations
- 20 subjects; 233 sequences; 35,427 frames
- Body poses, hand gestures, and facial expressions
- A wide range of clothing and hairstyles
- A wide range of ages
Characteristics
- X-Avatars can be trained in several ways.
- Trained from 3D scans: training scans (top); avatars animated in unseen test poses (bottom).
- Trained from RGB-D data (top); avatars animated in test poses (bottom).
- On the animation task, the method recovers more hand articulation and facial expression than the baselines. X-Avatars can also be animated with motions estimated by PyMAF-X from monocular RGB videos.
Limitations
X-Avatar has difficulty modeling off-the-shoulder tops and loose garments (e.g., skirts). In addition, the researchers train only a single model per subject, so generalization beyond a single individual remains future work.
Contributions
- X-Avatar is the first expressive implicit human avatar model that comprehensively captures body pose, hand pose, facial expressions, and appearance.
- Part-aware initialization and sampling procedures improve the quality of the results while maintaining training efficiency.
- X-Humans is a new dataset of 233 sequences, totaling 35,500 frames, of high-quality textured scans of 20 people displaying a wide range of body and hand movements and facial expressions.
X-Avatar is unrivaled in capturing body pose, hand pose, facial expressions, and overall appearance. Using the newly published X-Humans dataset, the researchers have shown that the method outperforms strong baselines both quantitatively and qualitatively.
Check out the Paper, Project, and GitHub. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a computer engineer with experience at FinTech companies across the finance, cards & payments, and banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today's changing world that make everyone's life easier.