We see digital avatars everywhere, from our favorite chat apps to virtual marketing assistants on our favorite e-commerce websites. They are becoming more and more popular and are quickly being integrated into our daily lives. You open your avatar editor, pick the skin color, the eye shape, the accessories, and so on, and you have an avatar ready to imitate you in the digital world.
Building a digital avatar face manually and using it as a living emoji can be fun, but it only scratches the surface of what’s possible. The true potential of digital avatars lies in becoming a clone of our entire body. This type of avatar has become an increasingly popular technology in video games and virtual reality (VR) applications.
Generating high-fidelity 3D avatars requires expensive and specialized equipment, so we only see them in a limited number of applications, such as the digitized professional actors we see in video games.
What if we could simplify this process? Imagine generating a high-fidelity, full-body 3D avatar just from videos captured in the wild. No professional equipment, no complicated sensor setup to capture every little detail, just a simple recording with a smartphone camera. This breakthrough in avatar technology could revolutionize many applications in virtual reality, robotics, video games, movies, sports, and more.
That moment has come. We now have a tool that can generate high-fidelity 3D avatars from videos captured in the wild. Time to meet Vid2Avatar.
Vid2Avatar learns 3D human avatars from in-the-wild videos. It needs no ground-truth supervision, no priors extracted from large datasets, and no external segmentation modules. You just give it a video of someone, and it will generate a robust 3D avatar for you.
Vid2Avatar has some clever tricks up its sleeve to get this done. The first is separating the human from the background of the scene and modeling each as a neural field. The authors solve scene decomposition and surface reconstruction directly in 3D, modeling two separate neural fields to implicitly learn the human body and the background. This is normally a challenging task because you need to associate 3D points with the human body without relying on 2D segmentation; a minimal sketch of the two-field setup follows below.
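To make this concrete, here is a minimal PyTorch sketch of the two-field idea: one coordinate network for the human and one for the background, each queried at 3D points. The tiny `FieldMLP` architecture is a hypothetical stand-in; the paper's actual networks, positional encodings, and surface representation are more sophisticated.

```python
import torch
import torch.nn as nn

class FieldMLP(nn.Module):
    """A toy coordinate MLP: 3D point -> (density, RGB).

    Hypothetical stand-in for a neural field; Vid2Avatar's real
    architectures and parameterizations differ.
    """
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, x: torch.Tensor):
        out = self.net(x)
        density = torch.relu(out[..., :1])   # non-negative density
        rgb = torch.sigmoid(out[..., 1:])    # colors in [0, 1]
        return density, rgb

# Two separate fields: one for the human, one for the background.
human_field = FieldMLP()
background_field = FieldMLP()

points = torch.rand(1024, 3)              # sampled 3D points
sigma_h, rgb_h = human_field(points)      # human density and color
sigma_b, rgb_b = background_field(points) # background density and color
```

Keeping the two fields separate is what lets the method reason about the human and the scene independently later in the pipeline.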
The human body is modeled with a single, temporally consistent representation of shape and texture in canonical space. This representation is learned from deformed observations by mapping them back through the inverse of a parametric body model's deformation. In addition, Vid2Avatar uses an optimization algorithm that jointly adjusts parameters of the background, the human subject, and their per-frame poses to best fit the available sequence of video frames.
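Here is an illustrative sketch of that inverse-mapping idea: observed points on the posed body are warped back to canonical space by inverting a blend of per-bone transforms. The `inverse_skinning` helper and its inputs are hypothetical simplifications, not the paper's actual deformation model.

```python
import torch

def inverse_skinning(points_deformed, skinning_weights, bone_transforms):
    """Map observed (deformed) points back to canonical space.

    A minimal linear-blend-skinning inversion sketch: blend the bone
    transforms by each point's skinning weights, then apply the inverse
    of the blended transform. Vid2Avatar's actual parametric-body-model
    mapping is more involved.

    points_deformed:  (N, 3) points in the observed (posed) space
    skinning_weights: (N, B) per-point weights over B bones, rows sum to 1
    bone_transforms:  (B, 4, 4) rigid transform of each bone
    """
    # Per-point blended transform, then its inverse applied to the point.
    blended = torch.einsum('nb,bij->nij', skinning_weights, bone_transforms)
    homog = torch.cat(
        [points_deformed, torch.ones_like(points_deformed[:, :1])], dim=-1)
    canonical = torch.einsum('nij,nj->ni', torch.inverse(blended), homog)
    return canonical[:, :3]

# Example: with identity bone transforms, points map to themselves.
pts = torch.rand(2, 3)
w = torch.softmax(torch.rand(2, 3), dim=-1)
T = torch.eye(4).repeat(3, 1, 1)
assert torch.allclose(inverse_skinning(pts, w, T), pts, atol=1e-5)
```

Because every frame is pulled back into the same canonical space, all observations contribute to one consistent body model instead of many per-frame ones.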
To further improve the separation, Vid2Avatar renders the scene in 3D with the human body composited separately from the background, making it easy to analyze the motion and appearance of each on its own. It also introduces novel objectives, such as encouraging a crisp boundary between the human body and the background, that guide the optimization toward more accurate and detailed reconstructions of the scene; a sketch of both ideas follows below.
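Below is a simplified sketch of what such composited volume rendering and a sharp-boundary objective could look like. Both the compositing formula and the entropy-style `boundary_loss` are illustrative assumptions; the paper's actual decomposition objectives differ in their exact form.

```python
import torch

def composite_render(sigma_h, rgb_h, sigma_b, rgb_b, deltas, eps=1e-8):
    """Render one ray by compositing human and background fields.

    Simplified sketch: densities from both fields are summed per sample
    and colors are density-weighted. Shapes: sigma_*: (S, 1) densities,
    rgb_*: (S, 3) colors, deltas: (S, 1) sample spacings along the ray.
    """
    sigma = sigma_h + sigma_b
    rgb = (sigma_h * rgb_h + sigma_b * rgb_b) / (sigma + eps)
    alpha = 1.0 - torch.exp(-sigma * deltas)            # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + eps, dim=0)     # transmittance
    trans = torch.cat([torch.ones_like(trans[:1]), trans[:-1]], dim=0)
    weights = alpha * trans
    pixel_rgb = (weights * rgb).sum(dim=0)              # rendered color (3,)
    # Fraction of the pixel explained by the human field.
    human_opacity = (weights * sigma_h / (sigma + eps)).sum()
    return pixel_rgb, human_opacity

def boundary_loss(human_opacity, eps=1e-5):
    """Illustrative sharp-boundary term: push each pixel's human opacity
    toward 0 (pure background) or 1 (pure human) by minimizing its
    binary entropy."""
    o = human_opacity.clamp(eps, 1.0 - eps)
    return -(o * o.log() + (1.0 - o) * (1.0 - o).log())
```

Penalizing "half-human" pixels like this nudges the two fields to claim cleanly separated regions of the image, which is exactly the crisp boundary the method is after.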
Overall, the paper proposes a global optimization approach for high-fidelity and robust human body reconstruction that works on videos captured in the wild without requiring any additional information. Its carefully designed components achieve robust modeling, and in the end we get 3D avatars that could be used in many applications.
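Putting the sketches above together, a global optimization loop could look roughly like the following. The random rays, target colors, loss weight, and schedule are all placeholders, and `pose_params` only marks where per-frame pose optimization (feeding `inverse_skinning`) would plug in; none of this is Vid2Avatar's actual training code.

```python
import torch

# Hypothetical per-joint pose parameters, optimized jointly with both
# fields; in the full method they would drive the inverse warp of human
# samples into canonical space (omitted here for brevity).
pose_params = torch.zeros(24, 3, requires_grad=True)
optimizer = torch.optim.Adam(
    list(human_field.parameters())
    + list(background_field.parameters())
    + [pose_params],
    lr=5e-4,
)

for step in range(1000):
    # Placeholder ray: 64 samples with fixed spacing and a random target.
    samples = torch.rand(64, 3)
    deltas = torch.full((64, 1), 0.01)
    target_rgb = torch.rand(3)

    sigma_h, rgb_h = human_field(samples)
    sigma_b, rgb_b = background_field(samples)
    pixel_rgb, human_opacity = composite_render(
        sigma_h, rgb_h, sigma_b, rgb_b, deltas)

    # Photometric reconstruction plus the sharp-boundary regularizer.
    loss = ((pixel_rgb - target_rgb) ** 2).mean() \
        + 0.1 * boundary_loss(human_opacity)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```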
Check out the Paper and Project. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 15k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, exciting AI projects, and more.
Ekrem Çetinkaya received his B.Sc. in 2018 and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and working as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networking.