Researchers and consumers alike have shown growing enthusiasm for smartphone applications that incorporate augmented reality (AR), letting users generate and alter facial features in real time for short videos, virtual reality (VR), and games. Face generation and editing models based on generative adversarial networks (GANs) are popular because they offer a good trade-off between model size and output quality. Most GAN models, however, still carry high computational complexity and demand huge training datasets. Making ethical use of GAN models is also a crucial consideration.
Google researchers developed MediaPipe FaceStylizer as an efficient solution for few-shot face stylization that addresses these concerns about model complexity and data efficiency. The model uses GAN inversion to encode an input image into a latent code for the face generator. To generate high-quality images at granularities ranging from coarse to fine, they introduce a mobile-friendly synthesis network for the face generator, with an auxiliary head that converts features to RGB at each level of the generator. They also distill a lightweight student generator from a teacher StyleGAN model, preserving generation quality by carefully designing loss functions for the aforementioned auxiliary heads and combining them with common GAN loss functions. MediaPipe provides open-source access to this solution: MediaPipe Model Maker lets users fine-tune the generator to learn a style from one or a few photographs, and MediaPipe FaceStylizer lets users deploy the resulting model in on-device face stylization applications.
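To make the synthesis-network design concrete, here is a minimal PyTorch sketch of a generator whose blocks each carry an auxiliary head converting features to RGB at that level. This is an illustrative toy, not Google's implementation: style modulation and the rest of the StyleGAN machinery are omitted, and all names and channel sizes are invented.

```python
# Toy sketch of a synthesis network with an auxiliary "to RGB" head at
# each level (illustrative only; not Google's BlazeStyleGAN code).
import torch
import torch.nn as nn

class SynthesisBlock(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)
        # Auxiliary head: converts features to an RGB image at this resolution.
        self.to_rgb = nn.Conv2d(out_ch, 3, 1)

    def forward(self, x):
        x = self.act(self.conv(self.up(x)))
        return x, self.to_rgb(x)  # features plus the RGB output at this scale

class TinySynthesis(nn.Module):
    def __init__(self, channels=(256, 128, 64, 32)):
        super().__init__()
        self.const = nn.Parameter(torch.randn(1, channels[0], 4, 4))
        self.blocks = nn.ModuleList(
            SynthesisBlock(c_in, c_out)
            for c_in, c_out in zip(channels[:-1], channels[1:])
        )

    def forward(self, batch_size):
        x = self.const.expand(batch_size, -1, -1, -1)
        rgbs = []
        for block in self.blocks:
            x, rgb = block(x)
            rgbs.append(rgb)  # coarse-to-fine RGB outputs, used by training losses
        return rgbs  # at inference, only the final output rgbs[-1] is needed
```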
Faces in images and videos can be enhanced or created from scratch with the MediaPipe Face Stylizer task, which can be used to build virtual characters with a wide range of aesthetics.
The task uses the BlazeFaceStylizer model, which comprises a face generator and a face encoder. BlazeStyleGAN, a lightweight implementation of the StyleGAN model family, generates and refines faces to match a given aesthetic. The face encoder, built on a MobileNetV2 backbone, maps input photos to the faces produced by the face generator.
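Running the resulting model on-device follows the standard MediaPipe Tasks pattern. The sketch below uses the MediaPipe Tasks Python API; the model bundle and image paths are placeholders, and exact names may differ across MediaPipe versions.

```python
# Running the Face Stylizer task with the MediaPipe Tasks Python API.
# "face_stylizer.task" and "portrait.jpg" are placeholder paths.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

base_options = python.BaseOptions(model_asset_path="face_stylizer.task")
options = vision.FaceStylizerOptions(base_options=base_options)

with vision.FaceStylizer.create_from_options(options) as stylizer:
    image = mp.Image.create_from_file("portrait.jpg")
    result = stylizer.stylize(image)  # stylized mp.Image, or None if no face found
    if result is not None:
        stylized = result.numpy_view()  # NumPy view for saving or display
```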
The project aims to provide a pipeline that helps users fine-tune the MediaPipe FaceStylizer model to different styles. Researchers constructed a face stylization pipeline from a GAN inversion encoder and an efficient face generator model (more on this below). The encoder and generator pipeline can then be trained with a few examples of a target style. To begin, the user supplies one or several representative samples of the desired aesthetic to MediaPipe Model Maker. During fine-tuning, the encoder module is frozen and only the generator is adjusted. Several latent codes are sampled around the encoder's output for the input style images and used to train the generator, which is optimized with a joint adversarial loss function to reconstruct a face image in the same aesthetic as the input style image. This fine-tuning process makes MediaPipe FaceStylizer flexible enough to accommodate the user's input, so the resulting model can apply the stylization to test photos of real human faces.
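In code, this fine-tuning is driven by MediaPipe Model Maker's face_stylizer module. The sketch below follows the published customization guide, but the dataset constructor, supported-model constant, and hyperparameter names are assumptions that may differ by version; consult the Model Maker documentation for the exact API.

```python
# Fine-tuning the face stylizer on a style image with MediaPipe Model
# Maker (sketch; option and constructor names may vary by version).
from mediapipe_model_maker import face_stylizer

# Load one (or a few) representative style images.
data = face_stylizer.Dataset.from_image(filename="style/cartoon.jpg")

options = face_stylizer.FaceStylizerOptions(
    model=face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256,
    hparams=face_stylizer.HParams(epochs=100, export_dir="exported_model"),
)

# The encoder stays frozen; only the generator is fine-tuned.
model = face_stylizer.FaceStylizer.create(train_data=data, options=options)
model.export_model()  # writes a .task bundle for on-device deployment
```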
Researchers at Google train BlazeStyleGAN by knowledge distillation, using the widely used StyleGAN2 as the teacher model, and introduce a multi-scale perceptual loss into training to improve image quality. BlazeStyleGAN has fewer parameters and a simpler model than MobileStyleGAN. Benchmarks on several mobile devices show that it runs at real-time speeds on mobile GPUs, and its output closely matches the visual quality of the teacher model. In some cases BlazeStyleGAN even improves visual quality by suppressing artifacts produced by the teacher model, while its Fréchet Inception Distance (FID) remains comparable to that of the teacher StyleGAN. The contributions are summarized as follows:
- Researchers created a mobile-friendly architecture by adding an auxiliary UpToRGB head at each generator level; the auxiliary heads are executed only during training, with the final head producing the output image at inference.
- They enhance the distillation technique by computing a multi-scale perceptual loss on the auxiliary heads and an adversarial loss on real images, leading to better image generation and reducing the transfer of artifacts from the teacher model (see the sketch after this list).
- BlazeStyleGAN produces high-quality images in real time on a range of popular smartphones.
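The sketch below illustrates how such a distillation objective could be assembled: a perceptual loss is accumulated over the auxiliary heads' multi-scale RGB outputs against resized teacher images, plus a non-saturating adversarial term whose discriminator is trained on real faces (so the student is not pushed to reproduce teacher artifacts). Here `student`, `teacher`, `discriminator`, and `perceptual` are assumed placeholder modules, not Google's code.

```python
# Sketch of a generator-side distillation objective: multi-scale
# perceptual loss on auxiliary-head outputs plus an adversarial term.
# All four modules are assumed placeholders; `perceptual` could be an
# LPIPS-style metric, and the discriminator is trained (separately) on
# real face images rather than teacher outputs.
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, discriminator, perceptual, z):
    with torch.no_grad():
        target = teacher(z)  # teacher image at full resolution

    # Student returns RGB outputs at every generator level (cf. the
    # generator sketch above).
    rgbs = student(z)

    # Multi-scale perceptual term: match each scale to a resized target.
    loss_percep = 0.0
    for rgb in rgbs:
        resized = F.interpolate(target, size=rgb.shape[-2:],
                                mode="bilinear", align_corners=False)
        loss_percep = loss_percep + perceptual(rgb, resized)

    # Non-saturating adversarial term on the final, full-resolution output.
    logits = discriminator(rgbs[-1])
    loss_adv = F.softplus(-logits).mean()

    return loss_percep + loss_adv
```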
Google’s research team has introduced BlazeStyleGAN, presented as the first StyleGAN model that can produce high-quality face images in real time on the vast majority of premium smartphones, and there remains much room for exploration in efficient on-device generative models. To reduce the impact of the teacher model’s artifacts, they devise a refined architecture for the StyleGAN synthesis network and fine-tune the distillation technique. With its drastically reduced model complexity, BlazeStyleGAN achieves real-time performance on mobile devices in the benchmarks.
Check out the Google Article. All credit for this research goes to the researchers on this project.
Dhanshree Shenwai is a computer science engineer with experience in FinTech companies spanning the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier in today’s evolving world.