The field of Generative Artificial Intelligence is getting all the attention it deserves. Recent developments in text-to-image (T2I) personalization have opened up exciting possibilities for new applications. Personalization, i.e., generating a distinctive subject in various contexts and styles while preserving its identity with high fidelity, has become a prominent topic in generative AI. Face personalization, the ability to generate new images of a given face or person in various styles, has been made possible by pre-trained diffusion models that carry strong prior knowledge of diverse styles and semantics.
Current approaches such as DreamBooth and comparable techniques are successful because they can embed a new subject in the model without eroding its prior knowledge, and they preserve the subject's essence and details even when it is rendered in very different ways. But these approaches still have notable limitations, including model size and training speed. DreamBooth fine-tunes all of the UNet and text-encoder weights of the diffusion model, yielding a personalized checkpoint of more than 1 GB for Stable Diffusion, which is a significant storage burden. The DreamBooth training procedure also takes about 5 minutes per subject, which can hinder widespread adoption and practical application.
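To make the storage concern concrete, here is a minimal sketch (not the authors' code) that uses the Hugging Face diffusers library to count the UNet and text-encoder parameters that a full DreamBooth fine-tune would personalize; the checkpoint ID and the fp16 storage assumption are illustrative.

```python
# Minimal sketch: estimate the size of a fully fine-tuned DreamBooth checkpoint
# by counting the UNet and text-encoder parameters it would personalize.
# The model ID and fp16 assumption are illustrative, not prescriptive.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

personalized = list(pipe.unet.parameters()) + list(pipe.text_encoder.parameters())
n_params = sum(p.numel() for p in personalized)
size_gb = n_params * 2 / 1024**3  # fp16: 2 bytes per parameter
print(f"~{n_params / 1e6:.0f}M personalized parameters ≈ {size_gb:.2f} GB per subject")
```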
To overcome these problems, a team of researchers from Google Research introduced HyperDreamBooth, a HyperNetwork that efficiently generates a small set of personalized weights from a single image of a person. These weights are then composed with the diffusion model and refined with a fast fine-tuning step. The end result is a powerful system that can generate a person's face in a variety of contexts and styles while preserving fine subject details and the diffusion model's crucial prior knowledge of diverse styles and semantic modifications.
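To illustrate the core idea, the following is a toy, self-contained PyTorch sketch of a hypernetwork that maps a face embedding to a LoRA-style low-rank weight residual for a single projection layer. The dimensions, module names, and rank are assumptions made for illustration; this is not the paper's architecture.

```python
# Toy hypernetwork sketch: predict a low-rank (LoRA-style) weight residual for
# one attention projection from a face embedding. Shapes are illustrative.
import torch
import torch.nn as nn

d_model, rank = 320, 1   # one projection layer, rank-1 residual (assumed)
embed_dim = 512          # assumed size of a face-image embedding

class TinyHyperNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU())
        self.to_A = nn.Linear(256, d_model * rank)  # predicts the LoRA "down" matrix
        self.to_B = nn.Linear(256, rank * d_model)  # predicts the LoRA "up" matrix

    def forward(self, face_embedding):
        h = self.encoder(face_embedding)
        A = self.to_A(h).view(d_model, rank)
        B = self.to_B(h).view(rank, d_model)
        return A @ B                                # additive weight residual

hyper = TinyHyperNetwork()
face_embedding = torch.randn(embed_dim)             # stand-in for a real face encoder
delta_W = hyper(face_embedding)                     # personalized residual for one layer
print(delta_W.shape)                                # torch.Size([320, 320])
```

In the actual system, such residuals would be predicted for many layers of the diffusion model and then refined with the brief fine-tuning pass described above.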
HyperDreamBooth’s speed is one of its greatest achievements: it personalizes faces in roughly 20 seconds, 25 times faster than DreamBooth and 125 times faster than the related technique Textual Inversion. In addition, this rapid personalization requires only a single reference image while maintaining the same degree of quality and stylistic diversity as DreamBooth. HyperDreamBooth also excels in model size as well as speed: the resulting personalized model is 10,000 times smaller than a normal DreamBooth model, a substantial advantage that makes the model far more manageable and significantly reduces storage requirements.
The team has summarized their contributions as follows:
- Lightweight DreamBooth (LiDB): A personalized text-to-image model whose personalized part is roughly 100 KB, achieved by training DreamBooth in a low-dimensional weight space spanned by a random orthogonal incomplete basis inside the low-rank adaptation (LoRA) weight space (a toy decomposition sketch follows this list).
- New HyperNetwork architecture: Using the LiDB configuration, the HyperNetwork predicts personalized weights for a specific subject in a text-to-image diffusion model. This provides a strong directional initialization that allows fast fine-tuning to reach high subject fidelity in just a few iterations. The method is 25 times faster than DreamBooth with comparable performance.
- Rank-relaxed fine-tuning: A technique that relaxes the rank of the LoRA DreamBooth model during optimization to improve subject fidelity. This allows the personalized model to be initialized with the HyperNetwork's approximate prediction and then have high-frequency subject details refined through rank-relaxed fine-tuning (a brief sketch follows this list).
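As a rough illustration of the LiDB idea, the sketch below factors a LoRA residual through frozen, random (approximately orthogonal) auxiliary bases so that only two tiny matrices per layer are trainable. The dimensions and the exact factorization are assumptions for illustration, not the released implementation.

```python
# Toy LiDB-style layer (illustrative dimensions, not the paper's code).
# The LoRA residual delta_W = A @ B is further factored so that only the small
# A_train and B_train matrices are learned; the auxiliary bases are frozen.
import torch
import torch.nn as nn

def random_orthogonal(rows, cols):
    # Random matrix with (approximately) orthonormal rows or columns; frozen, never trained.
    q, _ = torch.linalg.qr(torch.randn(max(rows, cols), min(rows, cols)))
    return q[:rows, :cols] if rows >= cols else q[:cols, :rows].T

class LiDBLayer(nn.Module):
    def __init__(self, d=320, rank=1, aux=100):                    # assumed dimensions
        super().__init__()
        self.register_buffer("A_aux", random_orthogonal(d, aux))   # frozen basis
        self.register_buffer("B_aux", random_orthogonal(aux, d))   # frozen basis
        self.A_train = nn.Parameter(torch.zeros(aux, rank))        # trainable
        self.B_train = nn.Parameter(torch.randn(rank, aux) * 0.01) # trainable

    def delta_W(self):
        # (d, aux) @ (aux, r) @ (r, aux) @ (aux, d) -> (d, d) weight residual
        return self.A_aux @ self.A_train @ self.B_train @ self.B_aux

layer = LiDBLayer()
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters for this layer: {trainable}")  # 200 with rank=1, aux=100
```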
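And as a hedged sketch of rank-relaxed fine-tuning, the snippet below embeds a rank-1 predicted residual into a higher-rank LoRA and briefly optimizes the extra capacity; the ranks, dimensions, and the placeholder objective are stand-ins for the real diffusion training loss on the reference image.

```python
# Hedged sketch of rank relaxation: embed a rank-1 predicted residual into a
# higher-rank LoRA and fine-tune only the LoRA factors (illustrative only).
import torch
import torch.nn as nn

d, relaxed_rank = 320, 4

# Pretend this rank-1 residual came from the HyperNetwork prediction.
A1 = torch.randn(d, 1) * 0.01
B1 = torch.randn(1, d) * 0.01

# Initialize a rank-4 LoRA whose first component reproduces the prediction;
# the remaining components start at zero and add capacity during fine-tuning.
A = nn.Parameter(torch.cat([A1, torch.zeros(d, relaxed_rank - 1)], dim=1))
B = nn.Parameter(torch.cat([B1, torch.zeros(relaxed_rank - 1, d)], dim=0))

optimizer = torch.optim.AdamW([A, B], lr=1e-3)
target = torch.randn(d, d)                  # stand-in for the real training signal

for step in range(40):                      # "fast fine-tuning": a few iterations
    loss = ((A @ B - target) ** 2).mean()   # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final placeholder loss: {loss.item():.4f}")
```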
Check out the Paper and Project Page. Don't forget to join our 26k+ ML SubReddit, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at [email protected]
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.