One of the biggest challenges in machine learning is modeling complex probability distributions. Denoising diffusion probabilistic models (DDPMs) address this by learning to invert a well-defined stochastic process that progressively destroys information.
Image synthesis, video production, and 3D editing are some of the areas where DDPMs have proven their worth. Because of their large parameter counts and the many inference steps required per image, however, state-of-the-art DDPMs incur high computational costs. In practice, not all users can afford the necessary compute and storage. It is therefore crucial to investigate strategies for efficiently customizing large, publicly available, pre-trained diffusion models for individual applications.
A new study by researchers at Huawei Noah’s Ark Lab uses the Diffusion Transformer (DiT) as a base and offers DiffFit, a simple and effective fine-tuning technique for large diffusion models. Recent NLP research (BitFit) has shown that tuning only the bias terms can adapt a pre-trained model to downstream tasks. The researchers wanted to carry this efficient tuning strategy over to image generation. They first apply BitFit directly; then, to improve feature scaling and generalizability, they add learnable scale factors to particular layers of the model, initialized to 1.0 and tuned per dataset. Empirical results indicate that placing these factors at strategic locations throughout the model is crucial for improving the Fréchet Inception Distance (FID) score.
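The idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: it freezes all weights, re-enables gradients only for bias terms (BitFit-style), and wraps layers with a hypothetical learnable scale factor `gamma` initialized to 1.0.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Wraps a linear layer with a learnable scale factor, initialized to 1.0.
    A simplified stand-in for the scale factors DiffFit inserts into chosen layers."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        self.gamma = nn.Parameter(torch.ones(1))  # learnable scale, default 1.0

    def forward(self, x):
        return self.gamma * self.linear(x)

def apply_difffit(model: nn.Module) -> None:
    """Freeze everything, then unfreeze only bias terms and gamma factors."""
    for p in model.parameters():
        p.requires_grad = False
    for name, p in model.named_parameters():
        if name.endswith("bias") or name.endswith("gamma"):
            p.requires_grad = True

# Toy stand-in for a transformer block stack (the real base is a DiT).
model = nn.Sequential(
    ScaledLinear(nn.Linear(8, 8)),
    nn.ReLU(),
    ScaledLinear(nn.Linear(8, 4)),
)
apply_difffit(model)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable} / {total}")  # only biases + gammas remain trainable
```

Even in this toy model, the trainable fraction drops sharply; at DiT scale this is what yields the roughly 0.9 million trainable parameters reported below.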
BitFit, AdaptFormer, LoRA, and VPT are among the parameter-efficient fine-tuning strategies the team compared against across 8 downstream datasets. The findings show that DiffFit outperforms these techniques in both the number of trainable parameters and the FID trade-off. The researchers also found that their DiffFit strategy could easily be employed to fine-tune a low-resolution diffusion model, allowing it to produce high-resolution images at low cost simply by treating the high-resolution images as a special domain of the low-resolution ones.
DiffFit outperformed previous state-of-the-art diffusion models on ImageNet 512 × 512 by starting from a pre-trained ImageNet 256 × 256 checkpoint and fine-tuning DiT for only 25 epochs. With only about 0.9 million trainable parameters, DiffFit surpasses the original DiT-XL/2-512 model (which has 640 million trainable parameters and was trained for 3 million iterations) in terms of FID, while requiring 30% less training time.
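Reusing a 256 × 256 checkpoint at 512 × 512 requires handling the mismatch in the number of image patches. A common way to do this for ViT/DiT-style models is to bilinearly resize the patch positional embeddings to the new grid; the sketch below shows this under that assumption (the paper's exact adaptation details may differ).

```python
import torch
import torch.nn.functional as F

def interpolate_pos_embed(pos_embed: torch.Tensor, new_grid: int) -> torch.Tensor:
    """Resize a (1, N, D) patch positional embedding from its old square grid
    (e.g. 16x16 at 256px) to a new square grid (e.g. 32x32 at 512px)."""
    n, d = pos_embed.shape[1], pos_embed.shape[2]
    old_grid = int(n ** 0.5)
    # (1, N, D) -> (1, D, H, W) so F.interpolate can resize spatially
    pe = pos_embed.reshape(1, old_grid, old_grid, d).permute(0, 3, 1, 2)
    pe = F.interpolate(pe, size=(new_grid, new_grid),
                       mode="bilinear", align_corners=False)
    # back to (1, N', D)
    return pe.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, d)

# Hypothetical embedding from a 256px checkpoint (16x16 patches, dim 64)
pe256 = torch.randn(1, 16 * 16, 64)
pe512 = interpolate_pos_embed(pe256, 32)
print(pe512.shape)  # torch.Size([1, 1024, 64])
```

After this resize, the frozen backbone can be fine-tuned at the higher resolution with the same small set of trainable parameters.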
Overall, DiffFit seeks to provide insight into the efficient fine-tuning of larger diffusion models by establishing a simple and powerful baseline for parameter-efficient fine-tuning in image generation.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.