Recent advances in diffusion models have significantly improved tasks such as image, video, and 3D generation, with pre-trained models such as Stable Diffusion being instrumental. However, adapting these models to new tasks efficiently remains a challenge. Existing fine-tuning approaches (additive, reparameterized, and selective) suffer from drawbacks such as additional inference latency, overfitting, or complex parameter selection. One proposed solution is to exploit "temporarily ineffective" parameters (those with minimal current impact on the model but the capacity to learn new information) by re-activating them, improving the model's generative capabilities without the drawbacks of existing methods.
Researchers from Shanghai Jiao Tong University and Youtu Lab, Tencent, propose SaRA, a fine-tuning method for pre-trained diffusion models. Inspired by model pruning, SaRA reuses "temporarily ineffective" parameters with small absolute values, optimizing them through sparse matrices while preserving the model's prior knowledge. The method employs a nuclear-norm-based low-rank training scheme and a progressive parameter tuning strategy to avoid overfitting. SaRA's memory-efficient non-structural backpropagation reduces memory costs by 40% compared to LoRA. Experiments on Stable Diffusion models show SaRA's superior performance on multiple tasks, requiring only a single line of code modification to implement.
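To make the core idea concrete, the sketch below (not the authors' code; the threshold value and function names are illustrative assumptions) shows one way to mark low-magnitude parameters as trainable via sparse masks and restrict gradient updates to them:

```python
# Hedged sketch: fine-tune only "temporarily ineffective" (small-magnitude) parameters.
import torch
import torch.nn as nn

def build_sparse_masks(model: nn.Module, threshold: float = 1e-3):
    """Mark entries whose absolute value falls below `threshold` as trainable."""
    return {name: (p.detach().abs() < threshold) for name, p in model.named_parameters()}

def apply_sparse_grads(model: nn.Module, masks: dict):
    """Zero the gradients of all entries outside the sparse set,
    so only the selected low-magnitude parameters are updated."""
    for name, p in model.named_parameters():
        if p.grad is not None:
            p.grad.mul_(masks[name].to(p.grad.dtype))

# Typical loop: build masks once, then after loss.backward() call
# apply_sparse_grads(model, masks) before optimizer.step().
```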
Diffusion models such as Stable Diffusion excel at image generation tasks but are limited by their large parameter counts, which make full fine-tuning expensive. Methods such as ControlNet, LoRA, and DreamBooth address this by adding external networks or fine-tuning a small set of weights to enable controlled generation or adaptation to new tasks. Parameter-efficient fine-tuning approaches such as Additive Fine-Tuning (AFT) and Reparameterized Fine-Tuning (RFT) introduce adapters or low-rank matrices, while Selective Fine-Tuning (SFT) updates only a chosen subset of existing parameters. SaRA improves on these methods by reusing ineffective parameters, preserving the model architecture, reducing memory costs, and improving fine-tuning efficiency without adding inference latency.
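For contrast with SaRA's approach, here is a minimal LoRA-style reparameterization sketch (a generic illustration of RFT, not any specific library's API): the pre-trained weight stays frozen and a low-rank update B·A is learned beside it.

```python
# Illustrative LoRA-style wrapper around a frozen nn.Linear layer.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # freeze pre-trained weight
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Frozen path plus learned low-rank correction (x A^T) B^T.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale
```

Note that this adds extra weights per adapted layer, whereas SaRA keeps the original parameter set and simply reuses its low-impact entries.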
In diffusion models, "ineffective" parameters, identified by their small absolute values, have minimal impact on performance when pruned. Experiments on Stable Diffusion models (v1.4, v1.5, v2.0, v3.0) showed that setting parameters below a certain threshold to zero sometimes even improves generative quality. This ineffectiveness stems from the randomness of the optimization process rather than the model structure, and fine-tuning can make these parameters effective again. SaRA leverages these temporarily ineffective parameters for fine-tuning, using low-rank constraints and progressive fine-tuning to avoid overfitting and improve efficiency, significantly reducing memory and computational costs compared to existing methods such as LoRA.
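A simple probe of this claim can be run by zeroing small-magnitude weights and inspecting generation quality. The sketch below is an assumption about how such an experiment might be set up, not the paper's exact procedure:

```python
# Hedged sketch of the pruning probe: zero every weight below a magnitude
# threshold and report what fraction of parameters was removed.
import copy
import torch

@torch.no_grad()
def prune_small_weights(model, threshold: float = 1e-3):
    pruned = copy.deepcopy(model)
    total, zeroed = 0, 0
    for param in pruned.parameters():
        keep = param.abs() >= threshold
        zeroed += (~keep).sum().item()
        total += param.numel()
        param.mul_(keep)                 # zero out the "ineffective" entries
    print(f"zeroed {zeroed / total:.1%} of parameters")
    return pruned
```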
The proposed method was evaluated on tasks such as backbone fine-tuning, image personalization, and video generation using FID, CLIP, and VLHI metrics. It outperformed existing fine-tuning approaches (LoRA, AdaptFormer, LT-SFT) on all datasets, showing stronger task-specific learning and better preservation of prior knowledge. Generated images and videos showed better consistency and fewer artifacts, and the method reduced memory usage and training time by more than 45%. Ablation studies highlighted the importance of progressive parameter tuning and the low-rank constraint, while correlation analysis indicated that SaRA acquires new knowledge more effectively than competing methods, which translates into better task performance.
SaRA is a parameter-efficient fine-tuning method that reuses the lowest-impact parameters of a pre-trained model. Its nuclear-norm-based low-rank loss avoids overfitting, while progressive parameter tuning improves fine-tuning efficiency. Its unstructured backpropagation reduces memory costs and can benefit other selective fine-tuning methods. SaRA significantly improves generative capabilities on tasks such as domain transfer and image editing, outperforming methods such as LoRA, and requires only a one-line code modification for easy integration, demonstrating superior performance on Stable Diffusion 1.5, 2.0, and 3.0 across multiple applications.
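The nuclear-norm component can be pictured as a penalty on the sum of singular values of the learned update matrices, which discourages high-rank changes. The snippet below is a minimal sketch of such a regularizer; the weighting `lam` and the restriction to 2-D updates are assumptions, not the authors' exact formulation.

```python
# Sketch of a nuclear-norm (sum-of-singular-values) penalty on learned updates.
import torch

def nuclear_norm_penalty(delta_weights, lam: float = 1e-4):
    penalty = torch.zeros((), device=delta_weights[0].device)
    for delta in delta_weights:
        if delta.ndim == 2:                          # apply only to matrix-shaped updates
            penalty = penalty + torch.linalg.svdvals(delta).sum()
    return lam * penalty

# Usage idea: total_loss = task_loss + nuclear_norm_penalty(list_of_updates)
```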