Parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA), allow large pre-trained base models to adapt to downstream tasks while training only a small fraction (0.1%-10%) of the original weights. A less explored area of PEFT is extending the pre-training phase without supervised labels, that is, adapting base models to new domains through efficient self-supervised pre-training. While traditional pre-training of foundation models in language and vision has been resource-intensive, recent PEFT techniques enable effective tuning at minimal computational cost, based on the assumption that weight updates have low intrinsic rank.
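To make the low-intrinsic-rank assumption concrete, below is a minimal PyTorch-style sketch of a LoRA-wrapped linear layer: the frozen pre-trained weight W0 is augmented with a trainable low-rank product B·A. The class name `LoRALinear`, the rank `r`, and the scaling `alpha` are illustrative choices, not taken from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: the frozen base weight W0 is augmented with
    a trainable low-rank update, so the effective weight is W0 + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pre-trained weight frozen

        # Low-rank factors: A projects down to rank r, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)
```

Because only `lora_A` and `lora_B` receive gradients, the number of trainable parameters per layer drops from in_features × out_features to r × (in_features + out_features).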
Vision foundation models (VFMs) such as DinoV2 and Masked Autoencoders (MAE) have shown excellent performance on tasks such as classification and semantic segmentation using self-supervised learning (SSL). Recently, domain-specific VFMs have emerged, such as SatMAE, which processes temporal or multispectral satellite images. The need to adapt these large models efficiently has driven the adoption of PEFT methods, which update only a fraction of the parameters. Techniques like LoRA apply low-rank weight updates, while others restrict the set of trainable parameters. Domain adaptation strategies address distribution shifts between training and test data using discrepancy metrics or adversarial training to improve model performance across domains.
Researchers at Stanford University and CZ Biohub have developed ExPLoRA, a novel technique to improve transfer learning for pre-trained vision transformers (ViTs) under domain shifts. By initializing a ViT with weights pre-trained on large natural-image datasets, such as those from DinoV2 or MAE, ExPLoRA continues unsupervised pre-training in the new domain, selectively unfreezing one or two ViT blocks while using LoRA to tune the remaining layers. This method achieves state-of-the-art performance on satellite image classification, improving top-1 accuracy by up to 8% while using only 6%-10% of the parameters of previous fully pre-trained models, demonstrating significant efficiency and effectiveness in domain adaptation.
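The sketch below illustrates how such a setup might be assembled, reusing the `LoRALinear` wrapper from above: the backbone is frozen, the last one or two transformer blocks are fully unfrozen, and LoRA adapters are attached to the attention projections of the remaining blocks. The timm model name, the choice of which blocks to unfreeze, and the rank are assumptions for illustration, not the authors' exact configuration.

```python
import timm

def build_explora_style_vit(unfrozen_blocks=(22, 23), lora_rank=8):
    """Illustrative ExPLoRA-style setup: freeze the pre-trained ViT, fully
    unfreeze a couple of transformer blocks, and attach LoRA adapters to the
    attention projections of the remaining (frozen) blocks."""
    model = timm.create_model("vit_large_patch14_dinov2", pretrained=True)

    # Start from a fully frozen backbone.
    for p in model.parameters():
        p.requires_grad = False

    for i, block in enumerate(model.blocks):
        if i in unfrozen_blocks:
            # A small number of blocks are trained with all of their weights.
            for p in block.parameters():
                p.requires_grad = True
        else:
            # All other blocks only receive low-rank (LoRA) updates.
            block.attn.qkv = LoRALinear(block.attn.qkv, r=lora_rank)
            block.attn.proj = LoRALinear(block.attn.proj, r=lora_rank)

    return model
```

With this arrangement, only the LoRA factors and the unfrozen blocks contribute trainable parameters, which is what keeps the continued pre-training in the single-digit-percent range of the full model size.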
MAE and DinoV2 are SSL methods for ViTs. MAE uses a masked encoder-decoder structure that requires full fine-tuning for downstream tasks, which can be compute-intensive. In contrast, DinoV2 achieves strong zero-shot performance by employing a student-teacher architecture, allowing adaptation without full fine-tuning. The ExPLoRA method is proposed to address these tuning inefficiencies, combining pre-trained weights with low-rank adaptations and a small number of additional weight updates to adapt ViTs to new target domains efficiently. This approach reduces storage requirements while maintaining strong feature extraction and generalization capabilities.
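A minimal sketch of the continued, label-free pre-training stage follows. Only the parameters left trainable by the setup above are handed to the optimizer, and the self-supervised objective (e.g., masked reconstruction as in MAE or self-distillation as in DinoV2) is passed in as `ssl_loss_fn`; the loop structure, optimizer settings, and loss interface are assumptions, not the authors' exact training code.

```python
import torch

def continue_pretraining(model, target_domain_loader, ssl_loss_fn, steps=10_000, lr=1e-4):
    """Illustrative continued pre-training on unlabeled target-domain images.
    Only LoRA factors and unfrozen blocks are optimized; the SSL objective is
    supplied by the caller (e.g., a masked-reconstruction or distillation loss)."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr, weight_decay=0.05)

    data_iter = iter(target_domain_loader)
    for _ in range(steps):
        try:
            images = next(data_iter)
        except StopIteration:
            data_iter = iter(target_domain_loader)
            images = next(data_iter)

        loss = ssl_loss_fn(model, images)  # no labels: purely self-supervised
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return model
```

After this stage, the adapted backbone can be evaluated with linear probing or lightweight supervised fine-tuning on the target-domain task.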
The experimental results focus on satellite imagery, highlighting a case study on the fMoW-RGB dataset, where ExPLoRA reaches a top accuracy of 79.2%. An ablation study examines performance across various configurations. The ExPLoRA models, initialized with MAE and DinoV2 weights, outperform fully pre-trained prior methods while using only about 6% of the ViT encoder parameters. Additional evaluations on multispectral imagery and multiple satellite datasets demonstrate the effectiveness of ExPLoRA in closing domain gaps and achieving competitive performance. The results indicate significant improvements in accuracy, underscoring the potential of ExPLoRA for satellite image classification tasks.
In conclusion, ExPLoRA is a novel pre-training strategy designed to adapt pre-trained ViT models to new visual domains, such as medical and satellite imagery. ExPLoRA addresses the limitations of expensive pre-training from scratch by enabling efficient transfer of knowledge from existing models, achieving performance superior to domain-specific foundation models. The method combines PEFT techniques such as LoRA with minimal unfreezing of model layers, significantly improving transfer learning. Experiments show state-of-the-art results on satellite imagery, improving linear-probing accuracy by up to 7.5% while using less than 10% of the parameters of previous approaches.
Check out the Paper and Project. All credit for this research goes to the researchers of this project.
Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and artificial intelligence to address real-world challenges. With a strong interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.