A central challenge in advancing deep learning-based classification and retrieval is achieving robust representations without extensive retraining or labeled data. Many applications rely on large pretrained models that serve as feature extractors; however, these frozen embeddings often lack the task-specific detail needed for optimal performance in the absence of fine-tuning. Retraining is frequently impractical in domains constrained by limited computational resources or scarce labels, such as medical diagnosis and remote sensing. A method that improves fixed representations without requiring retraining would therefore be a significant contribution, allowing pretrained models to generalize well across many tasks and domains.
Approaches such as k-nearest neighbor (kNN) classification, Vision Transformers (ViTs), and self-supervised learning (SSL) techniques such as SimCLR and DINO have made considerable progress in representation learning by leveraging unlabeled data through pretext objectives. However, these methods are often constrained by requirements for specific backbone architectures, intensive tuning, or large amounts of labeled data, which limits their generalization. Moreover, most SSL pipelines discard the gradient information that is available even from frozen models, although these gradients carry task-relevant signals that could make the learned representations more adaptable to downstream applications.
Researchers from the University of Amsterdam and valeo.ai present an efficient, resource-friendly method called FUNGI (Features from UNsupervised GradIents), designed to enhance frozen embeddings by incorporating gradient information from self-supervised learning objectives. The method can be applied to any pretrained model without changing its parameters, making it flexible and computationally cheap. Using gradients derived from several SSL objectives, including DINO, SimCLR, and a KL-divergence objective, FUNGI fuses complementary information in a manner reminiscent of multimodal learning. The reduced self-supervised gradients are concatenated with the model's embeddings to form highly discriminative feature vectors for kNN classification. This synthesis sidesteps limitations of current feature-extraction techniques and delivers substantially improved performance without any additional training.
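To make the gradient-as-feature idea concrete, the sketch below computes a per-sample gradient of a simplified KL-style objective (cross-entropy against a uniform target, a stand-in for the paper's KL objective) with respect to a hypothetical random projection head attached to a frozen embedding. The head `W`, its size, and the embedding dimension are illustrative assumptions, not the authors' exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def kl_gradient_features(x, W):
    """Per-sample gradient of a KL-style objective w.r.t. a projection head.

    x : (d,) frozen backbone embedding
    W : (k, d) hypothetical projection head (random, never trained)
    Returns the flattened gradient dL/dW, usable as an extra feature vector.
    Loss here: cross-entropy between a uniform target and softmax(W @ x),
    a simplified stand-in for the KL objective described in the paper.
    """
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()                          # softmax(W @ x)
    u = np.full_like(p, 1.0 / p.size)     # uniform target distribution
    # For cross-entropy with target u: dL/dz = p - u; chain rule gives dL/dW
    grad = np.outer(p - u, x)
    return grad.ravel()

# Simulated frozen embedding from a pretrained model (768-dim, as in ViT-B)
x = rng.normal(size=768)
W = rng.normal(size=(16, 768)) / np.sqrt(768)
g = kl_gradient_features(x, W)
print(g.shape)  # (12288,) -- far higher-dimensional than the embedding itself
```

Note that the gradient lives in a much higher-dimensional space than the embedding, which is why the method needs the dimensionality-reduction stage described next.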
The FUNGI framework operates in three main stages: gradient extraction, dimensionality reduction, and concatenation with embeddings. It first computes per-sample gradients of SSL losses with respect to the final layers of a Vision Transformer, capturing rich, task-relevant features. These high-dimensional gradients are then reduced to a target dimensionality using a binary random projection. Finally, the reduced gradients are concatenated with the embeddings and further compressed with PCA, yielding compact, computationally efficient, and highly informative feature sets. The result is an enriched frozen embedding that supports stronger performance on kNN retrieval and classification tasks.
FUNGI delivers substantial gains on multiple benchmarks spanning vision, text, and audio datasets. In kNN classification, it yields a relative improvement of 4.4% averaged across the ViT models tested, with the largest gains reported on Flowers and CIFAR-100. In a 5-shot setting, it adds 2.8% accuracy, illustrating its effectiveness when labeled data is scarce. FUNGI also extends to retrieval-based semantic segmentation on Pascal VOC, where it improves over baseline embeddings by up to 17% in segmentation accuracy. Experimental results show that these improvements are consistent across datasets and models and are especially valuable where data efficiency and adaptability matter, making FUNGI a strong option for applications with limited labeled data and computational resources.
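The kNN evaluation protocol behind these numbers is straightforward: classify each test sample by majority vote among its nearest training neighbors in feature space. The sketch below shows such an evaluation on synthetic features (the cosine-similarity metric, `k=5`, and the toy data are assumptions; it demonstrates the protocol, not the paper's results):

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_accuracy(train_x, train_y, test_x, test_y, k=5):
    """Plain kNN classification on L2-normalized features (cosine similarity)."""
    tn = train_x / np.linalg.norm(train_x, axis=1, keepdims=True)
    qn = test_x / np.linalg.norm(test_x, axis=1, keepdims=True)
    sims = qn @ tn.T                          # (n_test, n_train) similarities
    nn = np.argsort(-sims, axis=1)[:, :k]     # indices of the k nearest neighbors
    preds = np.array([np.bincount(train_y[idx]).argmax() for idx in nn])
    return (preds == test_y).mean()

# Synthetic 3-class features standing in for frozen or FUNGI embeddings
centers = rng.normal(size=(3, 64)) * 3
y = rng.integers(0, 3, size=300)
X = centers[y] + rng.normal(size=(300, 64))
acc = knn_accuracy(X[:200], y[:200], X[200:], y[200:])
print(round(acc, 2))
```

Because this evaluation is training-free, swapping richer FUNGI features in place of plain embeddings changes only the inputs, not the protocol.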
In conclusion, FUNGI provides an efficient means of improving pretrained model embeddings by harvesting unsupervised gradients from SSL objectives. It strengthens frozen-model representations across classification and retrieval tasks without any retraining. Its adaptability, computational efficiency, and strong low-data performance mark a significant development in representation learning, enabling pretrained models to operate effectively in scenarios where retraining is impracticable. This contribution advances the practical applicability of artificial intelligence to tasks characterized by limited labeled data and computational resources.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project.
Aswin AK is a consulting intern at MarkTechPost. He is pursuing a dual degree at the Indian Institute of Technology Kharagpur. He is passionate about data science and machine learning, and brings a strong academic background and hands-on experience solving real-life interdisciplinary challenges.