Supervised learning is a type of machine learning that involves training a model on a labeled dataset: the algorithm receives input data together with the corresponding correct outputs, or “labels”. Unsupervised learning, on the other hand, is a paradigm that aims to learn meaningful and understandable representations solely from the inputs themselves. Unsupervised learning remains one of the most challenging tasks in modern machine learning and deep learning, despite the recent success of self-supervised learning in particular, which is now widely used in many applications, including image and speech recognition, natural language processing, and recommender systems.
Because of its many moving parts, unsupervised learning is complicated and often lacks reproducibility, scalability, and explainability. The recent literature has developed along three main branches: 1) spectral embeddings, 2) self-supervised learning, and 3) reconstruction-based methods. Each of these schemes, however, has its pitfalls.
Spectral embedding estimates geodesic distances between training samples to produce embeddings (see the short sketch after this paragraph), but its quality depends heavily on that distance estimation, which is difficult and limits its applicability.
Alternative methods, such as self-supervised learning, use similar losses but generate positive pairs to avoid geodesic distance estimation. However, self-supervised learning suffers from limited interpretability, numerous hyperparameters that are inconsistent across architectures and datasets, and a lack of theoretical guarantees. Finally, reconstruction-based learning has limitations regarding stability and requires careful tuning of loss functions to handle noisy data.
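As a rough illustration of the geodesic-distance step that makes spectral embeddings fragile, the minimal sketch below uses scikit-learn's Isomap, which approximates geodesic distances over a nearest-neighbour graph before embedding; the toy dataset and parameter values are purely illustrative and are not drawn from the paper.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

# Isomap first approximates geodesic distances on a k-nearest-neighbour graph,
# then embeds them; the choice of n_neighbors is exactly the kind of distance
# estimation detail such methods depend on.
X, _ = make_swiss_roll(n_samples=2000, noise=0.1)
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)  # (2000, 2)
```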
To address these challenges, recent research from Stanford and Meta AI proposes a deliberately simple unsupervised learning strategy that aims to overcome the limitations of current methods.
The approach is called DIET (Datum IndEx as Target), and it implements the simple idea of predicting the index of each datum in the dataset as its training label. In this way, the structure of the model closely resembles the supervised learning scheme, that is, a main encoder plus a linear classifier. Consequently, any progress made within the scope of supervised learning can be transferred as is to DIET. In summary, the three main benefits of DIET are: i) minimal code refactoring, ii) architecture independence, and iii) no additional hyperparameters. In particular, DIET does not require specific positive pairs or teacher-student architectures, and it provides a training loss that is informative of test-time performance, without adding hyperparameters beyond those of the classification loss.
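To make the idea concrete, here is a minimal sketch of DIET-style training in PyTorch. It is not the authors' reference implementation: the backbone, augmentations, optimizer, and hyperparameter values are illustrative assumptions, and only the core idea, a classifier with one output per training sample whose target is the datum index, follows the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torchvision import datasets, transforms, models

class IndexedDataset(Dataset):
    """Wraps a dataset so each sample also returns its index, which DIET uses as the label."""
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return len(self.base)
    def __getitem__(self, idx):
        image, _ = self.base[idx]   # the original class label is discarded
        return image, idx           # the datum index is the training target

augment = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.2, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = IndexedDataset(
    datasets.CIFAR100("data", train=True, download=True, transform=augment)
)
loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)

encoder = models.resnet18(num_classes=512)   # encoder producing 512-d representations
classifier = nn.Linear(512, len(train_set))  # linear head: one logit per training sample
model = nn.Sequential(encoder, classifier)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.05)
criterion = nn.CrossEntropyLoss()            # plain classification loss, no extra hyperparameters

model.train()
for epoch in range(10):                      # illustrative number of epochs
    for images, indices in loader:
        logits = model(images)
        loss = criterion(logits, indices)    # predict each image's own index
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

After training, the linear head is discarded and the encoder's representations are evaluated as in standard supervised pipelines, which is what allows supervised-learning tooling to carry over unchanged.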
The experimental results reported in the article show that DIET can rival current state-of-the-art methods on the CIFAR100 and TinyImageNet benchmarks, demonstrating non-trivial potential. Interesting insights include empirical evidence that DIET is insensitive to batch size and performs well on small datasets, both of which are weaknesses of current self-supervised learning.
However, DIET still has some limitations that need to be addressed. More precisely, DIET is very sensitive to the strength of data augmentation, much like self-supervised learning, and it converges more slowly than self-supervised methods, although label smoothing helps mitigate this.
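For context, label smoothing requires no extra machinery in a setup like the sketch above: PyTorch's cross-entropy loss (version 1.10 and later) exposes it directly. The smoothing strength below is an illustrative choice, not necessarily the value used in the paper.

```python
import torch.nn as nn

# Swap the plain cross-entropy in the sketch above for a label-smoothed one;
# the 0.8 smoothing strength is illustrative, not the paper's setting.
criterion = nn.CrossEntropyLoss(label_smoothing=0.8)
```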
Finally, the paper does not fully address scalability to large datasets, showing that, without additional consideration and design, DIET cannot match state-of-the-art methods in that regime.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project. Also, don’t forget to join our 15k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Lorenzo Brigato is a Postdoctoral Researcher at the ARTORG Center, a research institution affiliated with the University of Bern, and is currently involved in the application of AI to health and nutrition. He holds a PhD in Computer Science from Sapienza University of Rome, Italy. His PhD thesis focused on image classification problems with data distributions that are poor in terms of samples and labels.