Out-of-distribution (OOD) detection in deep learning models, particularly in image classification, addresses the challenge of identifying inputs that are unrelated to the model's training task. Its goal is to prevent the model from making confident but incorrect predictions on OOD inputs while still accurately classifying in-distribution (ID) inputs. By distinguishing between ID and OOD inputs, OOD detection methods improve model robustness and reliability in real-world applications.
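As a rough illustration of this detect-then-classify pattern, here is a minimal PyTorch sketch; the model, score function, and threshold are placeholders for illustration, not the paper's method:

```python
import torch

@torch.no_grad()
def predict_or_reject(model, x, score_fn, threshold):
    """Return a class prediction for inputs accepted as ID,
    and None for inputs rejected as OOD."""
    logits = model(x)                  # (batch, num_classes)
    scores = score_fn(logits)          # (batch,), higher = more ID-like
    preds = logits.argmax(dim=-1)
    return [int(p) if s >= threshold else None   # None -> flagged as OOD
            for p, s in zip(preds, scores)]
```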
A weakness of current OOD detection evaluations in image classification, specifically for datasets related to ImageNet-1K (IN-1K), is the presence of ID objects within the OOD datasets. Because these contaminating samples are labeled OOD, state-of-the-art detectors are penalized for correctly recognizing the ID objects they contain. Consequently, the evaluation of OOD detection methods suffers, resulting in an underestimation of actual OOD detection performance and an unfair penalization of the most effective OOD detectors.
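To see why contamination penalizes a good detector, here is a small sketch of the standard FPR-at-95%-TPR metric on synthetic scores; all numbers are made up for illustration:

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """False-positive rate on OOD data at the threshold that keeps
    95% of ID inputs accepted as ID (higher score = more ID-like)."""
    threshold = np.percentile(id_scores, 5)       # accept top 95% of ID
    return float(np.mean(ood_scores >= threshold))

rng = np.random.default_rng(0)
id_scores = rng.normal(2.0, 1.0, 10_000)   # scores on the ID test set
true_ood = rng.normal(-1.0, 1.0, 900)      # genuine OOD samples
hidden_id = rng.normal(2.0, 1.0, 100)      # ID objects mislabeled as OOD

clean = fpr_at_95_tpr(id_scores, true_ood)
contaminated = fpr_at_95_tpr(id_scores, np.concatenate([true_ood, hidden_id]))
print(f"FPR@95 on a clean OOD set:         {clean:.3f}")
print(f"FPR@95 with 10% ID contamination:  {contaminated:.3f}")  # inflated
```

A detector that correctly gives the hidden ID images high (ID-like) scores sees its measured FPR roughly double here, even though its actual OOD detection is unchanged.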
A recently published article addresses these limitations in the evaluation of OOD detection methods. The authors present a new test dataset, NINCO, containing OOD samples without any objects from the ImageNet-1K (ID) classes. They also provide synthetic “OOD unit tests” to probe specific weaknesses of OOD detectors. The paper evaluates various architectures and methods on NINCO, providing insights into model weaknesses and into the impact of pretraining on OOD detection performance. The aim is to improve the evaluation and understanding of OOD detection methods.
The authors create the new dataset NINCO (No ImageNet Class Objects) to address these limitations. They carefully select base classes from existing or newly scraped datasets, considering even permissive interpretations of the ImageNet-1K (ID) classes to ensure that the selected classes are not categorically ID. The authors then visually inspect each image in the base classes to remove samples that contain ID objects or in which no object of the OOD class is visible. This manual cleaning process ensures a high-quality dataset.
NINCO consists of 64 OOD classes with a total of 5879 samples drawn from various datasets, including SPECIES, PLACES, FOOD-101, CALTECH-101, MYNURSINGHOME, and ImageNet-21k, as well as images freshly collected from iNaturalist.org and other websites. In addition, the authors provide cleaned versions of 2715 OOD images from eleven tested OOD datasets, for which they assessed potential ID contamination.
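For readers who want to reproduce such an evaluation on a local copy of NINCO, a minimal loading sketch might look like the following; the folder layout (one subfolder per OOD class) and the path are assumptions for illustration, not the repository's documented format:

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet-style preprocessing for IN-1K models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

ninco = datasets.ImageFolder("path/to/NINCO", transform=preprocess)
loader = torch.utils.data.DataLoader(ninco, batch_size=64, num_workers=4)

@torch.no_grad()
def collect_scores(model, loader, score_fn):
    """Run the model over the OOD set and gather per-image scores."""
    model.eval()
    return torch.cat([score_fn(model(x)) for x, _ in loader])
```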
The authors also propose the use of OOD unit tests: synthetically generated, simple image inputs designed to expose weaknesses in OOD detectors. They suggest evaluating an OOD detector on these unit tests separately and reporting the number of failed tests (those with an FPR above a user-defined threshold) alongside the overall evaluation on a test OOD dataset such as NINCO. These unit tests provide valuable insight into specific weaknesses that detectors may encounter in practice. Overall, the authors propose NINCO as a high-quality dataset for evaluating OOD detection methods and suggest using OOD unit tests to gain additional information about a detector’s weaknesses.
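A sketch of this failure-counting protocol is shown below; the particular synthetic inputs and the 1% FPR limit are illustrative choices, not the paper's exact tests:

```python
import torch

def make_unit_tests(n=256, size=224):
    """A few simple synthetic input batches (illustrative examples only)."""
    return {
        "black":          torch.zeros(n, 3, size, size),
        "white":          torch.ones(n, 3, size, size),
        "uniform_noise":  torch.rand(n, 3, size, size),
        "gaussian_noise": torch.randn(n, 3, size, size).clamp(0, 1),
    }

def failed_unit_tests(score_fn, id_scores, unit_tests, max_fpr=0.01):
    """Count the unit tests a detector fails, i.e. those where too many
    synthetic inputs score above the ID-derived acceptance threshold.
    `score_fn` maps a batch of images to per-image scores (higher = ID)."""
    threshold = torch.quantile(id_scores, 0.05)   # keeps 95% of ID inputs
    failures = []
    for name, batch in unit_tests.items():
        fpr = (score_fn(batch) >= threshold).float().mean().item()
        if fpr > max_fpr:
            failures.append(name)
    return failures
```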
The paper presents detailed evaluations of OOD detection methods on the NINCO dataset and on the unit tests. The authors analyze various architectures and OOD detection methods, revealing model weaknesses and the impact of pretraining on OOD detection performance. On NINCO, the study evaluates different IN-1K models obtained from the timm library together with advanced OOD detection methods. Feature-based techniques such as Maha, RMaha, and ViM perform better than the MSP baseline, and Max-Logit and Energy also show notable improvements over MSP. Performance varies depending on the chosen model and OOD detection method. Pretraining proves influential, as it contributes both to improved ID accuracy and to feature embeddings that are better suited for OOD detection.
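For reference, the logit-based scores named above can be sketched as follows; these are the standard formulations of MSP, Max-Logit, and Energy (higher values mean “more ID-like”), not code from the paper:

```python
import torch
import torch.nn.functional as F

def msp_score(logits):
    """Maximum softmax probability (the common MSP baseline)."""
    return F.softmax(logits, dim=-1).max(dim=-1).values

def max_logit_score(logits):
    """Max-Logit: the largest raw logit."""
    return logits.max(dim=-1).values

def energy_score(logits, temperature=1.0):
    """Energy score: T * logsumexp(logits / T)."""
    return temperature * torch.logsumexp(logits / temperature, dim=-1)
```

Feature-based methods such as Maha, RMaha, and ViM additionally use the model's feature embeddings rather than the logits alone, which is where the quality of pretrained representations comes into play.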
In conclusion, the study addresses the limitations in the evaluation of OOD detection methods for image classification. It presents the NINCO dataset, which contains OOD samples without any objects from the ImageNet-1K (ID) classes, and proposes OOD unit tests for assessing detector weaknesses. The NINCO evaluations demonstrate the performance of different OOD detection models and methods, highlighting the efficacy of feature-based techniques and the impact of pretraining. NINCO improves the evaluation and understanding of OOD detection methods by offering a clean dataset and insight into detector weaknesses. The findings emphasize the importance of improving OOD detection evaluations and of understanding the strengths and limitations of current methods.
Check out the Paper and GitHub for more details.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor’s degree in physical sciences and a master’s degree in telecommunication systems and networks. His current research areas include computer vision, stock market prediction, and deep learning. He has produced several scientific articles on person re-identification and on the robustness and stability of deep networks.