What distinguishes robust from non-robust models? While differences in robustness to ImageNet distribution shifts have been shown to stem primarily from differences in the training data, it is so far unknown what this translates to in terms of what the model has learned. In this work, we bridge this gap by probing the representation spaces of 16 robust zero-shot CLIP vision encoders with various backbones (ResNets and ViTs) and pre-training sets (OpenAI, LAION-400M, LAION-2B, YFCC15M, CC12M, and DataComp), and comparing them to the representation spaces of less robust models with identical backbones but different (pre-)training sets or objectives (CLIP pre-training on ImageNet-Captions, and supervised training or fine-tuning on ImageNet). Through this analysis, we generate three novel insights. First, we detect the presence of outlier features in robust zero-shot CLIP vision encoders, which, to the best of our knowledge, is the first time they have been observed in non-language and non-transformer models. Second, we find that the existence of outlier features is an indicator of a model's ImageNet shift robustness, since in our analysis we only observe them in robust models. Finally, we also investigate the number of unique encoded concepts in the representation space and find that zero-shot CLIP models encode a larger number of unique concepts. However, we do not find this to be an indicator of ImageNet shift robustness and hypothesize that it is rather related to language supervision. Since the presence of outlier features can be detected without access to any data from shifted datasets, we believe they could be a useful tool for practitioners to get a sense of a pre-trained model's distribution shift robustness at deployment time.
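As an illustration only, the minimal sketch below shows one way outlier features could be detected from in-distribution image embeddings alone, using a simple magnitude-ratio heuristic over embedding dimensions; the function name, the threshold of 6x the median, and the use of mean absolute activation are assumptions for this example, not the criterion used in the paper.

```python
import numpy as np


def find_outlier_dimensions(features: np.ndarray, ratio: float = 6.0):
    """Flag embedding dimensions whose typical magnitude dwarfs the rest.

    features: array of shape (n_samples, n_dims), e.g. image embeddings
              extracted from a CLIP vision encoder on in-distribution data.
    ratio:    illustrative threshold; a dimension counts as an outlier if its
              mean absolute activation exceeds `ratio` times the median
              across dimensions.
    """
    # Mean absolute activation per embedding dimension.
    dim_magnitude = np.abs(features).mean(axis=0)
    # Dimensions far above the typical (median) dimension magnitude.
    threshold = ratio * np.median(dim_magnitude)
    outlier_dims = np.flatnonzero(dim_magnitude > threshold)
    return outlier_dims, dim_magnitude


if __name__ == "__main__":
    # Synthetic stand-in for real embeddings: one artificially inflated dimension.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 512))
    feats[:, 42] *= 25.0
    dims, _ = find_outlier_dimensions(feats)
    print("outlier dimensions:", dims)  # -> [42]
```

Note that this heuristic only needs a batch of embeddings from ordinary (non-shifted) data, which is what makes such a check usable before deployment.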