Existing vision-language models exhibit strong generalization across a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner and thus, by design, struggle to handle open-domain visual concepts. Recent finetuning methods, such as prompt learning, not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples but also show some improvements in both ID and OOD accuracy. In this paper, we first show that vision-language models, after long enough finetuning but without proper regularization, tend to overfit the known classes in the given dataset, with degraded performance on unknown classes. We then propose a novel approach, OGEN, to address this problem, with the main focus on improving the OOD generalization of finetuned models. Specifically, a class-conditional feature generator is introduced to synthesize OOD features using just the class name of any unknown class. Such synthesized features provide useful knowledge about unknowns and help regularize the decision boundary between ID and OOD data when optimized jointly. Equally important is our adaptive self-distillation mechanism, which regularizes the feature generation model during joint optimization by adaptively transferring knowledge between model states to further prevent overfitting. Experiments validate that our method yields convincing gains in OOD generalization performance under different settings.
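To make the class-conditional feature generator concrete, the following is a minimal PyTorch sketch, not the paper's exact design: it assumes a frozen CLIP-style text encoder has already produced class-name embeddings, maps them to the visual feature space with a small MLP, and appends logits for the synthesized unknown-class features during joint optimization. The names `ClassConditionalFeatureGenerator` and `joint_loss` are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassConditionalFeatureGenerator(nn.Module):
    """Maps a class-name text embedding to a synthetic visual feature.

    Illustrative stand-in: the paper's actual generator architecture
    may differ.
    """
    def __init__(self, text_dim=512, feat_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, text_emb):
        # Normalize so synthetic features lie on the same unit
        # hypersphere as CLIP-style image features.
        return F.normalize(self.net(text_emb), dim=-1)

def joint_loss(image_feats, id_text_embs, labels, ood_text_embs, generator, tau=0.07):
    """Cross-entropy over ID classes plus synthesized unknown classes.

    Appending logits for generated OOD features forces each ID image to
    score higher on its true class than on any unknown class, which
    regularizes the ID/OOD decision boundary during joint optimization.
    """
    ood_feats = generator(ood_text_embs)                        # (K_ood, d)
    id_logits = image_feats @ F.normalize(id_text_embs, dim=-1).t() / tau
    ood_logits = image_feats @ ood_feats.t() / tau
    logits = torch.cat([id_logits, ood_logits], dim=1)          # (B, K_id + K_ood)
    return F.cross_entropy(logits, labels)                      # labels index ID classes only

# Toy usage with random stand-ins for frozen-encoder outputs.
gen = ClassConditionalFeatureGenerator()
img_feats = F.normalize(torch.randn(8, 512), dim=-1)            # image features
id_txt = torch.randn(10, 512)                                   # known class-name embeddings
ood_txt = torch.randn(5, 512)                                   # unknown class-name embeddings
loss = joint_loss(img_feats, id_txt, torch.randint(0, 10, (8,)), ood_txt, gen)
loss.backward()
```

Note that the only input the generator needs at synthesis time is a text embedding of the unknown class name, which is what lets it produce OOD features without ever seeing OOD images.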
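The adaptive self-distillation mechanism can be sketched in a similar spirit. In the sketch below, an exponential-moving-average snapshot of earlier model states acts as the teacher for the current student; the confidence-gap weighting is a placeholder heuristic, not the paper's actual adaptive criterion, and `ema_update`, `self_distillation_loss`, and `adaptive_weight` are hypothetical names.

```python
import copy
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # The teacher tracks an exponential moving average of past student
    # states; its knowledge flows back to the student through the
    # distillation loss below, transferring knowledge between states.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(momentum).add_(ps, alpha=1.0 - momentum)

def self_distillation_loss(student_logits, teacher_logits, tau=2.0):
    # Standard softened-KL distillation between the two model states.
    t = F.softmax(teacher_logits / tau, dim=-1)
    s = F.log_softmax(student_logits / tau, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * tau ** 2

def adaptive_weight(student_logits, teacher_logits):
    # Placeholder "adaptive" rule: distill harder when the teacher is
    # more confident than the student on this batch. Detached so the
    # weight itself receives no gradient.
    s_conf = F.softmax(student_logits, dim=-1).max(dim=-1).values.mean()
    t_conf = F.softmax(teacher_logits, dim=-1).max(dim=-1).values.mean()
    return torch.clamp(t_conf - s_conf, min=0.0).detach()

# One illustrative optimization step on a toy linear model.
student = torch.nn.Linear(512, 15)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(8, 512)
s_logits = student(x)
with torch.no_grad():
    t_logits = teacher(x)
loss = adaptive_weight(s_logits, t_logits) * self_distillation_loss(s_logits, t_logits)
loss.backward()
ema_update(teacher, student)  # refresh the teacher after each step
```

The design intent, as stated in the abstract, is regularization: distilling from an earlier, less-overfit model state keeps the jointly optimized feature generator from drifting toward the known classes.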