Large language models (LLMs) are periodically updated to improve performance, usually through changes to data or architecture. Within the upgrade process, developers typically prioritize improving overall performance metrics and pay less attention to maintaining backward compatibility. Instance-level performance degradation (instance regression) from one model version to the next can interfere with a user's mental model of a particular language model's capabilities. Requiring users to adapt their mental model with each update can lead to dissatisfaction, especially when the new model degrades on a known use case relative to a previous version (model update regression). We find that when pre-trained LLM base models are updated, optimized user-facing downstream task adapters experience negative changes: instances that were previously predicted correctly are now predicted incorrectly. We observe model update regression between different model versions on a diverse set of tasks and models, even when the subsequent task training procedure remains identical. We argue for the importance of maintaining model update compatibility during upgrades and present evaluation metrics designed specifically for generative tasks, while remaining applicable to discriminative tasks. We propose a training strategy to minimize the extent of instance regression during model updates, based on training a compatibility adapter that can enhance task-tuned language models. We show that negative changes are reduced by up to 40%, for example, when updating Llama 1 to Llama 2 with our proposed method.
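To make the notion of instance-level regression concrete, the sketch below computes a negative flip rate: the fraction of instances that the previous model version predicted correctly but the updated version predicts incorrectly. This is a common formulation from the backward-compatibility literature and is shown only as an illustrative assumption of how such a quantity can be computed for exact-match outputs; it is not necessarily the exact metric proposed in this work. The function and variable names (`negative_flip_rate`, `old_preds`, `new_preds`, `targets`) are hypothetical.

```python
from typing import Sequence


def negative_flip_rate(
    old_preds: Sequence[str],
    new_preds: Sequence[str],
    targets: Sequence[str],
) -> float:
    """Fraction of instances the old model got right but the new model gets wrong.

    Uses exact-match correctness; for free-form generation a task-specific
    correctness check (e.g. normalized answer match) would replace `==`.
    """
    assert len(old_preds) == len(new_preds) == len(targets)
    flips = sum(
        1
        for old, new, gold in zip(old_preds, new_preds, targets)
        if old == gold and new != gold  # previously correct, now incorrect
    )
    return flips / len(targets)


# Example: one of three instances regresses after the update -> rate = 1/3
print(negative_flip_rate(["a", "b", "c"], ["a", "x", "c"], ["a", "b", "c"]))
```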