Learning with identical train and test distributions has been extensively investigated both practically and theoretically. However, much remains to be understood about statistical learning under distribution shifts. This article focuses on a distribution shift setting where train and test distributions can be related via classes of (data) transformation maps. We initiate a theoretical study of this framework, investigating learning scenarios where the target class of transformations is either known or unknown. We establish learning rules and algorithmic reductions to Empirical Risk Minimization (ERM), accompanied by learning guarantees. We obtain upper bounds on the sample complexity in terms of the VC dimension of the class composing predictors with transformations, which we show in many cases is not much larger than the VC dimension of the class of predictors. We highlight that the learning rules we derive offer a game-theoretic viewpoint on distribution shift: a learner searching for predictors and an adversary searching for transformation maps to respectively minimize and maximize the worst-case loss.
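As a rough illustration of this minimax view, one may sketch the learning rule as below; the symbols $\mathcal{H}$ (predictor class), $\mathcal{T}$ (transformation class), $\ell$ (loss), and the sample $(x_1, y_1), \ldots, (x_m, y_m)$ are notation introduced here for illustration only, not necessarily the paper's exact formulation.

```latex
% A minimal sketch of the minimax (game-theoretic) objective: the learner
% picks a predictor h to minimize, while the adversary picks a
% transformation map T to maximize, the empirical loss on the sample.
\[
  \hat{h} \in \operatorname*{arg\,min}_{h \in \mathcal{H}}
            \; \max_{T \in \mathcal{T}}
            \; \frac{1}{m} \sum_{i=1}^{m} \ell\bigl( h(T(x_i)),\, y_i \bigr)
\]
```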