Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable model properties (e.g., performance on another dataset, robustness, agreement with a prior). The simplest way to incorporate an auxiliary loss is to add it to the training loss as a regularizer, but recent work has shown that performance can be improved by combining the gradients in ways that go beyond a simple sum; this is known as gradient surgery. We cast the problem as a constrained minimization in which the auxiliary objective is minimized over the set of minimizers of the training loss. To solve this bilevel problem, we follow a parameter update direction that combines the training-loss gradient with the projection of the auxiliary gradient onto the subspace orthogonal to the training gradient. When gradients come from mini-batches, we explain how a moving average of the training-loss gradients lets us carefully maintain this critical orthogonality property. We show that our method, Bloop, achieves much better results in NLP and vision experiments than other gradient surgery methods without EMA.
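The update direction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `bloop_direction`, the auxiliary weight `lam`, and the EMA decay `beta` are assumed names and hyperparameters.

```python
import numpy as np

def bloop_direction(g_train, g_aux, ema_g, lam=1.0):
    """Combine the training gradient with the auxiliary gradient
    projected onto the subspace orthogonal to an EMA of training
    gradients. `lam` (auxiliary weight) is an assumed hyperparameter."""
    u = ema_g / (np.linalg.norm(ema_g) + 1e-12)   # unit vector along the EMA
    g_aux_perp = g_aux - np.dot(g_aux, u) * u     # strip the component along u
    return g_train + lam * g_aux_perp

def update_ema(ema_g, g_train, beta=0.9):
    """Moving average of mini-batch training gradients, used so the
    projection stays orthogonal despite stochastic gradient noise."""
    return beta * ema_g + (1.0 - beta) * g_train
```

With exact (full-batch) gradients the EMA equals the training gradient and the auxiliary component of the step is exactly orthogonal to it, so the update cannot increase the training loss to first order; with mini-batches, the EMA serves as a stable stand-in for the true training gradient.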