Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable model properties (e.g., performance on another dataset, robustness, agreement with a prior). The simplest way to incorporate an auxiliary loss is to add it to the training loss as a regularizer, but recent work has shown that performance can be improved by combining the gradients in ways that go beyond a simple sum; this is known as gradient surgery. We cast the problem as a constrained minimization in which the auxiliary objective is minimized over the set of minimizers of the training loss. To solve this bilevel problem, we follow a parameter update direction that combines the training-loss gradient with the projection of the auxiliary gradient onto the subspace orthogonal to the training gradient. When gradients come from mini-batches, we explain how a moving average of the training-loss gradients lets us carefully maintain this critical orthogonality property. We show that our method, Bloop, achieves much better results in NLP and vision experiments than other gradient surgery methods without EMA.
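The update direction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `bloop_direction`, the auxiliary weight `lam`, and the EMA decay `beta` are assumed names and hyperparameters.

```python
import numpy as np

def bloop_direction(g_train, g_aux, ema_g, lam=1.0):
    """Combine the training gradient with the auxiliary gradient
    projected onto the subspace orthogonal to an EMA of training
    gradients. `lam` (auxiliary weight) is an assumed hyperparameter."""
    u = ema_g / (np.linalg.norm(ema_g) + 1e-12)   # unit vector along the EMA
    g_aux_perp = g_aux - np.dot(g_aux, u) * u     # strip the component along u
    return g_train + lam * g_aux_perp

def update_ema(ema_g, g_train, beta=0.9):
    """Moving average of mini-batch training gradients, used so the
    projection stays orthogonal despite stochastic gradient noise."""
    return beta * ema_g + (1.0 - beta) * g_train
```

With exact (full-batch) gradients the EMA equals the training gradient and the auxiliary component of the step is exactly orthogonal to it, so the update cannot increase the training loss to first order; with mini-batches, the EMA serves as a stable stand-in for the true training gradient.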