*=Equal contribution
Preserving training dynamics across batch sizes is an important tool for practical machine learning, as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule; for example, in stochastic gradient descent, the learning rate should be scaled linearly with the batch size. Another important machine learning tool is the EMA model, a functional copy of a target model whose parameters move towards those of its target model according to an Exponential Moving Average (EMA) at a rate parameterized by a momentum hyperparameter. This EMA model can improve the robustness and generalization of supervised learning, stabilize pseudo-labeling, and provide a learning signal for Self-Supervised Learning (SSL). Prior work has not considered the optimization of the EMA model when scaling, leading to different training dynamics across batch sizes and lower model performance. In this work, we provide a scaling rule for optimization in the presence of an EMA model and demonstrate the rule's validity across a range of architectures, optimizers, and data modalities. We also show the rule's validity in settings where the EMA model contributes to the optimization of the target model, enabling us to train EMA-based SSL and pseudo-labeling methods at small and large batch sizes. For SSL, we enable training of BYOL up to a batch size of 24,576 without sacrificing performance, a 6x wall-clock time reduction under idealized hardware settings.
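To make the objects discussed above concrete, the minimal sketch below shows an EMA parameter update and a batch-size-dependent rescaling of the hyperparameters. Only the linear learning-rate rule for SGD is stated in this abstract; the momentum adjustment shown (`rho ** kappa`), the function names, and the toy model are illustrative assumptions, not a statement of the paper's scaling rule.

```python
import copy
import torch

def scale_hyperparameters(lr, rho, base_batch, new_batch):
    """Rescale SGD learning rate and EMA momentum when the batch size changes.

    The linear learning-rate rule is the one stated in the abstract; the
    momentum adjustment rho ** kappa is an assumption for illustration.
    """
    kappa = new_batch / base_batch      # batch-size scaling factor
    scaled_lr = kappa * lr              # linear scaling rule for SGD
    scaled_rho = rho ** kappa           # assumed momentum adjustment
    return scaled_lr, scaled_rho

@torch.no_grad()
def ema_update(ema_model, target_model, rho):
    """Move the EMA model's parameters towards the target model's parameters."""
    for p_ema, p in zip(ema_model.parameters(), target_model.parameters()):
        p_ema.mul_(rho).add_(p, alpha=1.0 - rho)

# Usage: the EMA model starts as a functional copy of the target model.
target = torch.nn.Linear(8, 2)
ema = copy.deepcopy(target)
lr, rho = scale_hyperparameters(lr=0.1, rho=0.999, base_batch=256, new_batch=1024)
ema_update(ema, target, rho)
```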