This article was accepted for presentation at the International Workshop on Federated Foundation Models (FL@FM-NeurIPS'24), held in conjunction with NeurIPS 2024.
Asynchronous protocols have been shown to improve the scalability of federated learning (FL) with a large number of clients. Meanwhile, momentum-based methods can achieve the best model quality in synchronous FL. However, naively applying momentum in asynchronous FL algorithms leads to slower convergence and degraded model performance. It is still unclear how to effectively combine these two techniques to achieve a mutual benefit. In this paper, we find that asynchrony introduces an implicit bias to momentum updates. To address this issue, we propose momentum approximation, which minimizes the bias by finding an optimal weighted average of all historical model updates. Momentum approximation is compatible with secure aggregation and differential privacy, and can be easily integrated into production FL systems with minimal communication and storage cost. We empirically demonstrate that on benchmark FL datasets, momentum approximation can achieve a 1.15 to 4 times speedup in convergence compared to existing asynchronous FL optimizers with momentum.
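To make the core idea concrete, below is a minimal sketch, not the paper's exact algorithm, of approximating a momentum update by a least-squares-optimal weighted average of historical aggregated updates. The `staleness_matrix` bookkeeping, function names, and the specific least-squares formulation are illustrative assumptions.

```python
import numpy as np


def momentum_target_weights(t, beta):
    """Ideal synchronous-momentum weights beta^(t-s) on rounds s = 0..t."""
    return np.array([beta ** (t - s) for s in range(t + 1)])


def approx_momentum_update(history, staleness_matrix, beta):
    """Sketch: weight historical aggregated updates to mimic momentum.

    history: list of server-aggregated updates Delta_0..Delta_t (1-D arrays).
    staleness_matrix: hypothetical bookkeeping where entry [t, s] is the
        fraction of round-s client contributions that arrived in round t.
    beta: momentum coefficient.
    """
    t = len(history) - 1
    target = momentum_target_weights(t, beta)      # weights momentum would apply
    A = staleness_matrix[: t + 1, : t + 1]         # how asynchrony mixes rounds
    # Choose weights w so the weighted history best matches the ideal
    # momentum weighting in a least-squares sense: min_w ||A^T w - target||_2.
    w, *_ = np.linalg.lstsq(A.T, target, rcond=None)
    return sum(wi * d for wi, d in zip(w, history))
```

In this toy formulation, the server only needs to keep the historical aggregated updates and solve a small least-squares problem per round, which is consistent with the low communication and storage cost claimed in the abstract.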