This paper was accepted at the Efficient Natural Speech and Language Processing (ENLSP) Workshop at NeurIPS 2024.
The pre-training phase of language models typically starts from randomly initialized parameters. Given current trends in model scaling, training their large number of parameters is extremely slow and costly. In contrast, small language models are less expensive to train but often cannot match the accuracy of large models. In this article, we explore an intriguing idea that connects these two regimes: can we initialize large language models from pre-trained smaller models, and does such initialization yield benefits in training time and final accuracy? We present HyperCloning, a method that expands the parameters of a pre-trained language model into those of a larger model with increased hidden dimensions. Our method ensures that the larger model retains the functionality of the smaller model; as a result, the larger model inherits the predictive power and accuracy of the smaller model before training begins. We show that training a model initialized in this way yields significant savings in the GPU hours required for pre-training large language models.
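To illustrate the kind of function-preserving expansion the abstract describes, the sketch below tiles a linear layer's weight matrix so that, when the input is replicated across the wider hidden dimension, the output exactly replicates the smaller model's output. This is a minimal, hypothetical construction consistent with the description above; the exact HyperCloning procedure for full transformer blocks may differ.

```python
import torch

def expand_linear(weight: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Tile a (d_out, d_in) weight into a (factor*d_out, factor*d_in) weight.
    Dividing by `factor` keeps each output coordinate numerically identical
    when the input vector is replicated `factor` times."""
    return weight.repeat(factor, factor) / factor

# Function-preservation check on random data (illustrative only).
d_in, d_out, factor = 8, 4, 2
w = torch.randn(d_out, d_in)
x = torch.randn(d_in)

y_small = w @ x                        # output of the small layer
x_big = x.repeat(factor)               # replicated input for the wider layer
y_big = expand_linear(w, factor) @ x_big

assert torch.allclose(y_big, y_small.repeat(factor), atol=1e-6)
```

Because the expanded layer reproduces the small layer's outputs exactly, stacking such expansions preserves the network's overall function, which is what lets the larger model start training from the smaller model's accuracy rather than from random initialization.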