The rise of Transformer-based models has significantly advanced the field of natural language processing. However, training these models is often computationally intensive and requires substantial resources and time. This research addresses the issue of improving the training efficiency of Transformer models without compromising their performance. Specifically, it seeks to explore whether the benefits of normalization, often applied as a separate component, can be integrated throughout the Transformer architecture in a more coherent way.
NVIDIA researchers propose a novel architecture called the Normalized Transformer (nGPT), which performs representation learning on the hypersphere. In this approach, all vectors that make up the embeddings, MLP, attention matrices, and hidden states are normalized to unit norm. Under this normalization, input tokens move across the surface of a hypersphere, with each layer of the model contributing an incremental displacement toward the final output prediction. By framing the entire transformation process as movement on a hypersphere, the researchers aim to make training faster and more stable. The nGPT model reportedly reduces the number of training steps required by a factor of 4 to 20, depending on the sequence length.
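To make concrete what "unit norm" means here, the short PyTorch sketch below (not the authors' code; the `l2_normalize` helper name and toy shapes are ours) projects a batch of token vectors onto the unit hypersphere:

```python
import torch

def l2_normalize(x: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    """Project vectors onto the unit hypersphere along `dim`."""
    return x / (x.norm(p=2, dim=dim, keepdim=True) + eps)

# Toy example: a batch of hidden states with shape (batch, seq_len, d_model)
h = torch.randn(2, 16, 64)
h = l2_normalize(h)      # every token vector now has norm 1
print(h.norm(dim=-1))    # ~1.0 everywhere: the vectors lie on the hypersphere
```

In nGPT, this kind of projection is applied not only to hidden states but also to the rows of the embedding, attention, and MLP weight matrices.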
The structure of the Normalized Transformer revolves around a systematic normalization process. All embeddings, as well as the attention and MLP matrices, are constrained to lie on a hypersphere, ensuring a uniform representation across all layers of the network. Specifically, the inputs and outputs of the attention mechanism and the MLP are normalized, so that every matrix-vector product becomes a set of dot products that can be read as cosine similarities. In addition, instead of traditional weight decay and separate normalization layers such as LayerNorm or RMSNorm, the authors introduce learnable scaling parameters to control the impact of normalization. The normalization and optimization process in nGPT can be viewed as variable-metric optimization on the hypersphere, with the update steps controlled by learnable eigen learning rates that adaptively adjust the contribution of each layer.
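The following is a minimal sketch of this update rule in PyTorch. It assumes simplified placeholder attention and MLP sub-modules; the class name `NormalizedBlockSketch`, the parameter names `alpha_a`/`alpha_m`, and their initial values are illustrative choices, not the authors' implementation. The idea it captures is that each layer nudges the hidden state toward its normalized attention and MLP outputs by learnable per-dimension eigen learning rates, then re-projects it onto the sphere.

```python
import torch
import torch.nn as nn

def l2_normalize(x, dim=-1, eps=1e-8):
    return x / (x.norm(p=2, dim=dim, keepdim=True) + eps)

class NormalizedBlockSketch(nn.Module):
    """Sketch of one nGPT-style block: h is moved toward the normalized
    attention and MLP outputs by learnable eigen learning rates, and the
    result is re-normalized so it stays on the unit hypersphere."""
    def __init__(self, d_model: int, attn: nn.Module, mlp: nn.Module):
        super().__init__()
        self.attn, self.mlp = attn, mlp
        self.alpha_a = nn.Parameter(torch.full((d_model,), 0.05))  # eigen LR for the attention step
        self.alpha_m = nn.Parameter(torch.full((d_model,), 0.05))  # eigen LR for the MLP step

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        h_a = l2_normalize(self.attn(h))                # normalized attention output
        h = l2_normalize(h + self.alpha_a * (h_a - h))  # step toward h_a, back onto the sphere
        h_m = l2_normalize(self.mlp(h))                 # normalized MLP output
        h = l2_normalize(h + self.alpha_m * (h_m - h))  # step toward h_m, back onto the sphere
        return h

# Usage with placeholder sub-modules (the real nGPT also normalizes the weight matrices themselves):
block = NormalizedBlockSketch(64, attn=nn.Linear(64, 64), mlp=nn.Linear(64, 64))
out = block(l2_normalize(torch.randn(2, 16, 64)))
print(out.norm(dim=-1))  # ~1.0: hidden states remain on the unit hypersphere after the block
```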
The results are convincing. The authors ran experiments on the OpenWebText dataset, training both a baseline GPT model and the new nGPT model. For the same training budget, nGPT demonstrated a significant reduction in validation loss compared to GPT, particularly at longer context lengths. For example, with a context length of 4k tokens, nGPT reached the same validation loss as GPT in only one-tenth of the iterations. The experiments also confirmed that nGPT consistently outperformed the baseline GPT on a variety of downstream tasks, delivering not only faster convergence but also improved generalization. The introduction of hyperspherical representation learning led to better embedding separability, which correlated with higher accuracy on the benchmark tests.
In conclusion, the Normalized Transformer (nGPT) presents a significant advance in the efficient training of large language models. By unifying the findings of previous work on normalization and embedding representation learning, the authors created a model that is more efficient in terms of computational resources while maintaining high performance. Using the hypersphere as the basis for all transformations allows for more stable and consistent training, potentially paving the way for future optimizations of the Transformer architecture. The researchers suggest that this method could be extended to more complex encoder-decoder architectures and other hybrid modeling frameworks.
Check out the Paper. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable to a wide audience. The platform has more than 2 million monthly visits, illustrating its popularity among readers.