Over the past decade, training ever-larger overparameterized networks, or the “stack more layers” strategy, has become the norm in machine learning. As the threshold for a “large network” has risen from 100 million to hundreds of billions of parameters, most research groups have found the compute costs of training such networks too high to justify. Despite this, there is little theoretical understanding of why we need to train models with orders of magnitude more parameters than there are training instances.
More compute-efficient scaling optima, retrieval-augmented models, and the simple strategy of training smaller models for longer have offered exciting alternatives to scaling. However, these approaches rarely democratize the training of large models and do little to explain why overparameterized models are necessary in the first place.
Several recent studies also suggest that overparameterization is not strictly necessary for training. Empirical evidence supports the lottery ticket hypothesis, which states that at some point early in training (or at initialization) there exist sub-networks (winning tickets) that, when trained in isolation, reach the performance of the full network.
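For intuition, here is a minimal sketch of how such a winning ticket can be extracted via magnitude pruning with rewinding. The helper name, prune fraction, and training hook are illustrative assumptions, not code from the ReLoRA paper.

```python
# Hedged sketch of lottery-ticket-style pruning with rewinding (illustrative only).
import copy
import torch
import torch.nn as nn

def find_winning_ticket(model: nn.Module, train_fn, prune_frac: float = 0.8):
    """Train, prune the smallest-magnitude weights, then rewind the survivors
    to their early values -- the remaining sub-network is the 'winning ticket'."""
    rewind_state = copy.deepcopy(model.state_dict())  # snapshot at init / early training
    train_fn(model)                                   # user-supplied training loop

    # Build per-tensor masks that keep only the largest-magnitude weights.
    masks = {}
    with torch.no_grad():
        for name, p in model.named_parameters():
            if p.dim() < 2:  # skip biases and norm parameters
                continue
            k = max(1, int(prune_frac * p.numel()))
            threshold = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > threshold).float()

    # Rewind surviving weights and zero out the pruned ones.
    with torch.no_grad():
        model.load_state_dict(rewind_state)
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])
    return model, masks
```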
Recent research from the University of Massachusetts Lowell introduced ReLoRA to address this problem, using the rank-of-sum property (rank(A + B) ≤ rank(A) + rank(B)) to train a high-rank network through a series of low-rank updates. Their findings show that ReLoRA performs a high-rank update and delivers results comparable to standard neural network training. ReLoRA uses a full-rank training warm start, similar to the lottery ticket hypothesis with rewinding. Combined with a merge-and-reinit (restart) approach, a jagged learning rate scheduler, and partial optimizer resets, this brings ReLoRA's efficiency closer to full-rank training, especially in large networks.
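To make the mechanism concrete, below is a minimal PyTorch-style sketch of a ReLoRA-like training loop. The LoRALinear module, the reset interval, the warmup length, and the simplified optimizer reset are illustrative assumptions rather than the authors' exact implementation (see their GitHub repository for that).

```python
# Minimal, hedged sketch of ReLoRA-style training: low-rank adapters that are
# periodically merged into the frozen weights and then reinitialized.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update W + B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + x @ self.A.T @ self.B.T

    @torch.no_grad()
    def merge_and_reinit(self):
        # Merge the low-rank update into the frozen weights, then restart the
        # adapters so the next cycle can learn a new low-rank direction.
        self.base.weight += self.B @ self.A
        nn.init.normal_(self.A, std=0.01)
        nn.init.zeros_(self.B)

# Toy model and data, purely illustrative.
model = nn.Sequential(LoRALinear(nn.Linear(64, 64)), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-3)

RESET_EVERY = 200  # steps between merge-and-reinit restarts (illustrative value)
for step in range(1000):
    x, y = torch.randn(32, 64), torch.randint(0, 8, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

    # Crude stand-in for the jagged scheduler: drop the LR after each restart
    # and re-warm it over the next 50 steps.
    phase = step % RESET_EVERY
    for g in opt.param_groups:
        g["lr"] = 1e-3 * min(1.0, (phase + 1) / 50)

    if (step + 1) % RESET_EVERY == 0:
        for m in model.modules():
            if isinstance(m, LoRALinear):
                m.merge_and_reinit()
        # The paper uses a partial optimizer reset; simplified here as a full
        # clear of the Adam state so stale moments do not steer the new adapters.
        opt.state.clear()
```

The key steps are the periodic merge of the low-rank factors into the frozen weights, the reinitialization of the adapters, the learning rate re-warmup after each restart, and the (simplified) optimizer state reset. Because each cycle contributes its own low-rank term, the accumulated update to the weights can reach a much higher rank than any single LoRA update, which is the point of the rank-of-sum argument above.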
They test ReLoRA on transformer language models with up to 350 million parameters, focusing on autoregressive language modeling because it has proven applicable across a wide range of neural network uses. The results showed that ReLoRA's effectiveness grows with model size, suggesting it could be a good option for training networks with billions of parameters.
When it comes to training large language models and neural networks, the researchers believe that low-rank training approaches hold significant promise for improving training efficiency. They also believe that low-rank training can teach the community more about how neural networks can be trained via gradient descent and about their remarkable generalization abilities in the overparameterized regime, potentially contributing significantly to the development of deep learning theory.
Check out the Paper and GitHub link. Don’t forget to join our 26k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more. If you have any questions about the article above or if we missed anything, feel free to email us at [email protected]
🚀 Check out over 800 AI tools at AI Tools Club
Dhanshree Shenwai is a Computer Engineer with solid experience in FinTech companies covering the Finance, Cards & Payments, and Banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today’s changing world, making everyone’s life easier.