The Transformer is a deep learning architecture that underlies many next-generation AI models. It has revolutionized the field of artificial intelligence, particularly natural language processing and other machine learning tasks. The architecture is built on a self-attention mechanism, in which the model weighs the importance of different parts of the input sequence when making predictions, and it consists of an encoder and a decoder that process the inputs.
However, extending the context length of Transformers takes considerable effort. This is due to the inherent self-attention mechanism: self-attention has a memory cost that is quadratic in the length of the input sequence, which makes it difficult to scale to longer inputs. UC Berkeley researchers developed a method called Ring Attention to address this, based on a simple observation: when the self-attention and feedforward network computations are performed blockwise, the sequence can be distributed across multiple devices and processed efficiently.
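To make the scaling problem concrete, here is a minimal NumPy sketch of naive single-head attention (my own illustration, not code from the paper); the (n, n) score matrix it materializes is what makes memory grow quadratically with the sequence length.

```python
# Minimal sketch of naive single-head attention (illustrative, not from the paper).
import numpy as np

def naive_attention(q, k, v):
    # q, k, v: (n, d) arrays for a single head
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (n, n) matrix -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                                 # (n, d) output

n, d = 4096, 64
q, k, v = (np.random.randn(n, d).astype(np.float32) for _ in range(3))
out = naive_attention(q, k, v)
# The (n, n) float32 score matrix alone takes n*n*4 bytes (~67 MB already at n=4096).
```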
They distribute the outer loop of the blockwise attention computation among hosts, with each device managing its respective input block. In the inner loop, every device computes blockwise attention and feedforward operations for its designated input block. The host devices form a conceptual ring: each device sends a copy of the key-value blocks it has just used to the next device in the ring while simultaneously receiving key-value blocks from the previous one, as the sketch below illustrates.
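The following single-process simulation is a simplified sketch of that idea (my own NumPy illustration, not the authors' implementation; the function name and block layout are assumed for clarity). Each "device" keeps its query block fixed while the key-value blocks rotate around the ring, and partial results are merged with a numerically stable streaming softmax.

```python
import numpy as np

def ring_attention_sim(q, k, v, num_devices):
    """Simulate the Ring Attention idea on one machine: KV blocks rotate around a ring."""
    n, d = q.shape
    assert n % num_devices == 0
    c = n // num_devices                               # block size per "device"
    q_blocks = q.reshape(num_devices, c, d)            # query blocks stay put
    k_blocks = list(k.reshape(num_devices, c, d))      # KV blocks will rotate
    v_blocks = list(v.reshape(num_devices, c, d))

    # Per-device running statistics for the streaming (online) softmax.
    num = np.zeros((num_devices, c, d))                # weighted-value accumulator
    den = np.zeros((num_devices, c, 1))                # softmax denominator
    mx = np.full((num_devices, c, 1), -np.inf)         # running max of scores

    for _ in range(num_devices):                       # one full trip around the ring
        for dev in range(num_devices):                 # what each device does "in parallel"
            s = q_blocks[dev] @ k_blocks[dev].T / np.sqrt(d)   # (c, c) block scores
            new_mx = np.maximum(mx[dev], s.max(axis=-1, keepdims=True))
            scale = np.exp(mx[dev] - new_mx)           # rescale old accumulators
            p = np.exp(s - new_mx)
            num[dev] = num[dev] * scale + p @ v_blocks[dev]
            den[dev] = den[dev] * scale + p.sum(axis=-1, keepdims=True)
            mx[dev] = new_mx
        # "Send" every KV block to the next device while "receiving" from the previous one.
        k_blocks = k_blocks[-1:] + k_blocks[:-1]
        v_blocks = v_blocks[-1:] + v_blocks[:-1]

    return (num / den).reshape(n, d)
```

With the same q, k, and v as in the earlier snippet, ring_attention_sim(q, k, v, 8) matches the naive computation up to floating-point error, yet no step ever materializes the full n x n score matrix.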
Because the block computation takes longer than the block transfer, the team overlapped the two, so the communication adds no overhead compared with standard Transformers. As a result, each device needs memory proportional only to the block size, regardless of the length of the original input sequence, which effectively eliminates the memory limitations imposed by individual devices; the rough arithmetic below illustrates the difference.
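Some back-of-the-envelope arithmetic (illustrative numbers, not figures from the paper) makes the point: the score matrix a single device must hold shrinks from n x n to c x c, where c is the block size.

```python
# Illustrative memory arithmetic: full attention vs. one Ring Attention block.
def score_bytes_full(n, bytes_per_float=4):
    return n * n * bytes_per_float                     # full (n, n) score matrix

def score_bytes_ring(c, bytes_per_float=4):
    return c * c * bytes_per_float                     # only one (c, c) block at a time

for n in (65_536, 1_048_576, 16_777_216):
    print(f"n={n:>10,}: full = {score_bytes_full(n) / 2**30:>12,.0f} GiB, "
          f"per-device block (c=4096) = {score_bytes_ring(4096) / 2**20:.0f} MiB")
```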
Their experiments show that Ring Attention reduces the memory requirements of Transformers, allowing them to train on sequences more than 500 times longer than previous memory-efficient state-of-the-art methods. The approach also makes it possible to train on sequences exceeding 100 million tokens in length without approximating attention. Because Ring Attention eliminates the memory limitations imposed by individual devices, a near-infinite context size is achievable in principle; however, this would require a very large number of devices, since the achievable sequence length scales with the number of devices.
The research only evaluates the effectiveness of the method and does not include large-scale model training. Since the achievable context length scales with the number of devices, the model's efficiency depends on optimization; the authors have not yet worked on the low-level operations required for optimal compute performance. They say that in the future they would like to work on both maximum sequence length and maximum compute performance. The possibility of near-infinite context opens up many interesting opportunities, such as large video and audio-language models, learning from extended feedback and trial and error, understanding and generating codebases, and adapting AI models to understand scientific data such as gene sequences.
Check out the Paper. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing his integrated Master's degree in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn advance technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, machine learning models, and artificial intelligence.