The Transformer architecture has been widely adopted across research and industry. Its most significant limitation is the quadratic complexity of the attention operation, which makes large models increasingly difficult to apply to longer inputs. This study demonstrates that a single Nvidia GTX 1080Ti GPU can process streams of more than 1 million tokens by combining a simple token-based memory scheme with pretrained Transformer models such as BERT.
The study of synthetic tasks is a first step toward generalizing the Recurrent Memory Transformer (RMT) to problems with unknown properties, such as language modeling. Since the Transformer gained popularity, a large number of studies have addressed long inputs. This work shows that large amounts of memory are not always needed when using Transformers to parse long texts: a recurrent approach combined with memory can reduce quadratic complexity to linear. Furthermore, models trained on sufficiently long inputs can generalize to texts orders of magnitude longer. The authors plan to adapt the recurrent memory approach in future work to increase the effective context size of the most commonly used Transformers.
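To make the linear-versus-quadratic scaling concrete, the snippet below is a rough, illustrative cost comparison rather than figures from the paper: a single full-attention pass over n tokens costs on the order of n squared, while processing the same input segment by segment with a small fixed memory costs roughly (n / segment) x (segment + memory) squared, which grows linearly in n. The segment and memory sizes are assumptions for the sketch.

# Illustrative attention-cost comparison (assumption-based sketch, not figures from the paper).
def full_attention_cost(n_tokens: int) -> int:
    # Pairwise attention over the whole input at once: grows as n^2.
    return n_tokens ** 2

def recurrent_memory_cost(n_tokens: int, segment_len: int = 512, n_memory: int = 10) -> int:
    # Process the input segment by segment; each pass attends only over
    # (segment + memory) tokens, so the total cost grows linearly with n.
    n_segments = -(-n_tokens // segment_len)  # ceiling division
    return n_segments * (segment_len + n_memory) ** 2

for n in (4_096, 65_536, 1_048_576):
    print(f"{n:>9} tokens: full={full_attention_cost(n):.2e}  recurrent={recurrent_memory_cost(n):.2e}")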
Researchers at DeepPavlov, the Artificial Intelligence Research Institute (AIRI), and the London Institute for Mathematical Sciences make the following contributions:
1. They enhance BERT with token-based memory storage and segment-level recurrence, following the Recurrent Memory Transformer (RMT) approach (see the sketch after this list).
2. They show that the memory-augmented BERT can be trained to handle tasks on sequences up to seven times longer than its 512-token input limit.
3. They find that the trained RMT extrapolates effectively to tasks of various lengths, including those exceeding 1 million tokens, with computation that scales linearly.
4. Using attention pattern analysis, they identify the memory operations RMT uses to successfully handle extraordinarily long sequences.
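Below is a minimal, hedged sketch of what segment-level recurrence with memory tokens might look like in PyTorch. It uses a generic torch.nn.TransformerEncoder as a stand-in for BERT, and all names and sizes (RecurrentMemorySketch, n_mem, segment_len, and so on) are illustrative assumptions rather than the authors' actual implementation.

import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    # Sketch of RMT-style segment-level recurrence: a small set of learnable
    # memory tokens is prepended to every segment, and the encoder's outputs
    # at those positions become the memory passed to the next segment.
    def __init__(self, d_model: int = 256, n_mem: int = 10, segment_len: int = 128):
        super().__init__()
        self.segment_len = segment_len
        self.memory = nn.Parameter(torch.randn(n_mem, d_model))  # initial memory state
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # stand-in for BERT

    def forward(self, embeddings: torch.Tensor) -> torch.Tensor:
        # embeddings: (batch, seq_len, d_model), already token-embedded
        batch = embeddings.size(0)
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for start in range(0, embeddings.size(1), self.segment_len):
            segment = embeddings[:, start:start + self.segment_len]
            # Prepend the current memory to the segment and run one encoder pass.
            hidden = self.encoder(torch.cat([mem, segment], dim=1))
            # Updated memory = hidden states at the memory positions.
            mem = hidden[:, : mem.size(1)]
            outputs.append(hidden[:, mem.size(1):])
        return torch.cat(outputs, dim=1)

# Usage: process a long (already embedded) sequence segment by segment.
model = RecurrentMemorySketch()
x = torch.randn(1, 1024, 256)
print(model(x).shape)  # torch.Size([1, 1024, 256])

The key design choice this sketch illustrates is that only the hidden states at the memory positions are carried between segments, so each forward pass is bounded by the segment size regardless of the total input length.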
In conclusion, the authors apply recurrent memory to BERT, one of the most successful Transformer-based models in natural language processing. Using the Recurrent Memory Transformer architecture, they extend the model's effective context length to an unprecedented two million tokens while retaining high memory retrieval accuracy. Their approach allows information to flow across segments of the input stream through recurrence and enables the storage and processing of both local and global information. Their experiments demonstrate the effectiveness of the method, which has great potential to improve long-term dependency handling in natural language understanding and generation tasks, and to enable large-scale context processing for memory-intensive applications.
Check out the Paper.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.