The rapid growth of large language models (LLMs) has catalyzed the development of numerous NLP applications, such as chatbots, writing assistants, and programming aids. However, these applications often require unbounded input lengths and robust memory capabilities, which current LLMs lack. Simply extending the pre-training context length is impractical, so research has turned to methods that let LLMs handle arbitrarily long inputs while retaining important information. Recent studies focus on increasing the input context length of LLMs, mainly by optimizing the attention mechanism. Techniques such as sliding window attention and StreamLLM extend the usable input length, but they still lose attention to, and eventually forget, earlier context, which has motivated the idea of filtering out less important tokens so that key information can be remembered for longer.
Many studies have sought to extend the input context length of LLMs by refining the attention mechanism. Some methods, such as sliding window attention, which limits each token to attending only to recent tokens, ensure stable decoding speed. Others, such as the fixed Sparse Transformer and LogSparse self-attention, have been proposed to preserve local context while improving global attention. StreamLLM was introduced to achieve effectively infinite input length by keeping attention on both the initial and the most recent tokens. However, existing approaches still struggle with deciding which tokens to preserve and with forgetting earlier information.
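As a rough illustration of the eviction policies described above, the sketch below assumes the KV cache is an ordered Python list of per-token (key, value) pairs; the function names and the `window` and `n_sink` parameters are illustrative choices, not taken from the original papers.

```python
# A minimal sketch (not the papers' code) of two cache-eviction policies,
# assuming the KV cache is an ordered list of per-token (key, value) pairs.

def sliding_window_evict(kv_cache, window):
    """Sliding window attention: keep only the most recent `window` tokens."""
    return kv_cache[-window:]

def streamllm_evict(kv_cache, window, n_sink=4):
    """StreamLLM-style policy: additionally pin the first `n_sink` (initial)
    tokens so they are never evicted, alongside the most recent tokens."""
    if len(kv_cache) <= window:
        return list(kv_cache)
    return kv_cache[:n_sink] + kv_cache[-(window - n_sink):]
```

In both cases, any token that falls outside the retained set is gone for good, which is the forgetting problem that token-importance filtering tries to address.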
Researchers from Shanghai Jiao Tong University and Wuhan University present Streaming Infinite Retentive LLM (SirLLM), a method that allows LLMs to maintain extended memory across dialogues of unbounded length without any fine-tuning. SirLLM uses a token entropy metric and a memory decay mechanism to filter out less informative tokens and retain the key ones, giving LLMs adaptive, longer-lasting memory. Three tasks and datasets were designed to comprehensively evaluate the effectiveness of SirLLM: DailyDialog, Grocery Shopping, and Rock-Paper-Scissors.
This motivated SirLLM, which improves the model's memory capacity by using per-token entropy to selectively preserve the key-value states of only the key tokens. The framework maintains a key-value (KV) cache and a token entropy cache. When the number of tokens stored in the KV cache exceeds the pre-training length L, SirLLM calculates the entropy of each token and keeps only the top-k tokens with the highest token entropy, thereby conserving space in the KV cache. Higher token entropy corresponds to a lower generation probability, marking key tokens that carry more information. SirLLM also re-indexes token positions within the cache so that relative distances refer to cache positions rather than positions in the original text. However, preserving tokens based solely on entropy can make the model's memory rigid and hard to adapt. To overcome this, a decay ratio ηdecay smaller than 1 is applied, allowing the model to gradually forget older key information after each round of dialogue, improving flexibility and user experience.
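To make the mechanism concrete, here is a minimal sketch, under the assumption that a token's entropy is its negative log generation probability (consistent with the description above); names such as `eta_decay`, `pretrain_len`, and `k` are illustrative, and this is not the authors' implementation.

```python
import math

def token_entropy(prob):
    """Entropy score of a generated token, assumed here to be -log p(token):
    rarer (lower-probability) tokens get higher entropy and carry more information."""
    return -math.log(prob)

def compress_kv_cache(kv_cache, entropy_cache, pretrain_len, k):
    """When the cache grows past the pre-training length, keep only the
    top-k highest-entropy tokens, preserving their order in the cache."""
    if len(kv_cache) <= pretrain_len:
        return kv_cache, entropy_cache
    top_k = sorted(range(len(kv_cache)),
                   key=lambda i: entropy_cache[i], reverse=True)[:k]
    keep = sorted(top_k)  # relative positions now refer to cache order, not the original text
    return [kv_cache[i] for i in keep], [entropy_cache[i] for i in keep]

def decay_entropies(entropy_cache, eta_decay=0.9):
    """Apply the decay ratio after each dialogue round so stale key tokens
    gradually lose priority and can eventually be forgotten."""
    return [e * eta_decay for e in entropy_cache]
```

In this sketch the decay is applied to the stored entropy scores once per round, so tokens that were once "key" slowly drop out of the top-k selection unless new context reinforces them.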

Analysis on the Rock-Paper-Scissors dataset shows that SirLLM consistently outperforms the StreamLLM baseline against players with diverse throwing preferences. SirLLM delivers a consistent improvement in win rates against players of various preferences and maintains this high performance across all models tested. The decay mechanism built into SirLLM contributes significantly to keeping performance balanced over many rounds, as evidenced by consistently high win rates. This is particularly advantageous in prolonged interactions such as extended games of Rock-Paper-Scissors, highlighting SirLLM's ability to remember and adapt to previous moves, which is essential for success.

By introducing SirLLM, this study addresses the critical challenges of managing unbounded input lengths and limited memory capacity. SirLLM achieves long-dialogue retention without any model fine-tuning by selectively reinforcing focus on critical information. Across the three custom tasks of DailyDialog, Grocery Shopping, and Rock-Paper-Scissors, SirLLM consistently demonstrates stable improvement over existing models, regardless of dialogue complexity or length. The experimental results validate the robustness and versatility of SirLLM, positioning it as a valuable asset for future explorations and applications in natural language processing.