The introduction of large language models (LLMs) has brought a significant level of advancement to the field of artificial intelligence. Built on the concepts of Natural Language Processing (NLP), Natural Language Understanding (NLU), and Natural Language Generation (NLG), LLMs have demonstrated incredible capabilities. Well-known models such as LLaMA and LLaMA2 have become very effective tools for understanding and producing natural language.
However, these models come with fixed restrictions: a maximum context size of 2,048 tokens for LLaMA and 4,096 tokens for LLaMA2. Because of this limit, they have difficulty with tasks that require digesting long documents or answering long queries. Training or fine-tuning an LLM on longer sequences is one way to expand the context window, but this presents computational difficulties and can be prohibitively expensive in resources.
Low-rank adaptation (LoRA) is a simple method to expand the context window. LoRA uses low-rank matrices, which are computationally efficient and limit the number of trainable parameters, to adapt the linear projection layers in self-attention blocks. However, empirical studies show that training long-context models with plain low-rank adaptation is not very effective: with the standard self-attention mechanism, it produces high perplexity for long context extensions and loses effectiveness as the context size increases.
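To make the idea concrete, here is a minimal PyTorch-style sketch of a LoRA-style low-rank update applied to a linear projection, such as a query or value projection in a self-attention block. The class name, rank, and scaling factor are illustrative assumptions, not LongLoRA's actual implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a frozen pretrained linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pretrained weights
            p.requires_grad = False
        # A (d_in -> r) and B (r -> d_out) are the only trainable parameters.
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)    # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen projection + scaled low-rank correction B(A(x)).
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Because only the two small factor matrices are trained, the number of trainable parameters stays a tiny fraction of the full projection weights.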
To overcome these limitations, a team of researchers has introduced LongLoRA, an efficient fine-tuning approach that expands the context sizes of large pre-trained language models without incurring excessive computational costs. LongLoRA has been developed to effectively augment the context window of pre-trained LLMs such as LLaMA2. It speeds up the process of broadening the LLM context in two important ways.
First, LongLoRA makes effective context extension possible during fine-tuning by using shifted sparse attention (S2-Attn). While dense global attention is still required for LLMs to perform well during inference, fine-tuning can be carried out effectively and efficiently with sparse local attention. Compared to fine-tuning with conventional attention, S2-Attn enables context extension with significant computational savings; it can be implemented with only two lines of code during training and is optional at inference, so it integrates easily into existing pipelines.
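As a rough illustration of the shift that S2-Attn relies on, the sketch below rolls the second half of the attention heads by half a group along the sequence dimension, so that attention computed within fixed-size groups still lets information flow between neighbouring groups. The tensor layout, group size, and function name are assumptions for illustration, not the authors' exact code.

```python
import torch

def shift_half_heads(qkv: torch.Tensor, group_size: int) -> torch.Tensor:
    """Roll half of the heads by half a group along the sequence dimension.

    Assumes qkv has shape (batch, seq_len, 3, num_heads, head_dim).
    """
    num_heads = qkv.shape[3]
    shifted = qkv.clone()
    # Shift only the second half of the heads so neighbouring groups overlap.
    shifted[:, :, :, num_heads // 2:] = torch.roll(
        shifted[:, :, :, num_heads // 2:], shifts=-group_size // 2, dims=1
    )
    return shifted

# After the shift, attention is computed independently inside each group of
# `group_size` tokens, e.g. by reshaping seq_len into
# (seq_len // group_size, group_size) before calling the attention kernel.
```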
Second, LongLoRA reconsiders the fine-tuning procedure with an emphasis on parameter-efficient context expansion. The team found that LoRA works admirably for context extension as long as the model's embedding and normalization layers are also made trainable. This insight is key to successfully extending context without substantially increasing the computational load.
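A minimal sketch of this idea, assuming Hugging Face-style LLaMA module names such as `embed_tokens` and `norm` (these names are assumptions and may not match LongLoRA's code): the LoRA factors, embeddings, and normalization layers are made trainable while the rest of the backbone stays frozen.

```python
import torch.nn as nn

def mark_trainable_for_long_context(model: nn.Module) -> None:
    """Unfreeze LoRA factors plus embedding and normalization layers."""
    for name, param in model.named_parameters():
        if "lora_" in name or "embed_tokens" in name or "norm" in name:
            param.requires_grad = True    # trainable: LoRA + embeddings + norms
        else:
            param.requires_grad = False   # rest of the backbone stays frozen
```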
With LLaMA2 models ranging in size from 7B and 13B to 70B, LongLoRA has presented notable empirical results across a variety of tasks. On a single machine with 8× A100 GPUs, the method extends the context of these models from 4k tokens to 100k tokens for LLaMA2 7B, or up to 32k tokens for LLaMA2 70B. It achieves this extended context while retaining the original model architectures, making it compatible with methods and tools already in use, such as FlashAttention-2.
A dataset called LongQA has also been developed for supervised fine-tuning to support real-world use of LongLoRA. It contains more than 3,000 question-answer pairs with long contexts. The availability of this dataset broadens the usefulness of LongLoRA for researchers and practitioners seeking to extend the capabilities of LLMs.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.