Large language models (LLMs) have become extremely popular thanks to their outstanding performance on a wide variety of natural language tasks. Although the field is advancing at a rapid pace, the enormous computational resources required to train these models remain a major drawback. Consequently, interest has grown in more compact and efficient LLMs such as LLaMA, MPT, and Falcon. These mid-sized models are intended to support diverse use cases by offering efficient inference and fine-tuning. However, training even the smallest billion-parameter LLMs from scratch is prohibitively expensive for many organizations because of the computational resources required.
Researchers have previously shown that smaller language models can be just as powerful as moderately sized LLMs like LLaMA. These smaller models are seen as a more efficient substitute for large LLMs, which require substantial compute to train. In a recent study, a team of researchers investigated structured pruning as an effective technique for shrinking larger pre-trained models into smaller LLMs. Their method relies on two key techniques.
- Targeted structured pruning: a technique that systematically removes layers, attention heads, and intermediate and hidden dimensions from a larger language model until it reaches a specified target configuration. Because the pruning is performed end to end, the model remains coherent and functional, and it is optimized without sacrificing essential language-understanding capabilities (a simplified sketch is given after this list).
- Dynamic batch loading: a method that adjusts the composition of the training data within each batch according to how the loss evolves across domains. By dynamically changing how many samples each domain contributes, it makes the model focus more on the domains where it is underperforming, improving overall training efficiency (see the second sketch after this list).
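To make the first technique concrete, here is a minimal sketch in PyTorch of pruning attention heads with learnable masks. It is an illustrative simplification rather than the authors' implementation: the paper learns hard-concrete masks over layers, heads, and hidden/intermediate dimensions under Lagrangian constraints to hit an exact target architecture, whereas this toy version only gates heads with a sigmoid mask and a simple penalty toward a hypothetical target count (`target_heads`).

```python
# Illustrative sketch of targeted structured pruning with learnable head masks.
# NOT the paper's implementation; names and the penalty are our own choices.
import torch
import torch.nn as nn

class MaskedSelfAttention(nn.Module):
    def __init__(self, hidden_dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        # One learnable mask logit per attention head.
        self.head_logits = nn.Parameter(torch.zeros(num_heads))
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        # Soft head mask in [0, 1]; heads whose mask collapses to 0 can be removed.
        mask = torch.sigmoid(self.head_logits)                 # (num_heads,)
        out = out.view(*x.shape[:2], self.num_heads, self.head_dim)
        out = out * mask.view(1, 1, -1, 1)
        return out.reshape(x.shape)

def sparsity_penalty(module, target_heads):
    # Penalize deviation of the expected number of kept heads from the target.
    expected_heads = torch.sigmoid(module.head_logits).sum()
    return (expected_heads - target_heads) ** 2

# Usage: the penalty is added to the language-modeling loss during pruning.
layer = MaskedSelfAttention()
x = torch.randn(2, 16, 512)
lm_loss = layer(x).pow(2).mean()                                # placeholder LM loss
loss = lm_loss + 0.1 * sparsity_penalty(layer, target_heads=4)
loss.backward()
```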
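Dynamic batch loading can likewise be sketched in a few lines. The snippet below is a simplified, assumption-laden version of the idea: per-domain sampling weights are increased for domains whose current loss exceeds a reference loss, then renormalized. The exponentiated-gradient-style update, the step size, and the example numbers are illustrative choices, not the paper's exact rule.

```python
# Illustrative sketch of dynamic batch loading: domains with excess loss get
# a larger share of the next batches. Update rule and numbers are our own.
import numpy as np

def update_domain_weights(current_loss, reference_loss, prev_weights, step_size=1.0):
    """Upweight domains whose loss exceeds its reference value."""
    excess = np.maximum(current_loss - reference_loss, 0.0)   # per-domain excess loss
    scores = prev_weights * np.exp(step_size * excess)        # exponentiated-gradient style
    return scores / scores.sum()                              # renormalize to a distribution

# Usage with hypothetical numbers for four pre-training domains.
domains = ["web", "code", "books", "wiki"]
weights = np.full(4, 0.25)
current = np.array([2.9, 1.8, 2.4, 2.2])
reference = np.array([2.6, 1.9, 2.3, 2.2])
weights = update_domain_weights(current, reference, weights)
print(dict(zip(domains, weights.round(3))))   # "web" and "books" get larger shares
```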
Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, two smaller LLMs obtained by pruning a LLaMA2-7B model, demonstrate how effective the proposed procedure is. The pruning and continued training consume only 50 billion tokens, about 5% of OpenLLaMA’s pre-training budget. Despite this limited budget, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B outperform other well-known LLMs of comparable size, such as Pythia, INCITE, and OpenLLaMA, across 11 representative downstream tasks. These tasks cover a range of abilities, including instruction tuning for open-ended generation, reading comprehension, common-sense reasoning, and world knowledge.
Additional training with more tokens may yield even greater gains, judging from the performance trajectory of the pruned models. Although the experiments in the current study are limited to models with at most 7 billion parameters, the LLM shearing technique is designed to generalize and could be extended to language models of any size in future work.
In summary, LLM shearing provides a comprehensive approach to reducing LLM size through dynamic batch loading and targeted structured pruning. The Sheared-LLaMA models, which outperform models of equivalent size on a variety of downstream tasks, are an effective demonstration of this. The method shows how smaller yet strong, efficient, and economical LLMs can be built, and it can be applied to a wide range of model sizes.
Check out the Paper, GitHub, and Project page. All credit for this research goes to the researchers of this project.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with strong analytical and critical-thinking skills, along with a keen interest in acquiring new skills, leading teams, and managing work in an organized manner.