Recent advances in artificial intelligence have enabled the development of Large Language Models (LLMs) with very large parameter counts, some reaching into the billions (e.g., LLaMA-2, which comes in 7B, 13B, and even 70B parameter variants). At this scale, these models achieve strong performance across a wide range of tasks, making them powerful tools for many AI applications. The disadvantage, however, is that deploying such models is expensive, and devices such as phones do not have enough memory to accommodate them.
Various pruning techniques have emerged in the past to address this problem. However, many lead to significant performance degradation after pruning, and these methods do not easily extend to structured pruning. For this reason, a team of researchers from Imperial College London, Qualcomm AI Research, QUVA Lab, and the University of Amsterdam have presented LLM Surgeon, a framework for unstructured, semi-structured, and structured LLM pruning that prunes the model in multiple steps, updating the weights and curvature estimates between each step. According to the researchers' experiments, their framework allows pruning LLMs by up to 30% without significant performance degradation, demonstrating its effectiveness.
The framework uses weight magnitudes, activations from forward passes, and gradient information from backward passes to relate the cost of removing a weight to the true training objective. The researchers improve on previous weight-pruning work by using more accurate approximations of the loss curvature and more weight correlations when updating the remaining weights.
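The idea of relating removal cost to the loss and compensating with the remaining weights goes back to Optimal Brain Surgeon-style pruning, which this line of work builds on. Below is a minimal illustrative sketch of that classic bookkeeping under a local quadratic approximation of the loss; the function name and the toy Hessian are assumptions for illustration, not the paper's actual implementation, which uses curvature approximations rather than an exact inverse Hessian.

```python
import numpy as np

def obs_removal_cost_and_update(w, H_inv, i):
    """Optimal Brain Surgeon-style bookkeeping (illustrative sketch).

    Under a local quadratic loss approximation, zeroing weight i costs
        cost_i = w_i^2 / (2 * [H^-1]_ii)
    and the remaining weights are updated by
        delta_w = -(w_i / [H^-1]_ii) * H^-1[:, i]
    so that correlated weights absorb the removal.
    """
    cost = w[i] ** 2 / (2.0 * H_inv[i, i])
    delta = -(w[i] / H_inv[i, i]) * H_inv[:, i]
    return cost, w + delta

# Toy example: a 2-weight quadratic loss with Hessian H (made-up numbers).
H = np.array([[2.0, 0.5], [0.5, 1.0]])
w = np.array([0.3, -1.2])
cost, w_new = obs_removal_cost_and_update(w, np.linalg.inv(H), 0)
# Weight 0 is exactly zeroed; weight 1 shifts to compensate for it.
```

The key point the update formula captures is that pruning is not just deletion: the surviving weights move to absorb the contribution of the removed one, which is why better curvature estimates translate into less performance loss.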
Pruning accuracy depends on estimating the local loss curvature precisely while overcoming the memory cost of storing the exact curvature.
LLM Surgeon uses the Kronecker-Factored Approximate Curvature (KFAC) approximation for this task, a popular curvature-approximation method chosen for its memory efficiency. This allows the framework to dynamically allocate which structures to remove and to update the remaining weights to compensate for the removals.
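The memory saving behind KFAC can be sketched in a few lines. For a linear layer, the curvature (Fisher) over the weight matrix is approximated as a Kronecker product of an input-activation covariance and an output-gradient covariance, so only the two small factors need to be stored. The dimensions and random data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# KFAC sketch: for a linear layer with input activations a and output
# gradients g, the curvature over its (out x in) weight matrix W is
# approximated as a Kronecker product  F ~= G (x) A,  where
#   A = E[a a^T]  is (in x in)  and  G = E[g g^T]  is (out x out).
rng = np.random.default_rng(0)
n_in, n_out, n_samples = 64, 32, 256
acts = rng.standard_normal((n_samples, n_in))
grads = rng.standard_normal((n_samples, n_out))

A = acts.T @ acts / n_samples    # input-activation covariance factor
G = grads.T @ grads / n_samples  # output-gradient covariance factor

# The exact curvature over all n_in * n_out weights would need
# (n_in * n_out)^2 entries; the Kronecker factors need far fewer.
full_entries = (n_in * n_out) ** 2   # 2048^2 = 4,194,304
kfac_entries = n_in ** 2 + n_out ** 2  # 4,096 + 1,024 = 5,120
```

For real transformer layers with thousands of input and output dimensions, this gap is what makes curvature-aware pruning of billion-parameter models tractable at all.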
The framework prunes multiple weights at once to reach the target model size while incurring the lowest possible cost. Additionally, LLM Surgeon prunes in multiple shots to improve the performance-to-sparsity trade-off. The researchers justified this approach by showing that pruning performance improved as the number of shots increased.
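The multi-shot schedule can be sketched as a loop that ramps sparsity toward the target and recomputes removal costs on the updated weights between shots. This is a simplified illustration: a plain magnitude cost stands in for LLM Surgeon's curvature-based cost, and the function name and parameters are assumptions, not the paper's API.

```python
import numpy as np

def multi_shot_prune(w, target_sparsity, shots):
    """Multi-shot pruning schedule (illustrative sketch).

    LLM Surgeon interleaves pruning with weight and curvature updates;
    here a simple magnitude cost stands in for its curvature-based cost.
    """
    w = w.copy()
    n = w.size
    for shot in range(1, shots + 1):
        # Ramp the sparsity level toward the final target.
        sparsity = target_sparsity * shot / shots
        n_prune = int(round(sparsity * n))
        # Recompute removal costs on the *updated* weights each shot.
        costs = np.abs(w)
        prune_idx = np.argsort(costs)[:n_prune]
        w[prune_idx] = 0.0
        # (A curvature-aware method would also update the survivors here.)
    return w

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000)
pruned = multi_shot_prune(weights, target_sparsity=0.3, shots=5)
frac_zero = float(np.mean(pruned == 0.0))  # reaches the 30% target
```

The benefit of multiple shots is that each intermediate update changes which weights look cheap to remove next, so later shots make better-informed choices than a single one-shot prune to the final sparsity.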
The researchers evaluated LLM Surgeon on language modeling tasks with models such as OPT and LLaMA-2, using data from the WikiText-2 dataset. For structured compression, the framework reduces model size by up to 30% without significant performance loss, outperforming all baselines at every target size. For semi-structured and unstructured compression, LLM Surgeon likewise outperforms all baselines, achieving the best performance at all target sizes.
In conclusion, LLM Surgeon addresses the deployment problem posed by LLMs' very large parameter counts. The results show that it can prune rows and columns of a variety of LLMs by 20-30% without significant performance loss. It also achieves state-of-the-art results in unstructured and semi-structured LLM pruning, allowing for simpler deployment.
Review the Paper. All credit for this research goes to the researchers of this project.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound yet easily understandable to a wide audience. The platform has more than 2 million monthly visits, illustrating its popularity among readers.