The significant computational demands of large language models (LLMs) have hampered their adoption across many sectors. This obstacle has shifted attention toward compression techniques designed to reduce model size and computational requirements without significant performance trade-offs. The shift matters for natural language processing (NLP), which underpins applications ranging from document classification to advanced conversational agents. A pressing concern in this transition is ensuring that compressed models remain robust to minority subgroups in datasets, i.e., groups defined by specific combinations of labels and attributes.
Previous work has focused on knowledge distillation, pruning, quantization, and vocabulary transfer, all of which aim to preserve the essence of the original model in a much smaller footprint. Related efforts have examined how compression affects image models with imbalanced classes and sensitive attributes. These approaches have shown promise in maintaining overall performance metrics; however, their impact on more nuanced subgroup-robustness metrics remains underexplored.
A research team from the University of Sussex, the BCAM Severo Ochoa Strategic Laboratory on Trustworthy Machine Learning, Monash University, and expert.ai has conducted a comprehensive study of how model compression affects the subgroup robustness of BERT language models. The study uses the MultiNLI, CivilComments, and SCOTUS datasets to explore 18 different compression methods spanning knowledge distillation, pruning, quantization, and vocabulary transfer.
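The 18 methods are detailed in the paper itself; purely as an illustration of two of the compression families named above, the sketch below loads a pre-distilled student checkpoint and applies post-training dynamic quantization to a BERT classifier. It assumes PyTorch and Hugging Face Transformers, and the checkpoint names and label count are illustrative choices, not the paper's exact setup.

```python
# Illustrative sketch only: two compression families (distillation via a
# pre-distilled checkpoint, and post-training dynamic quantization).
# Checkpoint names and num_labels are assumptions, not the paper's setup.
import os

import torch
from transformers import AutoModelForSequenceClassification

# Knowledge distillation: use a smaller student checkpoint distilled from BERT.
student = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=3  # e.g., 3 classes for MultiNLI
)

# Quantization: convert the linear layers of a BERT classifier to int8 weights.
teacher = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)
quantized = torch.quantization.quantize_dynamic(
    teacher, {torch.nn.Linear}, dtype=torch.qint8
)

def size_mb(model: torch.nn.Module) -> float:
    """Approximate on-disk model size in megabytes."""
    torch.save(model.state_dict(), "tmp.pt")
    mb = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return mb

print(f"BERT-base: {size_mb(teacher):.1f} MB, quantized: {size_mb(quantized):.1f} MB")
```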
The methodology involved training each compressed BERT model with Empirical Risk Minimization (ERM) using five different weight initializations. Model effectiveness was measured with metrics such as average accuracy, worst-group accuracy (WGA), and overall model size. Each dataset required its own fine-tuning setup, with dataset-specific numbers of epochs, batch sizes, and learning rates. For methods involving vocabulary transfer, an initial masked-language-modeling phase was carried out before fine-tuning, ensuring the models were adequately prepared for the impact of compression.
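The authors' training code is not reproduced here; as a minimal sketch of how the two accuracy metrics can be computed and then averaged over the five ERM seeds, the following assumes per-example predictions, labels, and integer group ids (a group being a label-attribute combination) are already available.

```python
# Minimal sketch (not the authors' code): average accuracy and worst-group
# accuracy (WGA) from per-example predictions, labels, and group ids,
# aggregated over several independently initialized ERM runs.
import numpy as np

def average_and_worst_group_accuracy(preds, labels, groups):
    """Return (average accuracy, WGA) for a single trained model."""
    preds, labels, groups = map(np.asarray, (preds, labels, groups))
    correct = preds == labels
    avg_acc = correct.mean()
    # WGA is the accuracy on the weakest subgroup, not the overall mean.
    wga = min(correct[groups == g].mean() for g in np.unique(groups))
    return avg_acc, wga

def aggregate_over_seeds(per_seed_results):
    """Average both metrics over the seeds (five in the study)."""
    avg_accs, wgas = zip(*per_seed_results)
    return float(np.mean(avg_accs)), float(np.mean(wgas))

# Toy usage with made-up numbers:
seed_results = [
    average_and_worst_group_accuracy(
        preds=[0, 1, 1, 2, 0, 2],
        labels=[0, 1, 2, 2, 0, 1],
        groups=[0, 0, 1, 1, 2, 2],
    )
    for _ in range(5)
]
print(aggregate_over_seeds(seed_results))
```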
The findings reveal significant variation in performance across compression techniques. On the MultiNLI dataset, for example, models such as TinyBERT6 outperformed the BERTBase baseline, reaching an average accuracy of 85.26% with a notable worst-group accuracy (WGA) of 72.74%. In contrast, on the SCOTUS dataset a marked drop in performance was observed, with the WGA of some models collapsing to 0%, indicating a critical threshold beyond which the models can no longer handle subgroup robustness effectively.
In conclusion, this research sheds light on the nuanced impact of compression techniques on the robustness of BERT models toward minority subgroups across multiple datasets. The analysis showed that compression can improve a language model's performance on minority subgroups, but this effect varies with the dataset and with the weight initialization used after compression. The study's limitations include its focus on English datasets and the absence of combinations of compression methods.
Check out the paper. All credit for this research goes to the researchers of this project.
Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in materials science, he is exploring new advances and creating opportunities to contribute.