This article was accepted at the Efficient Natural Language and Speech Processing Workshop (ENLSP-III) at NeurIPS.
Large pre-trained models are difficult to deploy in resource-constrained applications. Fortunately, task-aware structured pruning methods offer a solution: they reduce model size by removing structural units such as layers and attention heads in a way that takes the final task into account. However, these pruning algorithms require more task-specific data than is typically available. We propose a framework that combines structured pruning with transfer learning to reduce the need for task-specific data. Our empirical results answer questions such as: How should the two tasks be coupled? Which parameters should be transferred? And when during training should transfer learning be introduced? Leveraging these insights, we show that our framework produces pruned models with improved generalization over strong baselines.
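The abstract does not specify implementation details, so the following is only a minimal, hypothetical sketch of the general idea of combining structured pruning with transfer learning: a small model is trained on a data-rich source task, hidden units are pruned structurally using a magnitude-based importance score, and the pruned weights are then fine-tuned on a low-resource target task. The MLP architecture, the L2-norm importance score, the prune-then-transfer ordering, and the assumption that both tasks share the same label space are all illustrative choices, not the paper's method.

```python
# Hypothetical sketch (not the authors' implementation): prune on a source
# task, then transfer the pruned weights to a small-data target task.
import torch
import torch.nn as nn


def make_mlp(in_dim: int, hidden: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor, steps: int = 200) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()


def prune_hidden_units(model: nn.Sequential, keep: int) -> nn.Sequential:
    """Structured pruning: keep the `keep` hidden units whose incoming weight
    rows have the largest L2 norm after source-task training (an assumed,
    simple importance score)."""
    fc1, fc2 = model[0], model[2]
    importance = fc1.weight.norm(dim=1)                 # one score per hidden unit
    kept = importance.topk(keep).indices.sort().values  # indices of surviving units
    pruned = make_mlp(fc1.in_features, keep, fc2.out_features)
    with torch.no_grad():
        pruned[0].weight.copy_(fc1.weight[kept])
        pruned[0].bias.copy_(fc1.bias[kept])
        # Transferring the output layer assumes source and target tasks share
        # a label space -- an illustrative simplification.
        pruned[2].weight.copy_(fc2.weight[:, kept])
        pruned[2].bias.copy_(fc2.bias)
    return pruned


# Synthetic data: a data-rich source task and a small-sample target task.
xs, ys = torch.randn(2000, 16), torch.randint(0, 4, (2000,))
xt, yt = torch.randn(64, 16), torch.randint(0, 4, (64,))

source_model = make_mlp(16, 64, 4)
train(source_model, xs, ys)                               # 1) learn on the source task
small_model = prune_hidden_units(source_model, keep=16)   # 2) prune structurally
train(small_model, xt, yt, steps=100)                     # 3) transfer and fine-tune on target
print(small_model)
```

In this toy instantiation, the "coupling" of the two tasks is purely sequential (source training, then pruning, then target fine-tuning); the paper's questions of which parameters to transfer and when to introduce transfer during training admit other answers that this sketch does not explore.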