This paper has been accepted for the Foundation Models in the Wild workshop at ICML 2024.
Large language models are versatile tools but are not suited to small inference budgets. Small models offer more efficient inference, but their lower capacity means they perform well only when their scope is limited to a specialized domain. This paper explores how to obtain a small language model with good specialized accuracy, even when the specialization data is unknown at pretraining time. We propose a novel architecture, projected networks (PNs): high-capacity networks whose parameters can be linearly projected onto a small network for fine-tuning. We evaluate the empirical effectiveness of our approach against small model training, distillation, and hard mixtures of experts.
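As a rough illustration of the core idea stated above, the sketch below shows a single layer whose high-capacity parameters are mapped by fixed linear projections onto a smaller layer that can then be fine-tuned; the specific projection form (a left/right matrix pair per weight) and all dimensions are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn


class ProjectedLinear(nn.Module):
    """Large weight matrix plus fixed linear projections onto a small layer (sketch)."""

    def __init__(self, d_large: int, d_small: int):
        super().__init__()
        # High-capacity parameters, trained during generic pretraining.
        self.w_large = nn.Parameter(torch.randn(d_large, d_large) / d_large**0.5)
        # Fixed (non-trainable) projections defining the small model's shape.
        self.register_buffer("p_out", torch.randn(d_small, d_large) / d_large**0.5)
        self.register_buffer("p_in", torch.randn(d_small, d_large) / d_large**0.5)

    def project(self) -> nn.Linear:
        """Materialize the small layer: W_small = P_out @ W_large @ P_in^T."""
        small = nn.Linear(self.p_in.shape[0], self.p_out.shape[0], bias=False)
        with torch.no_grad():
            small.weight.copy_(self.p_out @ self.w_large @ self.p_in.T)
        return small  # this small layer is what gets fine-tuned and served


# Usage: pretrain with the large parameters, then project and fine-tune small.
layer = ProjectedLinear(d_large=1024, d_small=256)
small_layer = layer.project()
out = small_layer(torch.randn(8, 256))  # small-budget inference path
```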