*Equal contribution
Federated learning (FL) is a technique for training models using data distributed across devices. Differential privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP noise added to the model increases as the model size grows, often preventing convergence. We propose Partial Embedding Updates (PEU), a novel technique to decrease noise by decreasing the payload size. Furthermore, we adopt Low-Rank Adaptation (LoRA) and Noise Contrastive Estimation (NCE) to reduce the memory demands of large models on compute-constrained devices. This combination of techniques makes it possible to train large-vocabulary language models while preserving accuracy and privacy.
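To make the noise-scaling claim concrete, here is a minimal sketch, assuming the standard Gaussian mechanism used in DP federated averaging (the notation is ours, not drawn from the paper body):

```latex
% Each client update g_i is clipped to norm C; the server aggregates and
% adds isotropic Gaussian noise with per-coordinate standard deviation
% \sigma C over the d-dimensional payload:
\[
\tilde{g} \;=\; \sum_{i=1}^{n} \operatorname{clip}(g_i, C)
          \;+\; \mathcal{N}\!\bigl(0,\; \sigma^2 C^2 I_d\bigr).
\]
% The expected noise norm scales as \sigma C \sqrt{d}, while the signal is
% bounded by the fixed clipping norm C -- so the signal-to-noise ratio
% degrades as the payload dimension d grows. Shrinking the payload (as PEU
% does) or replacing a full weight update with a low-rank one,
\[
W \;=\; W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
\]
% (as LoRA does) therefore reduces the noise that must be injected for the
% same privacy guarantee.
```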