Always-on machine learning models require very low memory and compute consumption. Their restricted parameter count limits both their capacity to learn and the effectiveness of typical training algorithms at finding the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger, redundant multi-branch architecture. Then, for inference, we algebraically re-parameterize the trained model into a single-branch form with fewer parameters, for lower memory consumption and compute cost. Using this technique, we show that our always-on wake word detector model, RepCNN, offers a good trade-off between latency and accuracy during inference. The re-parameterized RepCNN is 43% more accurate than a single-branch convolutional model while having the same runtime. RepCNN also matches the accuracy of complex architectures like BC-ResNet, while having 2x lower peak memory usage and 10x faster runtime.
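To illustrate the kind of algebraic re-parameterization described above, the following is a minimal sketch (PyTorch, with illustrative 2D shapes and kernel sizes that are assumptions, not the paper's actual architecture): two parallel convolution branches used during training are collapsed into a single equivalent convolution for inference, which is possible because convolution is linear in its weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, K = 8, 5  # channel count and larger kernel size (illustrative values)

# Training-time branches: a KxK conv and a 1x1 conv applied to the same input.
branch_kxk = nn.Conv2d(C, C, K, padding=K // 2, bias=True)
branch_1x1 = nn.Conv2d(C, C, 1, padding=0, bias=True)

# Inference-time merge: zero-pad the 1x1 kernel up to KxK, then sum kernels and biases.
merged = nn.Conv2d(C, C, K, padding=K // 2, bias=True)
pad = K // 2
merged.weight.data = branch_kxk.weight.data + F.pad(
    branch_1x1.weight.data, [pad, pad, pad, pad]
)
merged.bias.data = branch_kxk.bias.data + branch_1x1.bias.data

# The single merged conv reproduces the sum of the two branches' outputs.
x = torch.randn(1, C, 32, 32)
assert torch.allclose(branch_kxk(x) + branch_1x1(x), merged(x), atol=1e-5)
```

The same identity underlies the multi-branch-to-single-branch conversion: the redundant branches help optimization during training, while the merged single branch keeps inference memory and compute low.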