We study the problem of singing voice cancellation in stereo music, a sub-task of musical source separation whose objective is to estimate the instrumental background from a stereo mix. We explore how to match the performance of large, state-of-the-art source separation networks with a small, efficient model for real-time singing voice separation. Such a model is useful when memory and computation are limited and the singing voice must be processed with low latency. In practice, this is achieved by adapting an existing mono model to handle stereo input. Further quality improvements are obtained by tuning model parameters and expanding the training set. We also highlight the benefits a stereo model brings by introducing a new metric that detects attenuation inconsistencies between channels. Our approach is evaluated with objective offline metrics and a large-scale MUSHRA test, confirming the effectiveness of our techniques in rigorous listening tests.
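The abstract only names the inter-channel metric without defining it. As a purely illustrative aid, the sketch below shows one way an attenuation-inconsistency measure could be computed: the function name, the frame and hop sizes, and the use of a stereo vocal reference against the residual vocal in the estimate are all assumptions for this example, not the paper's definition.

```python
import numpy as np

def channel_attenuation_inconsistency(ref, est, frame=4096, hop=2048, eps=1e-8):
    """Hypothetical metric: mean frame-wise gap (in dB) between the left and
    right channels' attenuation of the singing voice.

    ref: (2, T) stereo vocal reference.
    est: (2, T) residual vocal in the separated output (estimate minus the
         true accompaniment).
    """
    diffs = []
    for start in range(0, ref.shape[1] - frame + 1, hop):
        r = ref[:, start:start + frame]
        e = est[:, start:start + frame]
        # Per-channel attenuation of the voice in this frame, in dB:
        # energy of the residual voice relative to the reference voice.
        att = 10.0 * np.log10((np.sum(e ** 2, axis=1) + eps) /
                              (np.sum(r ** 2, axis=1) + eps))
        # Inconsistency = absolute left/right attenuation difference.
        diffs.append(abs(att[0] - att[1]))
    return float(np.mean(diffs))
```

Framing the comparison, rather than using one global energy ratio, would make such a metric sensitive to transient left/right imbalances that per-channel mono processing can introduce and that a whole-track average would smooth away.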