Self-training (ST), or pseudo-labeling, has recently attracted significant interest in the Automatic Speech Recognition (ASR) community due to its success in leveraging unlabeled data. Unlike earlier semi-supervised approaches that iteratively regenerated pseudo-labels (PLs) from a trained model and used them to train a new model, recent state-of-the-art methods perform “continuous training,” where PLs are generated with a very recent version of the same model being trained. Nevertheless, these approaches still begin ST with an initial supervised phase in which the model is trained on labeled data only. We believe this initial phase can cause the model to overfit the labeled dataset in low-resource settings, and that performing ST from the start of training should reduce such overfitting. In this article we show how to do so by dynamically controlling the evolution of PLs during training. To the best of our knowledge, this is the first study to demonstrate the feasibility of generating PLs from the very start of training. We achieve this with two techniques that avoid the instabilities that otherwise lead to degenerate models which fail to generalize. First, we control the evolution of PLs through a curriculum that uses the online changes in PLs to govern membership of the PL cache, improving generalization. Second, we find that sampling transcripts from the predictive distribution, rather than using only the single best transcript, further stabilizes training. With these techniques, our ST models match prior work without requiring an external language model.
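For illustration only, the sketch below shows one simple way the two stabilization techniques could be realized; it is not our actual implementation. The names `PLCache`, `offer`, `max_change`, and `sample_transcript` are hypothetical, and the membership rule (a normalized edit distance between consecutive PLs for the same utterance) is a simplified stand-in for the curriculum described above.

```python
import math
import random


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(b) + 1))
    for i, ta in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, tb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ta != tb))
    return dp[-1]


class PLCache:
    """Pseudo-label cache whose membership is gated by how much an
    utterance's PL is still changing between model updates.
    `max_change` is an illustrative knob, not the exact criterion."""

    def __init__(self, max_change=0.3):
        self.max_change = max_change
        self._last_pl = {}    # utt_id -> most recent PL for that utterance
        self.members = set()  # utterances currently eligible for ST losses

    def offer(self, utt_id, new_pl):
        """Record the latest PL and decide cache membership from PL drift."""
        old_pl = self._last_pl.get(utt_id)
        self._last_pl[utt_id] = new_pl
        if old_pl is None:
            return False  # first PL seen: track it, wait for evidence of stability
        # normalized online change between consecutive PLs for this utterance
        change = edit_distance(old_pl, new_pl) / max(len(old_pl), 1)
        if change <= self.max_change:
            self.members.add(utt_id)      # PL has stabilized: admit to the cache
        else:
            self.members.discard(utt_id)  # PL still drifting: hold it back
        return utt_id in self.members


def sample_transcript(step_log_probs, temperature=1.0):
    """Draw a transcript by sampling each decoding step from the model's
    predictive distribution, instead of greedily taking the best token."""
    tokens = []
    for log_probs in step_log_probs:  # one list of token log-probs per step
        weights = [math.exp(lp / temperature) for lp in log_probs]
        tokens.append(random.choices(range(len(weights)), weights=weights)[0])
    return tokens
```

In this simplified view, a training loop would call `offer` with each freshly generated PL and draw ST targets only from utterances in `members`, while `sample_transcript` replaces greedy (argmax) decoding when producing those PLs.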