This paper was accepted to the Efficient Natural Language and Speech Processing (ENLSP) workshop at NeurIPS 2024.
While large language models (LLMs) dominate the AI landscape, small-scale large language models (SLMs) are gaining attention due to consumer demand for cost efficiency. However, there is limited research on the training behavior and computational requirements of SLMs. In this study, we explore the computational bottlenecks of training SLMs (up to 2B parameters) by examining the effects of various hyperparameters and configurations, including GPU type, batch size, model size, communication protocol, attention type, and number of GPUs. We evaluate these factors on popular cloud services using metrics such as loss per dollar and tokens per second. Our findings aim to support the broader adoption and optimization of language model training for low-resource AI research institutes.
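As a rough illustration of the cost-normalized metrics mentioned above, the sketch below shows one plausible way to compute tokens per second and loss per dollar from logged training statistics. It is not the paper's implementation: the `RunStats` fields, the $2/GPU-hour rate, and the specific loss-per-dollar ratio are all assumptions made for this example.

```python
"""Illustrative sketch (not from the paper): computing throughput and
cost-normalized metrics from a training run. All names and values are
hypothetical and chosen only for demonstration."""

from dataclasses import dataclass


@dataclass
class RunStats:
    tokens_processed: int       # total tokens seen in the measured window
    wall_clock_seconds: float   # elapsed training time for the window
    final_loss: float           # training loss at the end of the window
    gpu_count: int              # number of GPUs used
    gpu_hourly_rate_usd: float  # assumed cloud price per GPU-hour


def tokens_per_second(stats: RunStats) -> float:
    """Raw training throughput over the measured window."""
    return stats.tokens_processed / stats.wall_clock_seconds


def run_cost_usd(stats: RunStats) -> float:
    """Total cloud cost of the window: GPU-hours times the hourly rate."""
    gpu_hours = stats.gpu_count * stats.wall_clock_seconds / 3600.0
    return gpu_hours * stats.gpu_hourly_rate_usd


def loss_per_dollar(stats: RunStats) -> float:
    """One possible definition: loss reached per dollar spent.
    Under this convention, smaller values indicate a run that reaches
    a given loss more cheaply."""
    return stats.final_loss / run_cost_usd(stats)


if __name__ == "__main__":
    # Hypothetical example: 8 GPUs for one hour at an assumed $2/GPU-hour.
    example = RunStats(
        tokens_processed=1_500_000_000,
        wall_clock_seconds=3600.0,
        final_loss=2.8,
        gpu_count=8,
        gpu_hourly_rate_usd=2.0,
    )
    print(f"tokens/sec:      {tokens_per_second(example):,.0f}")
    print(f"loss per dollar: {loss_per_dollar(example):.4f}")
```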