The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens.
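To make the layer-wise scaling idea concrete, the sketch below shows one simple way to allocate parameters non-uniformly across transformer layers by interpolating the number of attention heads and the FFN expansion ratio from the first layer to the last. This is a minimal illustration of the general technique; the function name, parameter names, and default values are assumptions for this example and do not reflect OpenELM's actual configuration.

```python
import math

def layer_wise_scaling(num_layers, d_model,
                       min_heads=4, max_heads=16,
                       min_ffn_mult=0.5, max_ffn_mult=4.0,
                       head_dim=64):
    """Illustrative layer-wise scaling: rather than giving every layer the same
    width, interpolate attention-head counts and FFN expansion ratios across
    depth, so the parameter budget is distributed non-uniformly."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = int(round(min_heads + t * (max_heads - min_heads)))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        # Round the FFN width up to a hardware-friendly multiple of 256.
        ffn_dim = int(math.ceil(ffn_mult * d_model / 256) * 256)
        configs.append({
            "layer": i,
            "num_heads": heads,
            "attn_dim": heads * head_dim,
            "ffn_dim": ffn_dim,
        })
    return configs

if __name__ == "__main__":
    # Print an example per-layer allocation for a small 8-layer model.
    for cfg in layer_wise_scaling(num_layers=8, d_model=1024):
        print(cfg)
```

Early layers end up narrower and later layers wider, which is how a fixed parameter budget can be redistributed across depth instead of being split evenly.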
Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluating the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to the MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
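As an aside, once a checkpoint has been converted to MLX format it can be run locally on Apple silicon. The snippet below is a minimal inference sketch using the community `mlx-lm` package; the model path is illustrative, and this is not the release's own conversion or evaluation code.

```python
# Requires an Apple-silicon Mac and: pip install mlx-lm
from mlx_lm import load, generate

# Illustrative path; substitute a converted OpenELM checkpoint.
model, tokenizer = load("path/to/openelm-mlx-checkpoint")

prompt = "Open language models enable"
text = generate(model, tokenizer, prompt=prompt, max_tokens=64)
print(text)
```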