OuteAI has recently introduced the latest additions to its Lite series of models, Lite-Oute-1-300M and Lite-Oute-1-65M. These new models are designed to improve performance while maintaining efficiency, making them suitable for deployment on a variety of devices.
Lite-Oute-1-300M: Improved performance
The Lite-Oute-1-300M model, based on the Mistral architecture, comprises approximately 300 million parameters. This model aims to improve upon the previous 150 million parameter version by increasing its size and training it on a more refined dataset. The main goal of the Lite-Oute-1-300M model is to deliver improved performance while maintaining efficiency for deployment on different devices.
With its larger size, the Lite-Oute-1-300M model provides better context retention and consistency. However, users should note that, as a compact model, it still has limitations compared to larger language models. The model was trained on 30 billion tokens with a context length of 4096, giving it robust language processing capabilities.
The Lite-Oute-1-300M model is available in several versions on HuggingFace.
Benchmark performance
The Lite-Oute-1-300M model has been evaluated on several benchmark tasks, demonstrating its capabilities; a sketch of how such an evaluation could be reproduced follows the list:
- ARC Challenge: 26.37 (5 shots), 26.02 (0 shots)
- ARC Easy: 51.43 (5 shots), 49.79 (0 shots)
- CommonsenseQA: 20.72 (5 shots), 20.31 (0 shots)
- HellaSWAG: 34.93 (5 shots), 34.50 (0 shots)
- MMLU: 25.87 (5 shots), 24.00 (0 shots)
- OpenBookQA: 31.40 (5 shots), 32.20 (0 shots)
- PIQA: 65.07 (5 shots), 65.40 (0 shots)
- Winogrande: 52.01 (5 shots), 53.75 (0 shots)
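These scores correspond to standard few-shot evaluation suites. As a rough, hedged sketch of how such numbers could be reproduced, the snippet below uses EleutherAI's lm-evaluation-harness (the lm_eval package); the task names, harness version, and the HuggingFace repository id OuteAI/Lite-Oute-1-300M are assumptions rather than details confirmed by OuteAI.

```python
# Hypothetical sketch: scoring the model with EleutherAI's lm-evaluation-harness.
# Install with: pip install lm-eval
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # HuggingFace causal-LM backend
    model_args="pretrained=OuteAI/Lite-Oute-1-300M",   # assumed repository id
    tasks=["arc_challenge", "arc_easy", "hellaswag", "piqa", "winogrande"],
    num_fewshot=5,    # set to 0 for the zero-shot columns above
    batch_size=8,
)

# Print the per-task metrics reported by the harness
for task, metrics in results["results"].items():
    print(task, metrics)
```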
Use with HuggingFace Transformers
The Lite-Oute-1-300M model can be used with the HuggingFace Transformers library, so it can be dropped into existing Python projects with a few lines of code. Generation parameters such as temperature and repetition penalty can be used to tune the output; a minimal example is sketched below.
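The following is a minimal sketch, assuming the instruct variant is published on the HuggingFace Hub under an id such as OuteAI/Lite-Oute-1-300M-Instruct (the exact repository name may differ):

```python
# Minimal sketch of loading and prompting the model with HuggingFace Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OuteAI/Lite-Oute-1-300M-Instruct"  # assumed repository id; verify on the Hub
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

prompt = "Explain in one sentence what a compact language model is."
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Sampling parameters mentioned above: temperature and repetition penalty
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.4,
    repetition_penalty=1.12,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```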
Lite-Oute-1-65M: Exploring ultra-compact models
In addition to the 300M model, OuteAI has also released the Lite-Oute-1-65M model. This ultra-compact experimental model is based on the LLaMA architecture and comprises approximately 65 million parameters. The main goal of this model was to explore the lower bounds of model size while maintaining basic language understanding capabilities.
Due to its extremely small size, the Lite-Oute-1-65M demonstrates basic text generation capabilities but may struggle with following instructions or maintaining topical coherence. Users should be aware of its significant limitations compared to larger models and expect inconsistent or potentially inaccurate responses.
The Lite-Oute-1-65M model is likewise available in multiple versions on HuggingFace.
Training and hardware
The Lite-Oute-1-300M and Lite-Oute-1-65M models were trained on NVIDIA RTX 4090 hardware. The 300M model was trained on 30 billion tokens with a context length of 4096, while the 65M model was trained on 8 billion tokens with a context length of 2048.
Conclusion
OuteAI’s launch of the Lite-Oute-1-300M and Lite-Oute-1-65M models aims to improve performance, by increasing model size and refining the training dataset, while maintaining the efficiency needed for deployment on a range of devices. These models balance size and capability, making them suitable for a variety of applications.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary engineer and entrepreneur, Asif is committed to harnessing the potential of AI for social good. His most recent initiative is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has over 2 million monthly views, illustrating its popularity among readers.