Can small language models deliver high performance? Meet StableLM: an open source language model that generates text and code and performs well when properly trained

Stability AI is an artificial intelligence startup best known for its Stable Diffusion image generation model. It has now introduced a free and open source language model called StableLM. The Alpha release is offered in two parameter sizes, three billion and seven billion, with models of fifteen billion to sixty-five billion parameters to follow. Under the terms of the CC BY-SA-4.0 license, developers can inspect, use, and adapt the base StableLM models for both personal and commercial projects.

Stability AI released the Stable Diffusion image model to the public in 2022, offering a more open, scalable, and transparent alternative to proprietary AI. With the StableLM suite of models, the company furthers its mission to democratize core AI capabilities. StableLM models can generate text and code and are intended to power a range of downstream applications. They show how small, efficient models can perform well with appropriate training.

The team’s earlier open source work with EleutherAI, a non-profit research hub, laid the groundwork for the release of StableLM. That work included several popular language models trained on the open source Pile dataset, among them GPT-J, GPT-NeoX, and the Pythia suite. Cerebras-GPT and Dolly-2 are just two examples of the many recent open source language models that build on these earlier efforts.


The experimental dataset used to train StableLM builds on The Pile but is three times as large, at 1.5 trillion tokens. Despite having only 3 to 7 billion parameters (GPT-3 has 175 billion), StableLM achieves surprisingly strong performance in conversational and coding tasks, which the team attributes to the richness of this dataset. Details about the dataset will be made public at a later date.
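To put these figures side by side, here is a minimal Python sketch that works through the arithmetic. It uses only the numbers quoted above; the model labels and function names are hypothetical, and the size of The Pile is merely what the "three times as large" ratio implies, not an official count.

```python
# Figures quoted in the article; nothing here is measured.
TRAINING_TOKENS = 1_500_000_000_000   # 1.5 trillion tokens
PILE_RATIO = 3                        # "three times as large" as The Pile
GPT3_PARAMS = 175_000_000_000         # 175 billion parameters

# Hypothetical display labels for the two Alpha sizes.
STABLELM_PARAMS = {
    "StableLM 3B": 3_000_000_000,
    "StableLM 7B": 7_000_000_000,
}


def implied_pile_tokens(training_tokens: int, ratio: int) -> float:
    """Size of The Pile implied by the article's ratio."""
    return training_tokens / ratio


def tokens_per_parameter(tokens: int, params: int) -> float:
    """Training tokens seen per model parameter."""
    return tokens / params


def size_ratio(larger: int, smaller: int) -> float:
    """How many times larger one parameter count is than another."""
    return larger / smaller


if __name__ == "__main__":
    pile = implied_pile_tokens(TRAINING_TOKENS, PILE_RATIO)
    print(f"Implied size of The Pile: {pile:.2e} tokens")
    for name, params in STABLELM_PARAMS.items():
        tpp = tokens_per_parameter(TRAINING_TOKENS, params)
        ratio = size_ratio(GPT3_PARAMS, params)
        print(f"{name}: {tpp:.0f} tokens per parameter, "
              f"GPT-3 is {ratio:.1f}x larger")
```

The tokens-per-parameter figure is only a rough indicator of how heavily a small model leans on its data; it says nothing about data quality, which the article also credits.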

The team has also released a set of research models that are instruction fine-tuned. These fine-tuned models initially use a combination of five recently released open source datasets for conversational agents: Alpaca, GPT4All, Dolly, ShareGPT, and HH. In line with Stanford’s Alpaca license, the fine-tuned models are released under a non-commercial CC BY-NC-SA 4.0 license and are intended for academic research.

StableLM reflects the team’s vision of developing open, accessible, and supportive AI technology, built on the following principles:

Transparency: Researchers can “look under the hood” to verify performance, work on interpretability techniques, identify potential risks, and help develop safeguards. Companies and government agencies can adapt (or “tune”) these open source models to their needs without sharing sensitive data or giving up control of their AI capabilities.
Accessibility: The team builds its models so that everyday users can run them on their own devices. Instead of relying on the proprietary services of a few companies, developers can use these models to build applications that run on a broader range of widely available hardware. The economic benefits of AI are thus shared among a large group of users and creators. The models are open and fine-grained, allowing researchers and academics to develop interpretability and safety techniques beyond what is possible with closed models.
Support: These models are built to support users, not replace them. Rather than pursuing superhuman intelligence, the team focuses on improving the AI’s ability to carry out specific tasks in real-world settings. It builds tools that help ordinary people and businesses use AI to foster innovation, increase productivity, and broaden economic opportunities.
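The accessibility point above is easier to appreciate with a back-of-the-envelope memory estimate. The sketch below is an illustration under stated assumptions, not a figure from Stability AI: it counts model weights only, assumes a fixed number of bytes per parameter for each numeric format, and ignores activations and other runtime overhead. The function name is hypothetical.

```python
# Assumed storage cost per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, dtype: str = "fp16") -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Parameter counts quoted in the article.
    models = {
        "3B model": 3_000_000_000,
        "7B model": 7_000_000_000,
        "GPT-3 (175B)": 175_000_000_000,
    }
    for name, params in models.items():
        print(f"{name}: about {weight_memory_gb(params):.0f} GB of weights in fp16")
```

Under these assumptions, the 3 and 7 billion parameter models need on the order of single-digit to low double-digit gigabytes for their weights, while a 175 billion parameter model needs hundreds, which is why the smaller sizes are plausible candidates for commodity hardware.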

The team notes that, as with any pre-trained large language model that has not undergone fine-tuning and reinforcement learning, the quality of responses can vary, and responses may contain offensive language or opinions. Scale, better data, community feedback, and optimization are all expected to bring considerable improvement.

For more details, check out the project’s GitHub repository and the Stability AI blog.


Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.
