After the great success of MPT-7B, MosaicML has once again surpassed the benchmark it previously set: with its latest release, the company has introduced MPT-30B. MPT-30B is a highly accurate and powerful pre-trained transformer, and MosaicML claims it is even better than the original GPT-3.
Before MPT-30B was released, MPT-7B had already taken the AI world by storm. The MPT-7B Base, Instruct, Chat, and StoryWriter variants were big hits, and the company claims these models were downloaded more than 3 million times worldwide. The community’s enthusiasm for those earlier releases was one of the biggest reasons to push an even better engine, which MosaicML has now done with MPT-30B.
It was remarkable how the community adapted these MPT engines to build something better tuned and catered to specific use cases. One of the interesting examples is LLaVA-MPT, which adds vision understanding to the pre-trained MPT-7B.
Similarly, GGML optimizes MPT engines to run better on Apple Silicon and CPUs, and GPT4All lets you run a GPT-4-style chat interface with MPT as the base engine.
Looking closely, one of the main reasons MosaicML has an edge and can offer stiff competition and a better alternative to larger companies is its list of competitive features, the adaptability of its models to different use cases, and comparatively easy integration.
In this release, MosaicML also claims that MPT-30B outperforms the existing GPT-3 with only a fraction of the parameters GPT-3 uses (30 billion versus 175 billion), making it an extremely lightweight model compared to existing generative solutions.
It’s better than MosaicML’s existing MPT-7B, and MPT-30B is licensed for commercial use.
Not only that, but MPT-30B comes with two fine-tuned variants, MPT-30B-Instruct and MPT-30B-Chat, which can follow a single instruction and hold a multi-turn conversation over a longer duration.
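As a rough sketch of how one of these variants can be pulled down and queried through the Hugging Face transformers library (the repository id, prompt, and generation settings below are illustrative assumptions, not MosaicML’s official recipe):

```python
# Illustrative sketch: loading the instruct variant via Hugging Face transformers.
# The repo id and prompt wording are assumptions; check MosaicML's model card for specifics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mosaicml/mpt-30b-instruct"  # assumed Hugging Face repository id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # 16-bit weights, roughly 60 GB for a 30B-parameter model
    trust_remote_code=True,      # MPT ships custom modeling code in its repository
    device_map="auto",           # place layers on the available GPU(s)
)

prompt = "Summarize the advantages of ALiBi positional encoding in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```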
The improvements do not stop there. MosaicML designed MPT-30B from the bottom up to be a better, more robust model, ensuring every moving part works more efficiently. MPT-30B was trained with an 8k-token context window and supports even longer contexts via ALiBi.
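Because ALiBi biases attention rather than relying on learned position embeddings, the usable context can be stretched beyond the 8k tokens seen in training. A minimal sketch of that knob, assuming the `max_seq_len` config attribute documented in MosaicML’s MPT model cards:

```python
# Sketch: raising the context window of an ALiBi-based MPT model at load time.
# Attribute names follow MosaicML's published model cards and may change between versions.
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-30b"  # assumed Hugging Face repository id

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.max_seq_len = 16384  # extend beyond the 8k tokens used in training; ALiBi extrapolates

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,
)
```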
Training and inference performance have also been improved with the help of FlashAttention. MPT-30B is equipped with stronger coding capabilities as well, which is attributed to the diversity of the data it was trained on. The model was scaled up to the 8k context window on Nvidia’s H100 GPUs, and the company claims that, to the best of its knowledge, this is the first H100-trained LLM available to its customers.
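For the FlashAttention path, the MPT model cards describe an attention-implementation switch in the model config; the sketch below assumes that `attn_config["attn_impl"]` knob and a GPU environment with the triton kernels installed:

```python
# Sketch: selecting the FlashAttention-style triton kernel when loading an MPT model.
# The attn_config key is taken from MosaicML's model cards; treat it as an assumption.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

name = "mosaicml/mpt-30b"  # assumed Hugging Face repository id

config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config["attn_impl"] = "triton"  # use the FlashAttention-style triton kernel
config.init_device = "cuda:0"               # initialize weights directly on the GPU

model = AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```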
MosaicML has also kept the model lightweight, which helps start-up organizations keep operating costs low.
The size of MPT-30B was also specifically chosen to make deployment on a single GPU easy: the model can be served on one A100-80GB at 16-bit precision or one A100-40GB at 8-bit precision. Other comparable LLMs, such as Falcon-40B, have a higher parameter count and cannot (currently) be served on a single data-center GPU; they require two or more GPUs, which raises the minimum cost of an inference system.
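To make the single-GPU claim concrete: at 16-bit precision the weights alone take roughly 60 GB (30B parameters × 2 bytes), which is why an 80GB A100 suffices, while 8-bit quantization halves that to about 30 GB for a 40GB card. A hedged sketch of the 8-bit load, assuming the bitsandbytes integration in transformers:

```python
# Sketch: fitting MPT-30B on a single ~40GB GPU via 8-bit quantization (bitsandbytes).
# The repo id is an assumption; exact memory use also depends on sequence length and KV cache.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-30b",
    load_in_8bit=True,    # int8 weights: ~1 byte per parameter, ~30 GB for 30B parameters
    device_map="auto",
    trust_remote_code=True,
)
```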
Check out the reference article and the Hugging Face repository for more details.
Anant is a Computer Science Engineer currently working as a Data Scientist, with a background in Finance and AI-as-a-Service products. He is interested in building AI-powered solutions that create better data points and solve everyday problems in powerful and efficient ways.