
Image by author
Large language models (LLMs) are everywhere right now. However, if your organization doesn’t have the right resources, it can be challenging to ride the LLM wave. Training and deploying large language models is difficult, and it’s easy to feel left behind. Open source LLMs, such as Meta’s LLaMA series, have made LLM resources more widely available.
And the newest addition to the open source collection is MosaicML Foundations’ latest release in their series: MPT-7B.
MPT stands for MosaicML Pretrained Transformer. The MPT models are GPT-style decoder-only transformers that come with many enhancements:
- Performance-optimized layer implementations
- Increased training stability due to architecture changes
- No context length limits (ALiBi is used in place of positional embeddings)
MPT-7B is a transformer model trained from scratch on 1T tokens of text and code. Yes, 1 TRILLION! It was trained on the MosaicML platform over 9.5 days with zero human intervention, at a cost of roughly $200k.
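If you want to try the base checkpoint for yourself, here is a minimal sketch of loading it for inference with Hugging Face transformers. It assumes the mosaicml/mpt-7b weights on the Hugging Face Hub and a GPU with enough memory for a 7B model in half precision; adjust the dtype and device for your setup.

```python
import torch
import transformers

model_name = "mosaicml/mpt-7b"

# MPT ships custom modeling code with the checkpoint, hence trust_remote_code=True
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# MPT-7B was trained with the EleutherAI GPT-NeoX-20B tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

prompt = "MosaicML's MPT-7B is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```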
It is open source and licensed for commercial use, which could be a game changer for how companies and organizations approach their predictive analytics and decision-making processes.
The main features of MPT-7B are:
- Licensed for commercial use
- Trained on a large amount of data (1T tokens)
- Can handle extremely long inputs
- Optimized for fast training and inference
- Highly efficient open source training code
MPT-7B is the base model and has been shown to outperform other open source 7B to 20B models. Its quality matches that of LLaMA-7B. To assess the quality of MPT-7B, MosaicML Foundations compiled 11 open source benchmarks and evaluated the model on them in the industry-standard manner.
Image by MosaicML Foundation
MosaicML Foundations is also releasing three additional enhanced models:
- MPT-7B-Instruct
- MPT-7B-Chat
- MPT-7B-StoryWriter-65k+
MPT-7B-Instruct
The MPT-7B-Instruct model is for short-form instruction following. With 26,834 downloads as of May 14, MPT-7B-Instruct lets you ask quick, short questions and gives you an instant response. Have a question and just want a simple answer? Use MPT-7B-Instruct.
Why is this so cool? Typically, LLMs are taught to continue generating text from the input they are given. However, some users want LLMs that treat their input as an instruction to follow. Instruction fine-tuning is what allows LLMs to produce instruction-following outputs.
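As a rough sketch of what that looks like in practice, the snippet below prompts MPT-7B-Instruct through a Hugging Face text-generation pipeline with an Alpaca/Dolly-style instruction template. The exact template and the GPT-NeoX-20B tokenizer pairing are assumptions drawn from the model’s published usage notes, so double-check them against the model card.

```python
import torch
import transformers

# Load MPT-7B-Instruct as a text-generation pipeline (custom MPT code requires trust_remote_code)
generator = transformers.pipeline(
    "text-generation",
    model="mosaicml/mpt-7b-instruct",
    tokenizer="EleutherAI/gpt-neox-20b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device=0,
)

# Alpaca/Dolly-style instruction template (assumed fine-tuning format)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nExplain overfitting in one sentence.\n"
    "### Response:\n"
)

print(generator(prompt, max_new_tokens=100, do_sample=True, top_p=0.9)[0]["generated_text"])
```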
MPT-7B-Chat
Yes, we have another chatbot! MPT-7B-Chat generates dialogue. For example, if you want the chatbot to write a speech, giving it context will produce the text in a conversational way. Or maybe you want a tweet that paraphrases a paragraph from an article; it can generate that dialogue for you!
Why is this so cool? MPT-7B-Chat is ready and well-equipped for a variety of conversational tasks, providing smoother and more engaging multi-turn interactions for users.
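Here is a sketch of a single chat turn, assuming the ChatML-style markup commonly used to prompt the chat variant; the exact format and the tokenizer that ships with the checkpoint are assumptions to verify against the model card.

```python
import torch
import transformers

model_name = "mosaicml/mpt-7b-chat"

# The chat checkpoint is assumed to ship its own tokenizer files on the Hub;
# fall back to "EleutherAI/gpt-neox-20b" if it does not.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Build the conversation turn by turn in ChatML-style markup.
conversation = (
    "<|im_start|>system\nYou are a helpful, friendly assistant.<|im_end|>\n"
    "<|im_start|>user\nTurn this paragraph into a short, punchy tweet: ...<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(conversation, return_tensors="pt").to("cuda")
output_ids = model.generate(**inputs, max_new_tokens=120, do_sample=True, temperature=0.7)

# Decode only the newly generated assistant turn.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```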
MPT-7B-StoryWriter-65k+
This one is for the story writers! For those who want to write stories with extensive context, MPT-7B-StoryWriter-65k+ is a model designed exactly for that. It was built by fine-tuning MPT-7B with a context length of 65k tokens, and it can extrapolate beyond 65k tokens. MosaicML Foundations has been able to generate 84k tokens on a single A100-80GB GPU node.
Why is this so cool? Most open source LLMs can only handle sequences of up to a few thousand tokens. But by using just a single node of 8x A100-80GB GPUs on the MosaicML platform, you can fine-tune MPT-7B to handle context lengths of up to 65k!
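As a sketch of how that long context is set up in practice, the snippet below raises the model’s maximum sequence length before loading the weights. The max_seq_len field and the 83968 value mirror MosaicML’s published usage example, but treat them as assumptions to confirm against the model card and your hardware.

```python
import torch
import transformers

model_name = "mosaicml/mpt-7b-storywriter"

# Load the config and raise the maximum sequence length. ALiBi lets the model
# extrapolate past its 65k training length; 83968 mirrors the ~84k-token
# generation MosaicML reports on a single A100-80GB node.
config = transformers.AutoConfig.from_pretrained(model_name, trust_remote_code=True)
config.max_seq_len = 83968

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```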
The MosaicML team built these models in just a few weeks, handling the data preparation, training, fine-tuning, and deployment in that time.
The data was gathered from a variety of sources, each of which had a billion tokens available. The number of effective tokens still reached a billion from each source! The team used EleutherAI’s GPT-NeoX-20B tokenizer, allowing them to train on a diverse mix of data, apply consistent space delimitation, and more.
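To get a feel for how tokens are counted with that tokenizer, here is a small self-contained sketch that loads the EleutherAI/gpt-neox-20b tokenizer from the Hugging Face Hub and tokenizes a mixed text-and-code snippet.

```python
# Load the GPT-NeoX-20B tokenizer MPT-7B was trained with and count tokens
# in a snippet that mixes prose and code, the same units the 1T-token
# training budget is measured in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

sample = "MPT-7B handles prose and code alike:\ndef greet(name):\n    return f'Hello, {name}!'"
token_ids = tokenizer(sample)["input_ids"]

print(f"{len(token_ids)} tokens")
print(tokenizer.convert_ids_to_tokens(token_ids))
```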
All MPT-7B models were trained on the MosaicML platform, using A100-40GB and A100-80GB GPUs from Oracle Cloud.
If you want to learn more about the tools and costs behind MPT-7B, read the MPT-7B blog post.
The MosaicML platform can be considered the best starting point for organizations, whether private, commercial, or community-related, to build custom LLMs. Having this open source resource available will allow organizations to feel freer to use these tools to tackle today’s organizational challenges.
Clients can train LLMs on any computing provider or data source, while maintaining efficiency, privacy, and cost transparency.
What do you think you will use the MPT-7B for? Let us know in the comments below.
Nisha Arya is a Data Scientist, freelance technical writer, and community manager at KDnuggets. She is particularly interested in providing Data Science career advice, tutorials, and theory-based knowledge about Data Science. She also wants to explore the different ways that Artificial Intelligence can benefit the longevity of human life. A keen learner seeking to broaden her tech knowledge and writing skills, while helping mentor others.