Rising barriers to entry are hindering AI’s potential to revolutionize global trade. OpenAI’s GPT-4 is the latest large language model to be unveiled, yet its architecture, training data, hardware, and hyperparameters are kept secret. Companies are increasingly building large models while restricting access to them through APIs and keeping the underlying datasets and weights closed.
Researchers believe that open, reproducible, royalty-free state-of-the-art models, usable for both research and commercial applications, are crucial to making LLMs a freely available technology. To this end, scientists have trained a family of transformer models, called Cerebras-GPT, using state-of-the-art methods and publicly available datasets. The models were trained with the Chinchilla compute-optimal recipe, making them the first Chinchilla-trained GPT models released publicly under the Apache 2.0 license.
Cerebras Systems Inc., an AI chip maker, recently revealed that it has trained and released seven GPT-based large language models for generative AI. The models, their weights, and the training recipe are being provided under the Apache 2.0 open-source license. Notably, these are the first LLMs trained on the CS-2 systems of the Cerebras Andromeda AI supercluster, which are powered by the Cerebras WSE-2 chip and optimized to run AI software. In other words, they are pioneering LLMs trained entirely without GPU-based hardware.
When it comes to large language models, there are two competing philosophies. Models like OpenAI’s GPT-4 and DeepMind’s Chinchilla, which have been trained on proprietary data, belong to the first category; unfortunately, their source code and learned weights are kept secret. The second category comprises open-source models, such as Meta’s OPT and Eleuther’s Pythia, which have not been trained in a compute-optimal manner.
Cerebras-GPT was designed to complement Pythia: it uses the same public Pile dataset and aims to establish a compute-efficient scaling law and model family across a wide range of model sizes. The seven models that make up Cerebras-GPT come in sizes of 111M, 256M, 590M, 1.3B, 2.7B, 6.7B, and 13B parameters, each trained with 20 tokens per parameter. By choosing the appropriate number of training tokens, Cerebras-GPT minimizes loss per unit of compute across all model sizes.
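As a rough illustration of what the 20-tokens-per-parameter rule implies, the short Python sketch below derives token budgets for the seven published model sizes. The resulting figures are back-of-the-envelope arithmetic, not official numbers from the Cerebras release, and the FLOPs column uses the common 6 × parameters × tokens approximation.

```python
# Back-of-the-envelope Chinchilla-style budgeting: roughly 20 training tokens
# per model parameter. The parameter counts are the Cerebras-GPT sizes named
# in the article; the token and FLOP figures are illustrative arithmetic,
# not official numbers from the release.
MODEL_SIZES = {
    "111M": 111e6,
    "256M": 256e6,
    "590M": 590e6,
    "1.3B": 1.3e9,
    "2.7B": 2.7e9,
    "6.7B": 6.7e9,
    "13B": 13e9,
}

TOKENS_PER_PARAM = 20  # Chinchilla compute-optimal ratio

for name, params in MODEL_SIZES.items():
    tokens = TOKENS_PER_PARAM * params
    flops = 6 * params * tokens  # ~6*N*D is a common rough estimate of training FLOPs
    print(f"{name:>5}: ~{tokens / 1e9:7.1f}B tokens, ~{flops:.2e} training FLOPs")
```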
To continue this line of research, Cerebras-GPT uses the publicly available Pile dataset to develop a scaling law. This scaling law provides a compute-efficient recipe for training LLMs of arbitrary size on Pile. By publishing the findings, the researchers aim to advance progress on large language models and provide a useful resource for the community.
Cerebras-GPT was tested on various language-based tasks, including sentence completion and question answering, to determine how well it performs. Even if the models are proficient at understanding natural language, that proficiency may not transfer to specialized downstream tasks. As shown in Figure 4, Cerebras-GPT maintains state-of-the-art training efficiency on the most common downstream tasks. While earlier scaling laws characterize reductions in pretraining loss, scaling behavior on downstream natural-language tasks had not previously been reported in the literature.
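For readers who want to try this kind of check themselves, the sketch below shows one common way zero-shot sentence-completion tasks are scored: compare the log-likelihood a causal LM assigns to each candidate ending and pick the highest. It assumes the checkpoints are published on the Hugging Face Hub under IDs such as cerebras/Cerebras-GPT-111M and uses the standard transformers API; it is not the exact evaluation harness used in the Cerebras study.

```python
# Minimal zero-shot "sentence completion" scoring sketch: rank candidate
# endings by the log-likelihood the model assigns to them given a context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cerebras/Cerebras-GPT-111M"  # assumed Hub ID; smallest model for a quick test
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

def completion_logprob(context: str, ending: str) -> float:
    """Sum of token log-probabilities for `ending` given `context`."""
    ctx_ids = tok(context, return_tensors="pt").input_ids
    full_ids = tok(context + ending, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given everything before it.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the tokens belonging to the candidate ending.
    return token_lp[0, ctx_ids.shape[1] - 1:].sum().item()

context = "The chef put the cake in the oven because"
endings = [" she wanted to bake it.", " the ocean was very salty."]
scores = [completion_logprob(context, e) for e in endings]
print("chosen ending:", endings[scores.index(max(scores))])
```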
Cerebras-GPT was trained on 16 CS-2 systems using standard data parallelism. Cerebras CS-2 devices have enough memory to run even the largest models on a single machine without splitting the model, which makes this approach viable. The researchers built the Cerebras Wafer-Scale Cluster specifically to make scaling across CS-2 systems simple. Using weight streaming, a HW/SW co-designed execution technique, model size and cluster size can be scaled independently without any model parallelism; with this design, increasing the cluster size is as easy as editing a configuration file.
All Cerebras-GPT models were trained on the Andromeda cluster, a 16x Cerebras Wafer-Scale Cluster. The cluster made it possible to run all experiments quickly, eliminating time-consuming steps such as distributed-systems engineering and model-parallel tuning that are often required on GPU clusters. More importantly, it freed the researchers to focus on ML design rather than distributed-systems architecture. Because the ability to easily train large models is considered important to the broader community, the Cerebras AI Model Studio provides cloud access to the Cerebras Wafer-Scale Cluster.
The release is significant because very few companies have the resources to train genuinely large-scale models in-house, according to Cerebras co-founder and chief software architect Sean Lie. Training such models often requires hundreds or thousands of GPUs; “the release of seven fully trained GPT models to the open source community illustrates exactly how efficient clustering Cerebras CS-2 systems can be,” he said.
The company claims that a full suite of GPT models trained with state-of-the-art efficiency techniques has never been made publicly available before. Compared with other LLMs, it says, the models take less time to train, cost less, and consume less energy.
The company said that the open-source nature of Cerebras LLMs makes them suitable for both academic and commercial applications. They also offer several advantages: the released training weights yield a highly accurate pretrained model that can be tuned for different tasks with relatively little additional data, making it possible for anyone to build a robust generative AI application with little programming knowledge.
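As a concrete starting point, the following minimal sketch loads one of the released checkpoints and generates text with ordinary Hugging Face settings. The Hub ID and sampling parameters are illustrative assumptions, not part of the official release notes.

```python
# Quick illustration of using a released checkpoint as the base of a
# generative application with the standard transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "cerebras/Cerebras-GPT-1.3B"  # assumed Hub ID
tok = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = "Generative AI is"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tok.eos_token_id,  # GPT-2-style tokenizers define no pad token
)
print(tok.decode(out[0], skip_special_tokens=True))
```

From here, the same checkpoint can be fine-tuned on a small task-specific dataset using any standard causal-LM training loop.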
Traditional LLM training on GPUs requires a complicated combination of pipeline, model, and data parallelism; this release shows that a “simple, parallel data-only training approach” can be just as effective. Cerebras demonstrates that scaling to very large datasets is possible with this simpler data-only parallelism, without any changes to the original code or model.
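To make the contrast concrete, here is what plain data-only parallelism looks like in a generic framework: every worker holds a complete copy of the model and only gradients are synchronized. This is a minimal PyTorch DistributedDataParallel sketch for illustration only; it is not Cerebras’ weight-streaming software stack.

```python
# Minimal data-parallel-only training sketch (PyTorch DDP), shown purely to
# illustrate the concept: no pipeline or model parallelism, just gradient
# synchronization across workers. Not Cerebras software.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / WORLD_SIZE / MASTER_ADDR for each worker process.
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU clusters
    rank = dist.get_rank()

    model = torch.nn.Linear(512, 512)  # stand-in for a full LLM
    model = DDP(model)                 # gradients are all-reduced automatically
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 512)        # each rank would see a different data shard
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                # synchronization happens here
        opt.step()
        if rank == 0:
            print(f"step {step} loss {loss.item():.4f}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=4 ddp_sketch.py`, the same script runs unchanged on one worker or many; scaling up becomes a launcher argument rather than a code change, which is the spirit of the data-only approach described above.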
Training next-generation language models is extremely difficult: it requires a huge amount of resources, including a large compute budget, complex distributed-computing techniques, and deep ML expertise. As a result, only a few institutions develop large language models in-house, and even in the last few months there has been a noticeable shift toward keeping results closed among those with the requisite resources and skills. Cerebras researchers are committed to promoting open access to state-of-the-art models. With that in mind, the Cerebras-GPT family, consisting of seven models ranging from 111 million to 13 billion parameters, has now been released to the open-source community. The Chinchilla-trained models achieve the best accuracy within a given compute budget, and compared with publicly available models, Cerebras-GPT trains faster, costs less, and uses less energy overall.
Check out the Cerebras blog. All credit for this research goes to the researchers on this project. Also, don’t forget to join our 17k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Dhanshree Shenwai is a Computer Engineer with solid experience in FinTech companies covering the Finance, Cards & Payments, and Banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier in today’s changing world.