Large Language Models (LLMs) have taken the tech industry by storm in recent years. These language models, trained on vast amounts of data, can perform a variety of tasks, ranging from the fundamental, such as summarizing text and writing poetry, to the more challenging, such as generating AI art prompts and even predicting protein structures. OpenAI's ChatGPT is currently among the best-known examples of such LLMs. ChatGPT is a dialogue-based AI chat interface built on the Generative Pre-trained Transformer (GPT) family of models that can converse with people, write code, answer questions, and even solve challenging math problems. Other tech giants, like Google and Microsoft, have also left no stone unturned, launching their own offerings such as Bard and the new Bing chat.
It is a widely held belief among researchers that, when training LLMs with billions of parameters, adding more parameters improves performance. Recent research shows, however, that for a given training compute budget, smaller models trained on more data, rather than simply larger models, deliver the best performance. The inference budget is another crucial factor in reaching a desired level of performance. Although it may be cheaper to train a large model to a certain level of performance, a smaller model trained for longer will ultimately be cheaper at inference time. In other words, the preferred model is often not the one that is fastest to train, but the one that is fastest at inference.
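To make that tradeoff concrete, here is a purely illustrative Python sketch (not from the LLaMA paper) that compares rough training compute for a larger model on fewer tokens against a smaller model on more tokens, using the common rule of thumb of roughly 6 FLOPs per parameter per token; the model sizes and token counts below are hypothetical numbers chosen for illustration.

```python
# Illustrative sketch: training-compute vs. inference-cost tradeoff.
# Uses the common ~6 * N * D FLOPs approximation for transformer training,
# where N is the parameter count and D is the number of training tokens.

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * params * tokens

# Hypothetical configurations for illustration only.
large_model = train_flops(params=70e9, tokens=300e9)   # bigger model, fewer tokens
small_model = train_flops(params=13e9, tokens=1.4e12)  # smaller model, many more tokens

print(f"large model training compute: {large_model:.2e} FLOPs")
print(f"small model training compute: {small_model:.2e} FLOPs")

# The two training budgets are comparable, but inference cost per token scales
# roughly with the parameter count (~2 * N FLOPs per token), so the smaller
# model is several times cheaper to serve.
print(f"relative inference cost per token: {13e9 / 70e9:.2f}x")
```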
To make its mark in the competitive race for generative AI models, Facebook's parent company Meta is introducing its own line of AI language models under the name LLaMA. The work aims to develop a range of language models that perform optimally under different inference budgets, encouraging the AI community to conduct research on more responsible language models. Previously, access to such language models was expensive and limited because they often required large server infrastructure to run. With LLaMA, Meta aims to solve exactly that problem for researchers. Trained only on publicly available data, the models, the organization claims, can outperform much larger AI models currently in use, including OpenAI's older GPT-3. The company has demonstrated that it is possible to train state-of-the-art models without resorting to proprietary and inaccessible datasets.
Meta has open-sourced LLaMA in the hope that the models will help democratize access to and the study of LLMs, since they can be run on a single GPU. This will allow researchers to understand LLMs more fully and to mitigate known issues such as bias, toxicity, and the potential to spread misinformation. Another notable aspect of this collection of language models is that, unlike systems such as ChatGPT and Bing, LLaMA is intended for research purposes only and is distributed under a non-commercial license. Access is currently available to a variety of academic researchers, government organizations, universities, and other research institutions.
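As a rough illustration of what running such a model on a single GPU can look like, the sketch below loads a 7B-parameter checkpoint in half precision with the Hugging Face transformers library. This is an assumption-laden example, not Meta's official workflow: it presumes the research weights have already been obtained from Meta and converted to the Hugging Face format, that the local path (a placeholder) points at that checkpoint, and that the accelerate package is installed so that device_map="auto" works.

```python
# Minimal sketch: running a 7B-parameter checkpoint on a single GPU.
# Assumes weights obtained from Meta and converted to the Hugging Face format;
# the path below is a placeholder, not an official model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-7b-hf"  # placeholder for a locally converted checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # ~2 bytes/param, so 7B params fit in roughly 14 GB of VRAM
    device_map="auto",          # place the layers on the available GPU (requires accelerate)
)

prompt = "The key difference between LLaMA and GPT-3 is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```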
Like other AI-powered chatbots, LLaMA can produce human-like text from an input prompt. Four models are available, ranging from 7 billion to 65 billion parameters; the 13-billion-parameter version is more than ten times smaller than OpenAI's 175-billion-parameter GPT-3. The base model suite was trained only on publicly accessible data drawn from a range of domains, much of which had already been used to train other LLMs, which is what makes it possible to open-source the models. English CommonCrawl (processed with CCNet), C4, GitHub, Wikipedia, Books, ArXiv, and Stack Exchange are among the data sources used to train LLaMA. LLaMA is based on the transformer architecture, incorporating various improvements proposed over the past few years, and the Meta researchers trained these large transformers on a large amount of textual data using a standard optimizer.
The smallest model, LLaMA-7B, was trained on one trillion tokens, while the larger LLaMA-33B and LLaMA-65B models were trained on 1.4 trillion tokens. The researchers evaluated the model series on a variety of benchmarks, including BoolQ, WinoGrande, OpenBookQA, NaturalQuestions, RealToxicityPrompts, WinoGender, and others. Their two most important findings are that LLaMA-13B, the second-smallest version, outperforms the much larger GPT-3 on most benchmarks, and that LLaMA-65B is competitive with some of the best models currently available, including DeepMind's Chinchilla-70B and Google's PaLM-540B.
In a nutshell, Meta has released LLaMA, a series of next-generation AI LLMs, for researchers who hope to advance LLM research and improve its robustness. The researchers found that fine-tuning these models on instructions leads to promising results, and they plan to investigate this further in future work. To improve performance, Meta is also looking to release larger models trained on more substantial corpora.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about machine learning, natural language processing, and web development, and enjoys learning more about the technical field by taking part in various challenges.