Language models are among the most significant advances in Artificial Intelligence. With capabilities like summarizing articles, writing stories, answering questions, and completing code, language models are here to stay. These models are ubiquitous and are trained on massive amounts of textual data, including books, social media posts, articles, etc. OpenAI's latest development, GPT-3 (Generative Pre-trained Transformer 3), already has millions of users and 175 billion parameters; it holds human-like conversations and produces text on a wide range of topics. People even use it to build interactive chatbots and virtual assistants.
A language model works with the help of several computational layers, including the input layer, the embedding layer, the hidden layers, and the output layer. Since machines do not understand text and only process numeric data, the first layer converts the input text into a numeric representation. Subsequent layers then operate on this numerical data, performing various computations; intermediate representations of the text are produced at each layer, and the weights are adjusted during training to improve model performance.
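A minimal sketch of this pipeline, written in PyTorch, is shown below. The toy word-level vocabulary, layer sizes, and model class are illustrative assumptions, not the architecture of any particular production model; the point is simply how text becomes numbers, passes through an embedding layer and hidden layers, and emerges as output scores.

```python
# Toy sketch: text -> numeric token IDs -> embeddings -> hidden layers -> output.
import torch
import torch.nn as nn

# Toy "tokenizer": map each word to an integer ID (real models use subword tokenizers).
vocab = {"<unk>": 0, "language": 1, "models": 2, "are": 3, "useful": 4}
def tokenize(text):
    return torch.tensor([[vocab.get(w, 0) for w in text.lower().split()]])

class TinyLM(nn.Module):
    def __init__(self, vocab_size=5, dim=32, n_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)            # input/embedding layer
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.hidden = nn.TransformerEncoder(layer, num_layers=n_layers)  # hidden layers
        self.output = nn.Linear(dim, vocab_size)                  # output layer (token scores)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # numeric IDs -> dense vectors
        x = self.hidden(x)              # contextualized representations
        return self.output(x)           # scores over the vocabulary

model = TinyLM()
logits = model(tokenize("language models are useful"))
print(logits.shape)  # (1, sequence_length, vocab_size)
```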
The weights in a model represent the strength of the connections between neurons, which determines the performance of the model and the correctness of its output. Many weights close to the model's input change very little during training, leading to redundancy in model training. This decreases efficiency and wastes energy, resources, and time. A new approach called Embedding Recycling (ER) has been introduced, which improves efficiency by reusing sequence representations from previous model runs.
Embedding Recycling preserves sequence representations during training and saves time and resources when multiple language models run on the same corpus of textual data. In such settings, reusing the contextualized embeddings generated in a previous model run lowers cost and speeds up training. The research team, made up of researchers from AI2, Yale, and Northwestern, tested this technique on 14 different tasks and eight language models, ranging from 17 million to 900 million parameters. It showed a roughly 90% increase in training speed and an 87 to 91% acceleration in inference, all with only a minimal loss in the F1 metric.
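The sketch below illustrates the recycling idea under stated assumptions; it is not the authors' implementation. A frozen pretrained encoder is run once over a document, the hidden states after an arbitrarily chosen layer are cached, and two hypothetical task heads reuse that cache instead of recomputing the expensive lower layers. The model name, layer index, pooling, and linear heads are all illustrative choices.

```python
# Hedged sketch of embedding recycling: cache intermediate hidden states once,
# then let multiple task-specific heads reuse them for the same document.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

RECYCLE_LAYER = 6           # illustrative: cache activations after this layer
cache = {}                  # document text -> cached hidden states

def get_recycled_embeddings(text):
    """Compute (or fetch from the cache) the intermediate representation of a document."""
    if text not in cache:
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            outputs = encoder(**inputs, output_hidden_states=True)
        # hidden_states[k] holds the output of the k-th transformer layer
        cache[text] = outputs.hidden_states[RECYCLE_LAYER]
    return cache[text]

# Two different "tasks" reuse the same cached representation of the same document.
doc = "Embedding recycling reuses representations across models run on the same corpus."
reps = get_recycled_embeddings(doc)                               # computed once
topic_head = torch.nn.Linear(encoder.config.hidden_size, 4)       # e.g., topic classification
sentiment_head = torch.nn.Linear(encoder.config.hidden_size, 2)   # e.g., sentiment

pooled = reps.mean(dim=1)            # simple mean pooling over tokens
print(topic_head(pooled).shape, sentiment_head(pooled).shape)
```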
The team has shared some examples where Embedding Recycling can be used, i.e., where multiple models run on the same corpus. These include topic classification, text summarization, and keyword extraction on the same Wikipedia document, and an AI business assistant that performs emotion recognition, command identification, etc., on the same user query.
Embedding Recycling is undoubtedly a great method for reducing the computational costs of training and inference. It introduces layer recycling with the help of parameter-efficient fine-tuning adapters, which looks promising for the efficient use of language models. Consequently, Embedding Recycling is a breakthrough in the development of language models.
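For intuition on the adapter part, here is a minimal bottleneck-adapter sketch of the general kind used in parameter-efficient fine-tuning: only the small adapter and a task head are trained, while the recycled representations from the frozen lower layers stay fixed. The dimensions and the adapter design are generic assumptions, not values taken from the paper.

```python
# Minimal bottleneck adapter operating on recycled (cached) representations.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Down-project, apply a nonlinearity, up-project, and add a residual connection."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, recycled_states):
        return recycled_states + self.up(self.act(self.down(recycled_states)))

adapter = Adapter()
head = nn.Linear(768, 3)               # small task-specific classifier (illustrative)
trainable = sum(p.numel() for p in adapter.parameters()) + sum(p.numel() for p in head.parameters())
print(f"trainable parameters: {trainable:,}")   # a tiny fraction of a full language model

recycled = torch.randn(1, 16, 768)     # stands in for cached layer outputs
logits = head(adapter(recycled).mean(dim=1))
print(logits.shape)                    # (1, 3)
```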
Review the Paper, GitHub, and Reference Article. All credit for this research goes to the researchers of this project. Also, don't forget to join our 14k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Tanya Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with strong analytical and critical thinking abilities, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.