In recent years, large language models (LLMs) have revolutionized the field of natural language processing, enabling unprecedented few-shot learning capabilities. However, their deployment in real-world applications has been hampered by their immense computational demands. A single LLM with 175 billion parameters requires a staggering 350 GB of GPU memory and specialized infrastructure, and with state-of-the-art models now exceeding 500 billion parameters, these requirements put LLMs out of reach for many research teams, particularly those with low-latency requirements.
To address this deployment challenge, researchers have turned to smaller, specialized models trained through either fine-tuning or distillation. Fine-tuning, while effective, relies on expensive and time-consuming human-generated labels. Distillation, on the other hand, requires large amounts of unlabeled data, which can be difficult to obtain.
In a groundbreaking study presented at ACL 2023, a research team from Google and the University of Washington introduced “Distilling Step-by-Step”, a novel mechanism designed to mitigate the trade-off between model size and the cost of data collection. This approach hinges on extracting informative natural language rationales, or intermediate reasoning steps, from LLMs. These rationales then serve as additional, richer supervision for training smaller task-specific models alongside the standard task labels.
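To make this concrete, the snippet below shows a hypothetical training triplet of the kind the method consumes; the question, rationale, and label are invented for illustration rather than taken from the paper's benchmarks:

```python
# Hypothetical training triplet (invented for illustration). Standard
# fine-tuning uses only `input` and `label`; Distilling Step-by-Step also
# supervises the small model with the LLM-generated `rationale`.
example = {
    "input": "Where would you keep loose coins at home?",
    "rationale": "Loose coins are small change that people typically "
                 "collect in one container at home, such as a piggy bank.",
    "label": "piggy bank",
}
```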
The researchers describe a two-stage process for implementing Distilling Step-by-Step. First, they use chain-of-thought (CoT) prompting to extract rationales from an LLM, enabling the model to generate rationales for unseen inputs. These rationales are then incorporated into the training of small models within a multi-task learning framework, with task prefixes guiding the model to differentiate between label prediction and rationale generation.
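For readers who want a feel for the second stage, here is a minimal sketch of such a multi-task objective using a Hugging Face T5 model. The prefix strings, the rationale-loss weight, and the toy example are assumptions made for this sketch, not details taken from the authors' released code:

```python
# Minimal sketch of the multi-task training objective (assumed details):
# the same T5 model is trained both to predict the label and to generate
# the LLM-provided rationale, with a task prefix selecting the behavior.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def multitask_loss(question: str, label: str, rationale: str,
                   rationale_weight: float = 1.0) -> torch.Tensor:
    """Return L_label + weight * L_rationale for one training example."""
    losses = []
    for prefix, target in (("[label]", label), ("[rationale]", rationale)):
        enc = tokenizer(f"{prefix} {question}", return_tensors="pt")
        target_ids = tokenizer(target, return_tensors="pt").input_ids
        # T5 computes token-level cross-entropy when `labels` is given.
        losses.append(model(**enc, labels=target_ids).loss)
    return losses[0] + rationale_weight * losses[1]

# Toy example (invented): one gradient step on a single triplet.
loss = multitask_loss(
    question="Can you fit a whale in a bathtub?",
    label="no",
    rationale="A whale is many times larger than any bathtub, so it cannot fit.",
)
loss.backward()
```

Because only the label-prediction prefix is needed at inference time, the rationale-generation task adds supervision during training without increasing deployment cost.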
In a series of experiments, a 540B-parameter LLM was used alongside T5 models as the task-specific downstream models. Distilling Step-by-Step showed notable performance improvements with significantly reduced data requirements. For example, on the e-SNLI dataset, the method outperformed standard fine-tuning using only 12.5% of the full dataset. Similar reductions in dataset size were observed across several NLP tasks, including ANLI, CQA, and SVAMP.
Additionally, Distilling Step-by-Step achieved superior performance using considerably smaller model sizes than few-shot CoT-prompted LLMs. For example, on the e-SNLI dataset, a 220M T5 model outperformed a 540B PaLM. On ANLI, a 770M T5 model, more than 700 times smaller, outperformed a 540B PaLM, demonstrating the immense potential for efficiency gains.
In particular, Distilling Step-by-Step demonstrated its ability to outperform few-shot LLMs using models that are both significantly smaller and trained on less data. For example, on ANLI, a 770M T5 model surpassed a 540B PaLM while using only 80% of the full dataset, a feat that standard fine-tuning could not match even with 100% of the data.
In conclusion, Distilling Step-by-Step presents an innovative paradigm for training small, task-specific models. By eliciting rationales from LLMs, this approach not only reduces the amount of data required for model training but also enables the use of significantly smaller models. This technique stands to make advanced language capabilities more accessible and practical for a broader range of applications.
Check out the Paper and the Google AI article. All credit for this research goes to the researchers of this project.
Niharika is a Technical Consulting Intern at Marktechpost. She is a third-year student currently pursuing her B.Tech degree at the Indian Institute of Technology (IIT), Kharagpur. She is a highly enthusiastic person with a keen interest in machine learning, data science, and artificial intelligence, and an avid reader of the latest developments in these fields.