Since the introduction of the Transformer architecture, the art of training massive artificial neural networks has advanced enormously, but the science behind this achievement is still in its infancy. A sense of order emerged amid the overwhelming and bewildering variety of results around the time Transformers were released, with the observation that performance improves predictably as the amount of computation or the size of the network increases, a phenomenon now known as scaling laws. These scaling laws served as a guide for subsequent scaling research in deep learning, and the discovery of variants of these laws has led to sharp jumps in performance.
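As a point of reference, such laws are typically reported as power laws in model size or compute; the representative form below follows earlier scaling-law studies rather than this paper, and the constants and exponents are empirically fitted quantities:

```latex
% Representative power-law form of neural scaling laws
% (L = test loss, N = number of parameters, C = training compute;
%  N_c, C_c, \alpha_N, \alpha_C are empirically fitted constants, not values from this paper)
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N},
\qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
```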
In this paper, the authors investigate how data quality can improve results along a different axis. Higher-quality data produces better outcomes; for example, data cleaning is a crucial step in building modern data sets and can yield comparatively smaller data sets or allow more passes over the data. Recent work on TinyStories, a high-quality synthetic data set created to teach English to small neural networks, showed that the benefits of high-quality data go much further: by drastically altering the scaling laws, better data may make it possible to match the performance of large-scale models with far smaller models and cheaper training.
In this study, the Microsoft Research authors demonstrate that high-quality data can further improve the state of the art for large language models (LLMs) while significantly reducing the data set size and training compute. Smaller models that require less training can also greatly reduce the environmental cost of LLMs. The task they focus on is generating simple Python functions from their docstrings using LLMs trained on code, and HumanEval, the benchmark proposed for exactly this setting, has been used widely to evaluate LLM performance on code.
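To illustrate the task format, here is a made-up example in the style of HumanEval, not a problem drawn from the benchmark or from the paper: the model is shown a function signature and docstring and must generate a body that passes hidden unit tests.

```python
# A hypothetical HumanEval-style problem: the model sees the signature and
# docstring (the "prompt") and must generate the function body.
def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u, case-insensitive) in text.

    >>> count_vowels("Hello World")
    3
    """
    # A correct completion a code LLM might produce:
    return sum(1 for ch in text.lower() if ch in "aeiou")


# The completion is judged by running unit tests against it, for example:
assert count_vowels("Hello World") == 3
assert count_vowels("xyz") == 0
```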
They demonstrate the power of high-quality data to break existing scaling laws by training a 1.3B-parameter model, which they call phi-1, for about eight passes over 7B tokens (just over 50B total tokens seen), followed by fine-tuning on fewer than 200 million tokens. Broadly speaking, the model is pre-trained on “textbook-quality” data, both synthetically generated (using GPT-3.5) and filtered from web sources, and then fine-tuned on “textbook-exercise-like” data. Despite being several orders of magnitude smaller than competing models, both in data set and model size (see Table 1), phi-1 achieves 50.6% pass@1 accuracy on HumanEval and 55.5% pass@1 accuracy on MBPP (Mostly Basic Python Programs), which are among the best self-reported numbers obtained using just a single LLM generation.
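For context on the reported metric, the sketch below shows how pass@1 can be computed when exactly one completion is sampled per problem; it is a minimal illustration rather than the authors' evaluation harness, and `generate_completion` and `run_unit_tests` are hypothetical stand-ins for the model call and the sandboxed test runner.

```python
def pass_at_1(problems, generate_completion, run_unit_tests) -> float:
    """Fraction of problems solved when one completion is sampled per problem.

    `problems` is an iterable of (prompt, tests) pairs; `generate_completion`
    and `run_unit_tests` are hypothetical helpers standing in for the model
    and the test executor.
    """
    solved = 0
    for prompt, tests in problems:
        completion = generate_completion(prompt)      # one sample per problem
        if run_unit_tests(prompt + completion, tests):  # all tests must pass
            solved += 1
    return solved / len(problems)
```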
Check out the Paper.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.