Researchers have long explored the emergent features of complex systems, from physics to biology to mathematics. Nobel Prize-winning physicist P. W. Anderson's essay "More Is Different" is a notable example: he argued that as the complexity of a system increases, new properties may manifest that cannot be predicted, easily or at all, even from a precise quantitative understanding of the system's microscopic details. Interest in emergence has recently surged in machine learning, driven by findings that large language models (LLMs) such as GPT, PaLM, and LaMDA can exhibit what are known as "emergent abilities" on a variety of tasks.
Emergent abilities of LLMs were recently defined succinctly as "abilities that are not present in smaller-scale models but are present in large-scale models; thus they cannot be predicted by simply extrapolating the performance improvements on smaller-scale models." Such emergent abilities may first have been observed in the GPT-3 family. Later work emphasized the discovery, writing that although "performance is predictable at a general level, performance on a specific task can sometimes emerge quite unpredictably and abruptly at scale"; indeed, these emergent abilities were so surprising and notable that it was argued that such "abrupt, specific capability scaling" should be considered one of the two main defining features of LLMs. The phrases "sharp left turns" and "breakthrough capabilities" have also been used.
These quotes identify two characteristics that distinguish emergent abilities in LLMs:
1. Sharpness: the transition from absent to conspicuously present appears instantaneous.
2. Unpredictability: the transition appears at model scales that could not have been foreseen.

These newly identified abilities have attracted considerable interest, raising questions such as: What controls which abilities will emerge? What controls when abilities will emerge? How can we make desirable abilities emerge faster while ensuring undesirable abilities never emerge? These questions are especially relevant to AI alignment and safety, since emergent abilities serve as a warning that larger models might one day, without warning, acquire unwanted mastery of dangerous capabilities.
In this study, the Stanford researchers question the claim that LLMs possess emergent abilities, meaning specifically sharp and unpredictable changes in model outputs as a function of model scale on particular tasks. Their skepticism stems from the observation that emergent abilities seem restricted to metrics that nonlinearly or discontinuously scale any model's per-token error rate. For example, they show that on BIG-Bench tasks, more than 92% of claimed emergent abilities appear under one of two such metrics: Multiple Choice Grade, defined as 1 if the highest-probability option is correct and 0 otherwise, and Exact String Match, defined as 1 if the output string exactly matches the target string and 0 otherwise.
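To make the all-or-nothing character of these two metrics concrete, here is a minimal sketch in Python; the function names and signatures are illustrative, not taken from the paper or the BIG-Bench codebase.

```python
def multiple_choice_grade(option_log_probs: dict, correct_option: str) -> int:
    """Return 1 if the model assigns the highest log-probability to the
    correct option, else 0 -- a discontinuous, all-or-nothing score."""
    predicted = max(option_log_probs, key=option_log_probs.get)
    return int(predicted == correct_option)


def exact_string_match(output: str, target: str) -> int:
    """Return 1 only if the generated string matches the target exactly;
    a model that is off by one token scores the same as one that is
    completely wrong."""
    return int(output == target)


# Example: a near-miss gets zero credit, which is what turns smooth
# per-token improvement into an apparent jump at some model scale.
print(exact_string_match("the answer is 42", "the answer is 42."))  # prints 0
```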
This suggests an alternative explanation for the origin of LLMs' emergent abilities: changes that appear sharp and unpredictable may in fact be induced by the researcher's choice of measurement, even though the model family's per-token error rate changes smoothly, continuously, and predictably with increasing model scale.
Specifically, they argue that emergent abilities are a mirage caused by the researcher's choice of a metric that nonlinearly or discontinuously deforms per-token error rates, by having too little test data to accurately estimate the performance of smaller models (making smaller models appear wholly unable to perform the task), and by evaluating too few large-scale models. They present a simple mathematical model expressing this alternative view and show that it statistically reproduces the published evidence for emergent LLM abilities.
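The following is a minimal numerical sketch of that kind of mathematical model, under the assumption (in the spirit of scaling-law results) that per-token cross-entropy falls as a smooth power law in parameter count. The constants and the 30-token target length are invented for illustration, not fitted values from the paper.

```python
import numpy as np

# Assumed smooth scaling: per-token cross-entropy follows a power law in
# parameter count N (constants chosen for illustration, not fitted).
N = np.logspace(7, 11, 9)                    # 10M to 100B parameters
per_token_ce = (N / 4.5e6) ** -0.46          # decreases smoothly with scale
p_token = np.exp(-per_token_ce)              # per-token accuracy: ~0.50 -> ~0.99

L = 30                                       # tokens in the target answer (assumed)
exact_match = p_token ** L                   # all L tokens must be correct

for n, p, em in zip(N, p_token, exact_match):
    print(f"N={n:9.1e}  per-token acc={p:.3f}  exact-match acc={em:.4f}")

# Per-token accuracy improves gradually, but exact-match accuracy is
# indistinguishable from zero across several decades of scale and then
# climbs rapidly -- apparent "emergence" from a smooth underlying quantity.
```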
They then test their alternative explanation in three complementary ways:
1. Using the InstructGPT/GPT-3 family of models, they make, test, and confirm three predictions based on their alternative hypothesis.
2. They perform a meta-analysis of previously published results and show that, in the space of task-metric-model family triplets, emergent abilities appear only for specific metrics, not for particular task-model family pairs. They further demonstrate that changing the metric used to score the fixed outputs of existing models makes the phenomenon of emergence disappear.
3. They show how the same metric choices can induce seemingly emergent abilities in deep neural networks of various architectures on several vision tasks, where, to their knowledge, emergent abilities had never previously been claimed; a toy version of this rescoring trick is sketched below.
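In spirit, that vision experiment amounts to the following: take per-image reconstruction errors that shrink smoothly with model capacity and rescore them with a thresholded, all-or-nothing metric. All numbers below (capacities, error distributions, the 0.05 threshold) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
model_sizes = [1e5, 1e6, 1e7, 1e8]           # hypothetical parameter counts
# Assume mean per-image reconstruction MSE shrinks smoothly with capacity,
# roughly halving with each 10x increase in size.
mean_mse = {n: 0.4 / (n / 1e5) ** 0.3 for n in model_sizes}

for n in model_sizes:
    # Simulated per-image errors for 10,000 test images.
    mse = rng.gamma(shape=4.0, scale=mean_mse[n] / 4.0, size=10_000)
    continuous = mse.mean()                  # smooth metric: drifts down
    thresholded = (mse < 0.05).mean()        # "correct" only if MSE < 0.05
    print(f"N={n:.0e}  mean MSE={continuous:.3f}  frac 'solved'={thresholded:.3f}")

# The mean MSE declines smoothly, yet the thresholded score sits near zero
# and then leaps upward -- an "emergent ability" manufactured purely by the
# choice of scoring rule.
```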