The remarkable results achieved by transformer-based models such as GPT-2 and GPT-3 have drawn the research community toward large language models (LLMs). Furthermore, the recent success and popularity of ChatGPT has only increased interest in LLMs. In-context learning and chain-of-thought prompting are two other important discoveries that have significantly improved the accuracy of these models. These techniques go beyond simple question answering, where an input prompt containing a question is used to generate a reasonable answer.
Although these prompting tactics have been effective in improving performance, transformer-based LLMs can only condition on an input string of bounded length, which limits the computations they can express. Put differently, any deterministic language model that conditions on bounded-length input strings is computationally limited, since such a model is equivalent to a finite automaton. To counteract this, researchers have explored adding an external feedback loop to LLMs, in which model outputs are post-processed and supplied back as inputs. However, whether this approach substantially extends the computational power of a model has remained an open question.
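To make the idea of such a loop concrete, here is a minimal Python sketch. The `call_llm` function and the "HALT" stop convention are assumptions for illustration, not details from the paper:

```python
# Minimal sketch of an external feedback loop around a language model.
# `call_llm` is a placeholder for whatever API maps a prompt string to an
# output string; it is not part of the paper's system.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real LLM API call")

def feedback_loop(initial_prompt: str, max_steps: int = 100) -> str:
    prompt = initial_prompt
    for _ in range(max_steps):
        output = call_llm(prompt)
        if output.strip() == "HALT":   # an assumed stop convention
            break
        # Post-process the output and feed it back as the next input.
        prompt = output
    return prompt
```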
Google Brain and researchers from the University of Alberta collaborated on this problem. They augmented an LLM with an external read-write memory and verified that it can emulate any algorithm on any input. Their research is summarized in the paper “Memory Augmented Large Language Models are Computationally Universal,” which shows that an LLM enhanced with a read-write associative memory is computationally universal.
Flan-U-PaLM 540B was the LLM chosen by the researchers. The idea behind the work is to use a simple stored-instruction computer to link the LLM and the associative memory, which allows outputs and input prompts to be passed back and forth to the language model in a loop. The external associative memory can be thought of as a dictionary, in which the keys are variable names or address locations and the values are the stored contents. The language model and memory use regular-expression matching to perform each parsing step.
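As a rough illustration (not the authors' actual implementation), the memory side of such a stored-instruction computer can be sketched as a Python dictionary with regular-expression parsing of the model's output. The `name = value` output format and the helper names below are assumptions made for the example:

```python
import re

# Illustrative external associative memory: keys are variable names or
# address locations, values are the stored strings.
memory: dict[str, str] = {}

# Assumed post-processing rule: lines of the form `name = value` in the
# model's output are parsed with a regular expression and written to memory.
ASSIGN_RE = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")

def write_output_to_memory(model_output: str) -> None:
    for line in model_output.splitlines():
        match = ASSIGN_RE.match(line)
        if match:
            memory[match.group(1)] = match.group(2)

def read(key: str) -> str:
    # Reads of unset locations return an empty string.
    return memory.get(key, "")
```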
A specific “prompt program” is then developed to drive the system to simulate the execution of a universal Turing machine once the stored-instruction computer has been established. In the end, proving the correctness of the simulation comes down to examining a finite number of prompt-result patterns and confirming that the language model generates the proper output for each of a finite set of possible input strings. One of the main strengths of the work is that it does not involve any extra “training” of the language model or alteration of its pretrained weights. Instead, the construction relies solely on building a form of stored-instruction computer that can then be programmed with specific prompts.
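For a flavour of the bookkeeping such a simulation involves, the sketch below steps a toy Turing machine from a transition table held in a dictionary. In the paper's construction the rule application is carried out by the language model in response to prompts, and the toy rule table here is invented for illustration rather than the universal machine used by the authors:

```python
# Toy Turing machine step loop. The transition table maps
# (state, symbol) -> (new_state, symbol_to_write, head_move).
rules = {
    ("q0", "0"): ("q0", "1", +1),
    ("q0", "1"): ("q1", "0", +1),
    ("q1", "0"): ("halt", "0", 0),
    ("q1", "1"): ("q0", "1", -1),
}

def run(tape: dict[int, str], state: str = "q0", head: int = 0, max_steps: int = 1000):
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = tape.get(head, "0")                  # blank cells read as "0"
        state, write, move = rules[(state, symbol)]   # one table lookup per step
        tape[head] = write
        head += move
    return tape, state
```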
This study is distinctive in contrast to previous research that explores the computational universality of models. The main difference is that the researchers demonstrate how augmenting a fixed language model, with fixed pretrained weights, with external memory can elicit universal computational behavior. The findings show that large language models are already computationally universal as they currently exist, provided they have access to unbounded external memory.
Check out the paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our Reddit page, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about the fields of machine learning, natural language processing, and web development. She likes to learn more about the technical field by participating in various challenges.