Almost any objective described in natural language can be optimized by querying a language model. Often, however, a program that makes several structured calls to a language model can produce results with higher objective values. Such programs are called “scaffolds” and are typically written (by humans) in a programming language such as Python. The key observation is that, for any distribution over optimization problems and any fixed language model, scaffold design is itself an optimization problem. In this article, researchers from Microsoft Research and Stanford University describe the Self-Taught Optimizer (STOP), a technique in which code that uses a language model to improve a given solution is applied recursively, leading to self-improvement.
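To make the idea of a scaffold concrete, here is a minimal sketch of our own (not code from the paper): a program that makes several organized calls to a language model and keeps the candidate with the highest objective value. The `language_model` stub and the `utility` function are hypothetical stand-ins, chosen only so the example runs offline.

```python
import random

def language_model(prompt, seed=None):
    # Hypothetical stand-in for a real LLM API call; returns a candidate
    # answer string. A seeded random integer keeps the sketch runnable
    # without network access.
    return str(random.Random(seed).randint(0, 100))

def utility(solution):
    # Task-specific objective; here (arbitrarily) larger numbers score higher.
    return int(solution)

def best_of_n_scaffold(prompt, n=5):
    # A simple scaffold: several organized model calls, then pick the
    # best candidate under the objective.
    candidates = [language_model(prompt, seed=i) for i in range(n)]
    return max(candidates, key=utility)
```

A single model call yields one candidate; the scaffold's extra structure (sampling several and selecting) is what buys the higher objective value.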
Their method begins with a seed “improver” scaffolding program that uses the language model to improve a solution to a downstream task. As the system iterates, the model improves this improver program itself. To measure the effectiveness of this automatic optimization framework, the authors evaluate it on a small selection of downstream algorithmic tasks. Their findings show that performance improves over successive iterations of self-improvement; in this way, STOP demonstrates how language models can function as their own meta-optimizers. The authors also analyze the kinds of self-improvement strategies the model proposes (see Figure 1), how well the recommended strategies transfer to downstream tasks, and whether the model is vulnerable to unsafe self-improvement techniques.
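The seed-improver idea can be sketched as follows (our own minimal example, not the paper's actual code): an improver asks the model for a few revisions of a program, scores them with a utility function, and keeps the best, and the same improver can then be run on its own source code. The `language_model` stub here is a hypothetical stand-in that merely appends a line, so the sketch runs offline.

```python
def language_model(prompt, seed=0):
    # Hypothetical offline stand-in for an LLM call: "revises" the
    # program embedded in the prompt by appending a line to it.
    program = prompt.split(":\n", 1)[1]
    return program + f"\npass  # revision {seed}"

def seed_improver(program_str, utility):
    # STOP-style seed improver (sketch): request a few candidate
    # revisions, score each with the utility function, and return the
    # best. The original is kept as a fallback so the improver never
    # regresses.
    candidates = [program_str]
    for i in range(3):
        prompt = ("Improve the following program so it scores higher "
                  "under the utility function:\n" + program_str)
        candidates.append(language_model(prompt, seed=i))
    return max(candidates, key=utility)

# The recursive step that defines STOP: hand the improver its own source
# (here a placeholder string) as the program to be improved.
improver_source = "def improve(s):\n    return s"
improved_improver = seed_improver(improver_source, utility=len)
```

In STOP itself, the score used at this recursive step would be a meta-utility that runs the candidate improver on downstream tasks and aggregates the results, rather than a toy objective like `len`.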
Figure 1: Examples of self-improvement strategies proposed and implemented by GPT-4. Each strategy is then used as scaffolding to revise arbitrary code, including the scaffolding code itself.
Because the underlying language model itself is not modified, the authors call this problem recursively self-improving code generation; it is inspired by, but falls short of, a fully recursive self-improvement (RSI) system. Researchers formalized the concept of RSI at least 50 years ago, but that work focused on building systems of broadly greater competence and assumed the model could improve every part of its own code. The present research is a modest step in that direction, since it considers only the model’s ability to iteratively improve the scaffolding that invokes it. This study is the first to pose the RSI code-generation problem in a mathematically well-defined way.
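In rough, paraphrased notation (ours, not necessarily the paper's exact symbols), the formulation treats the improver program itself as the object being optimized: an improver is scored by the expected downstream utility of the solutions it returns, and the recursive step applies the improver to its own source.

```latex
% Paraphrased sketch of the meta-optimization objective (notation ours):
% an improver I takes a utility u and an initial solution s and returns
% an improved solution I(u, s); D is a distribution over tasks.
\[
  M(I) \;=\; \mathbb{E}_{(u,\,s)\sim\mathcal{D}}\!\left[\, u\bigl(I(u,\,s)\bigr) \right],
  \qquad
  I^{\star} \;\in\; \arg\max_{I} M(I).
\]
% Recursive self-improvement step: the next improver is obtained by
% running the current improver on itself, I_{t+1} = I_t(M, I_t).
```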
They then design and evaluate STOP to illustrate the potential of RSI code generation, observing improvements across a range of downstream tasks. Figure 1 shows some of the interesting and useful scaffolds STOP proposes when using a version of the GPT-4 language model trained on data through 2021, well before the debut of most scaffolding systems. Additional experiments track how often the model attempts to circumvent a sandbox flag. Finally, the authors address issues related to the ethical development of such technology.
The main contributions of this work are:
- Formulating a meta-optimization strategy in which a scaffolding system recursively improves itself.
- Demonstrating that this system can successfully improve itself recursively using a modern language model (GPT-4 in particular).
- Examining the self-improvement techniques the model proposes and implements, including the ways the model circumvents safety measures such as a sandbox.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor’s degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.