Iterative refinement is a key aspect of human problem solving: a person produces an initial draft and then improves it through self-feedback. For example, when emailing a coworker to request a document, one might first write a blunt request like "give me the data right away." On reflection, the author might realize that the sentence could come across as unfriendly and revise it to "Could you provide me with the data?" In this study, the authors show that large language models (LLMs) can successfully mimic this human cognitive process through iterative feedback and revision.
Although LLMs can produce coherent output in a single pass, they often fall short on more complex requirements, particularly tasks with multiple objectives (such as generating dialogue responses that must be relevant, engaging, and safe) or tasks with less clearly defined goals (e.g., improving the readability of a program). In such cases, modern LLMs may produce an understandable initial result, but iterative refinement is still needed to ensure that every aspect of the task is addressed and an adequate level of quality is reached.
More advanced methods that rely on external supervision and reward models require huge amounts of training data or expensive human annotations, which are often impractical to obtain. These drawbacks highlight the need for a more adaptable and efficient text generation approach that can be applied to many tasks with little supervision. In this study, researchers from CMU, the Allen Institute, the University of Washington, NVIDIA, UCSD, and Google Research propose SELF-REFINE, which overcomes these limitations and better replicates the human creative process without a costly human feedback loop (Figure 1).
The two halves of SELF-REFINE, FEEDBACK and REFINE, work together in an iterative loop to produce high-quality results. Given an initial draft produced by a model M, the draft is passed back to the same model M to obtain feedback. That feedback is then passed back to the same model once more, which refines the previously generated output. This procedure repeats until the model judges that no further improvement is required, at which point the process stops. The central thesis of this study is that, in a few-shot setting, the same underlying language model can handle both feedback and refinement.
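To make the loop concrete, here is a minimal Python sketch of this generate–feedback–refine cycle. The `generate` function, the prompt wording, and the "DONE" stopping check are hypothetical stand-ins for whatever LLM API and prompt templates one actually uses; the paper's own prompts and stopping criteria vary by task.

```python
def generate(prompt: str) -> str:
    """Placeholder for a call to the same underlying LLM (e.g., GPT-3.5 or GPT-4)."""
    raise NotImplementedError("Plug in your LLM client here.")

def self_refine(task_input: str, max_iters: int = 4) -> str:
    # Initial draft produced by the model M.
    output = generate(f"Task: {task_input}\nDraft an answer:")

    for _ in range(max_iters):
        # FEEDBACK: the same model critiques its own draft.
        feedback = generate(
            f"Task: {task_input}\nDraft: {output}\n"
            "Give concrete feedback, or reply 'DONE' if no further improvement is needed:"
        )
        # Stop when the model judges the output good enough.
        if "DONE" in feedback:
            break
        # REFINE: the same model rewrites the draft using its own feedback.
        output = generate(
            f"Task: {task_input}\nDraft: {output}\nFeedback: {feedback}\n"
            "Rewrite the draft, addressing the feedback:"
        )
    return output
```

The key design choice mirrored here is that a single model plays both roles; no separate critic, reward model, or additional training data is involved.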
SELF-REFINE provides the first iterative strategy for improving generation by effectively using natural language (NL) feedback.
Figure 1 walks through the procedure on an example. The authors apply SELF-REFINE to a variety of tasks spanning many domains that require feedback and revision strategies, such as review rewriting, acronym generation, constrained generation, story generation, code rewriting, response generation, and toxicity removal. The method's two components are instantiated with a few-shot prompting strategy, so only a handful of examples are needed to guide the model (a hypothetical prompt is sketched below). Their iterative approach, together with experiments, component analyses across a variety of tasks, the generation of useful feedback, and stopping criteria, is intended to guide future research in this area.
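As an illustration of how few-shot prompting could instantiate the FEEDBACK component, here is a hypothetical prompt template for the acronym generation task. The in-context examples and wording below are illustrative assumptions, not the paper's actual prompts.

```python
# A hypothetical few-shot prompt for the FEEDBACK step on the acronym task.
FEEDBACK_PROMPT = """\
Title: Underwater Breathing Apparatus for Divers
Acronym: UBAD
Feedback: Easy to pronounce, but 'BAD' has a negative connotation; avoid it.

Title: Generative Pretrained Language Model
Acronym: GPLM
Feedback: Faithful to the title but hard to pronounce; prefer a pronounceable word.

Title: {title}
Acronym: {acronym}
Feedback:"""

def feedback_prompt(title: str, acronym: str) -> str:
    # Fill the template with the draft to be critiqued; the LLM's completion
    # then serves as the natural language feedback used by the REFINE step.
    return FEEDBACK_PROMPT.format(title=title, acronym=acronym)
```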
Their contributions, in summary, are:
- To help LLMs improve on a variety of tasks, they propose SELF-REFINE, a novel technique that allows a model to improve its own output by using its feedback repeatedly. Unlike previous efforts, their method requires only a single LLM and does not rely on reinforcement learning or supervised training data.
- They run extensive experiments on seven different tasks (review rewriting, acronym generation, story generation, code rewriting, response generation, constrained generation, and toxicity removal) and show that SELF-REFINE performs at least 5% better, and sometimes more than 40% better, than direct generation from powerful generators such as GPT-3.5 and even GPT-4.
Check out the Paper, Code, and Project. All credit for this research goes to the researchers of this project. Also, don't forget to join our 18k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, exciting AI projects, and more.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Information Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.