With the growing popularity of large language models (LLMs), new research and advances are introduced almost every day. Built on deep learning and the broader progress in artificial intelligence, LLMs continue to evolve and extend across domains. They are pretrained on massive amounts of raw text and then fine-tuned to improve their performance. During fine-tuning, LLMs are trained on particular tasks using direct training signals that measure their performance, such as classification accuracy, question answering, or document summarization.
Recently, a new fine-tuning paradigm called LETI (Learning from Textual Interactions) has been introduced, which explores the potential of large language models to learn from textual interactions and feedback. LETI allows language models to understand not only whether they were wrong, but also why they were wrong. This approach lets LLMs move beyond the limitations of learning solely from binary labels and scalar rewards.
The researchers behind LETI describe how this approach provides textual feedback to the language model: it checks the correctness of the model's outputs with binary labels and identifies and explains the errors in the model's generated code. The LETI paradigm mirrors the iterative process of software development, in which a developer writes a program, tests it, and improves it based on the feedback received. Similarly, LETI refines the LLM by providing textual feedback that points out errors and failures.
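In spirit, one LETI-style iteration can be sketched as a generate–test–learn loop. The sketch below is only an illustration of the description above; the helper names (generate_solutions, evaluate_solution, fine_tune) are hypothetical placeholders, not the authors' actual API.

```python
def leti_iteration(model, problems):
    """One generate -> test -> learn cycle, sketched at a high level.

    generate_solutions, evaluate_solution, and fine_tune are hypothetical
    helpers standing in for the steps described in the article.
    """
    examples = []
    for problem in problems:
        # The LM writes candidate programs from the natural-language instruction.
        for program in generate_solutions(model, problem.instruction):
            # A solution evaluator runs the program against the test cases and
            # returns a pass/fail flag plus any textual feedback (error text).
            passed, feedback = evaluate_solution(program, problem.test_cases)
            examples.append((problem.instruction, program, passed, feedback))
    # Feedback-driven fine-tuning on the collected interactions.
    fine_tune(model, examples)
    return model
```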
During fine-tuning, the model is prompted with a natural-language description of a problem, after which it generates a set of candidate solutions. A solution evaluator then checks these solutions against a set of test cases. The researchers use a Python interpreter as this solution evaluator, relying on the error messages and stack traces produced by the generated code as the source of textual feedback.
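As an illustration, here is a minimal sketch of how an interpreter-based evaluator could turn a failing test run into textual feedback. This is an assumption about the mechanics rather than the authors' implementation; the function name and signature are invented for the example.

```python
import traceback

def evaluate_solution(code: str, test_cases: list[str]) -> tuple[bool, str]:
    """Run LM-generated code against test cases and capture any error text.

    Returns (passed, textual_feedback): textual_feedback is the error message
    and stack trace produced by the Python interpreter, or "" if all tests pass.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)            # define the generated function(s)
        for test in test_cases:          # e.g. "assert add(2, 3) == 5"
            exec(test, namespace)        # run each assertion
        return True, ""
    except Exception:
        # The interpreter's error message and stack trace become the feedback.
        return False, traceback.format_exc()
```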
The training data used to fine-tune the model consists of three components: natural-language instructions, LM-generated programs, and textual feedback. When a generated program fails to solve the problem, the textual feedback is provided to the LLM; otherwise, a reward token is provided to the model as binary feedback to encourage it to generate accurate solutions. The textual feedback collected this way is used in the LM's fine-tuning process, known as feedback-driven fine-tuning.
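For instance, one plausible way to assemble such a training sequence is sketched below; the reward tokens and layout here are illustrative placeholders, not the paper's exact format or vocabulary.

```python
def build_training_example(instruction: str, program: str,
                           passed: bool, feedback: str) -> str:
    """Concatenate instruction, generated program, binary reward, and textual
    feedback into a single sequence for feedback-driven fine-tuning.

    The special tokens below are illustrative, not the paper's actual tokens.
    """
    reward_token = "<|success|>" if passed else "<|failure|>"
    parts = [instruction, program, reward_token]
    if not passed:
        # Error message and stack trace captured by the solution evaluator.
        parts.append(feedback)
    return "\n".join(parts)

# Example usage with the evaluator sketched above:
# passed, feedback = evaluate_solution(program, test_cases)
# sequence = build_training_example(instruction, program, passed, feedback)
```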
For evaluation, the researchers used MBPP (Mostly Basic Programming Problems), a dataset of code-generation tasks. The results show that LETI significantly improves the performance of two base LMs of different scales on the MBPP dataset without requiring ground-truth outputs for training. On the HumanEval dataset, which contains problems unseen during training, LETI achieves similar or better performance than the base LMs. In addition, the researchers found that, compared to binary feedback alone, textual feedback allows the model to reach the same performance with fewer gradient steps.
In conclusion, LETI is a fine-tuning approach that improves language models through the use of detailed textual feedback, allowing them to learn from their mistakes and improve performance on tasks like code generation. It looks like a promising direction.
Check out the Paper and GitHub link for more details.
Tanya Malhotra is a final-year student at the University of Petroleum & Energy Studies, Dehradun, pursuing a BTech in Computer Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a data science enthusiast with good analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.