Program synthesis, the automatic generation of computer programs from an input specification, is a crucial problem in software engineering. Efficient program synthesis can not only boost the productivity of software engineers but also lower the barrier to writing code. Pretrained large language models (LLMs) have recently shown significant progress in program synthesis, yet despite extensive pre-training they still fail to consistently generate correct code.
For example, raw code scraped from the Internet and used in code pre-training datasets often contains many security flaws. Researchers argue that contemporary LLM pre-training setups are substantially to blame for these shortcomings. Incorporating natural language feedback at test time has been shown to significantly increase the pass rates of code generation models.
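To make that test-time mechanism concrete, here is a minimal sketch of prompting a code model to repair a program given natural language feedback. The checkpoint name, prompt template, and helper function below are illustrative assumptions, not the paper's exact setup.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Salesforce/codegen-6B-mono"  # assumed CODEGEN-MONO 6.1B checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def refine_with_feedback(task: str, buggy_code: str, feedback: str) -> str:
    """Ask the model for a corrected program, given a failing attempt and feedback."""
    prompt = (
        f"# Task: {task}\n"
        f"# Incorrect solution:\n{buggy_code}\n"
        f"# Feedback: {feedback}\n"
        f"# Corrected solution:\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    # Strip the prompt tokens and return only the newly generated code
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```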
The researchers propose imitation learning from language feedback to train LLMs on natural language feedback. The algorithm extends the work of Scheurer et al., who studied learning from language feedback for text summarization: they improve a summarization model by fine-tuning the base model on improved summaries produced from the initial model's summaries and human-written feedback. The researchers' work advances Scheurer et al. in several ways, including:
- Formalizing the algorithm and making it universally applicable
- Demonstrating how the reward function can be adapted for code generation
- Presenting a proof of concept of ILF (Imitation Learning from Language Feedback) for code generation
ILF (Imitation Learning from Language Feedback) trains a separate model, πRefine, to use language feedback to fix incorrectly generated programs, with the goal of increasing the accuracy of programs produced by a baseline code generation model πθ. The researchers then fine-tune πθ on the πRefine-generated refinements that pass unit tests, yielding a final improved model πθ*. (Researchers refer to the fixed programs as refinements.) This process can be viewed as minimizing the expected KL divergence from the ground-truth target distribution, and it can be repeated iteratively to further improve the model.
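A rough sketch of one ILF round under these definitions is shown below. The helper callables (generate, refine, passes_unit_tests, fine_tune) are assumed to be supplied by the caller and are illustrative placeholders, not the authors' implementation.

```python
def ilf_round(pi_theta, pi_refine, tasks, human_feedback,
              generate, refine, passes_unit_tests, fine_tune):
    """One round of ILF for code generation (illustrative sketch).

    pi_theta:        base code-generation model
    pi_refine:       model that turns (task, buggy program, feedback) into a fix
    tasks:           programming tasks, each with unit tests
    human_feedback:  dict mapping task id -> natural language feedback
    generate, refine, passes_unit_tests, fine_tune: caller-supplied helpers
    """
    verified_refinements = []
    for task in tasks:
        program = generate(pi_theta, task)                 # initial attempt
        if passes_unit_tests(program, task):
            continue                                       # feedback only for failures
        feedback = human_feedback[task.id]
        refinement = refine(pi_refine, task, program, feedback)
        if passes_unit_tests(refinement, task):            # keep only verified fixes
            verified_refinements.append((task.prompt, refinement))
    # Fine-tuning on verified refinements approximately minimizes the expected
    # KL divergence from the ground-truth distribution over correct programs.
    return fine_tune(pi_theta, verified_refinements)
```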
Investigation and Findings
The researchers use the Mostly Basic Python Problems (MBPP) dataset to train and evaluate the models. MBPP contains 974 Python programming tasks designed for entry-level programmers.
Although MBPP has its own designated prompt/training/validation/test split, the researchers further divided it into the following splits:
• MBPPRefine: These tasks have IDs in the range 111-310 and are ones that CODEGEN-MONO 6.1B failed to solve correctly. This split is used to train πRefine.
• MBPPTrain: These tasks have IDs in the range 311-974 and are ones that CODEGEN-MONO 6.1B failed to solve correctly. This split is first used to evaluate the correctness of the refinements produced by πRefine; the correct refinements from it are then used to fine-tune the base model.
• MBPPTest: Researchers use these tasks, which have IDs between 11 and 110, to assess the final performance of πθ*. Unlike the other two splits, all tasks in this split are used, rather than only those for which CODEGEN-MONO 6.1B did not initially produce correct programs. This makes it easier to compare the performance of πθ and πθ* against their baselines. A code sketch of this re-split appears below.
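A minimal sketch of the re-split, assuming MBPP is loaded through the Hugging Face datasets library (a tooling assumption, not necessarily the authors' pipeline):

```python
from datasets import load_dataset

mbpp = load_dataset("mbpp")                      # 974 Python tasks in total
all_tasks = [ex for split in mbpp.values() for ex in split]

def in_id_range(task, lo, hi):
    return lo <= task["task_id"] <= hi

mbpp_test   = [t for t in all_tasks if in_id_range(t, 11, 110)]    # evaluate the final model
mbpp_refine = [t for t in all_tasks if in_id_range(t, 111, 310)]   # train pi_Refine
mbpp_train  = [t for t in all_tasks if in_id_range(t, 311, 974)]   # collect/verify refinements

# MBPPRefine and MBPPTrain are further restricted to tasks that CODEGEN-MONO 6.1B
# initially fails; MBPPTest keeps every task in its ID range.
```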
To implement the algorithm, the researchers independently fine-tune two separate instances of CODEGEN-MONO 6.1B to produce πRefine and the final model πθ*. πRefine is trained on pairs of incorrect programs and human-written feedback, with human-written refinements as targets.
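As an illustration of how such training examples could be serialized for causal-LM fine-tuning, here is a hypothetical packing function; the prompt template and loss-masking convention are assumptions, not the paper's exact format.

```python
def build_refine_example(tokenizer, task_text: str, buggy_code: str,
                         feedback: str, refinement: str) -> dict:
    """Pack one (incorrect program, feedback, human refinement) triple into a
    causal-LM fine-tuning example, with the loss applied only to the target."""
    prompt = (
        f"# Task: {task_text}\n"
        f"# Incorrect solution:\n{buggy_code}\n"
        f"# Feedback: {feedback}\n"
        f"# Refined solution:\n"
    )
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    target_ids = tokenizer(refinement + tokenizer.eos_token,
                           add_special_tokens=False)["input_ids"]
    return {
        "input_ids": prompt_ids + target_ids,
        # -100 tells the cross-entropy loss to ignore the prompt tokens
        "labels": [-100] * len(prompt_ids) + target_ids,
    }
```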
Although the ILF algorithm only requires collecting human-written feedback for tasks in MBPPTrain (assuming access to a πRefine that is already fine-tuned or can generate refinements via few-shot prompting), the researchers collect human-written feedback and refinements for all splits of the data to allow further analysis of the approach. This makes it possible, for example, to compare fine-tuning on the refinements generated by πRefine with fine-tuning on human-written refinements. Scaling ILF to other combinations of models and tasks requires additional feedback annotations; however, applying ILF on one dataset may still improve model performance on a different dataset for the same task. Scaling ILF across various tasks and models is left to future work.
Fine-tuning on a small sample of MBPP gold programs did not significantly improve accuracy over zero-shot inference. To test the hypothesis that MBPP's gold programs may be slightly out of distribution for CODEGEN-MONO 6.1B, the researchers computed the perplexity of the MBPP gold programs, the πRefine-generated refinements, and the human-written refinements under the pretrained CODEGEN-MONO 6.1B model. The MBPP dataset contains more high-perplexity programs (i.e., programs with perplexity of roughly 10² or higher) than either the πRefine-generated refinements or the human-written refinements, even though the distributions of the three data sources look broadly similar. Since the latter two datasets are closer to CODEGEN-MONO 6.1B's original distribution while still being functionally correct, they are probably easier for CODEGEN-MONO 6.1B to learn from.
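For reference, perplexity under a pretrained causal LM can be computed along these lines; the checkpoint name and the exp-of-mean-NLL definition are standard conventions assumed here, not details taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Salesforce/codegen-6B-mono")
lm = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6B-mono", device_map="auto")

@torch.no_grad()
def perplexity(program: str) -> float:
    """Perplexity = exp(mean negative log-likelihood) of the program's tokens."""
    enc = tok(program, return_tensors="pt").to(lm.device)
    # With labels equal to input_ids, the model returns the mean token-level NLL
    out = lm(**enc, labels=enc["input_ids"])
    return float(torch.exp(out.loss))
```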
ILF is especially useful when large amounts of gold code are not available. In this setting, ILF is a way of producing training data that explicitly corrects flaws in the original model while, at the same time, staying closer to the model's own outputs in representation space. So even though both training datasets contain the same number of functionally correct programs, fine-tuning the model on the πRefine-generated refinements does not require shifting the weights as much as fine-tuning on the MBPP gold programs would.
To sum up
Learning from human-written natural language feedback is more efficient in terms of training samples and more effective on coding tasks. An exciting recent finding is the ability of pretrained large language models (LLMs) to use natural language feedback at inference time. The researchers extend this finding by formalizing an algorithm, which they call Imitation Learning from Language Feedback (ILF), for learning from natural language feedback at training time. ILF is easy to use and sample-efficient, as it only needs a limited amount of human-written feedback during training and none at test time. The researchers also provide a proof of concept on a neural program synthesis task, demonstrating that ILF can be viewed as a way to minimize the KL divergence from the ground-truth distribution. Using ILF, they increase the pass rate of a CODEGEN-MONO 6.1B model by 38% relative (10% absolute) on the Mostly Basic Python Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP gold programs and fine-tuning on human-written repaired programs. The researchers' findings indicate that training on demonstrations alone is inefficient for improving an LLM's performance on code generation tasks, and that learning from human-written natural language feedback is both more sample-efficient and more effective.
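As a reminder of what the pass-rate metric measures, here is a simplified sketch of evaluating generated programs against MBPP-style assert tests. Executing untrusted model output this way is unsafe outside a sandbox, and the function names are hypothetical.

```python
def passes(program: str, test_asserts: list[str]) -> bool:
    """Run a generated program plus its assert-style tests; True if nothing fails."""
    env: dict = {}
    try:
        exec(program, env)          # define the candidate function(s)
        for test in test_asserts:
            exec(test, env)         # each MBPP test is an `assert ...` statement
        return True
    except Exception:
        return False

def pass_rate(generations: dict[int, str], tests_by_task: dict[int, list[str]]) -> float:
    """Fraction of tasks whose generated program passes all of its tests."""
    passed = sum(passes(generations[tid], tests) for tid, tests in tests_by_task.items())
    return passed / len(tests_by_task)
```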
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a Computer Engineer with solid experience in FinTech companies covering the Finance, Cards & Payments, and Banking domains, and a strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today's changing world, making everyone's life easier.