Even the most advanced large language models (LLMs), such as GPT-4 and PaLM 2, struggle to solve mathematical problems, because these problems require imagination, mathematical reasoning, and computation. However, LLMs are considerably more likely to reach a correct answer when they are allowed to attempt a problem many times, which shows they already have the potential to do better on this math problem-solving challenge. For example, the pretrained PaLM 2-L achieves about 33.4% accuracy with greedy decoding, yet at least one correct answer appears 79.4% of the time (pass@64) when 64 solutions are drawn with temperature sampling (Table 1).
Table 1: Results of supervised solution fine-tuning, comparing two different sources of fine-tuning data: the MATH dataset and the PRM800K dataset.
This significant performance gap shows that LLMs can generate correct answers but have difficulty distinguishing correct solutions from incorrect ones. To reduce this gap, the researchers investigate task-specific fine-tuning techniques that could improve the LLM's ability to both generate and evaluate solutions.
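To make the gap between greedy-decoding accuracy and pass@64 concrete, here is a minimal Python sketch of the standard unbiased pass@k estimator applied to hypothetical per-problem sample counts. The numbers and the helper name `pass_at_k` are illustrative assumptions, not data or code from the paper.

```python
# Minimal sketch of how the greedy-accuracy vs. pass@k gap can be measured.
# The per-problem counts below are toy values; in the paper, 64 solutions per
# problem are sampled from PaLM 2-L with temperature sampling on the MATH set.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    (drawn without replacement from n) is correct, given c correct samples."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical results: (num_samples, num_correct_samples) per problem.
per_problem = [(64, 0), (64, 3), (64, 20), (64, 64), (64, 1)]

k = 64
score = sum(pass_at_k(n, c, k) for n, c in per_problem) / len(per_problem)
print(f"pass@{k} = {score:.3f}")
```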
They examine three fine-tuning techniques:
(1) Supervised step-by-step solution fine-tuning (SSFT). As a starting point, they study whether pretrained LLMs benefit from a supervised fine-tuning step, fine-tuning the LLMs to generate the complete step-by-step solution and the final answer (sketched below).
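As an illustration of what SSFT training data might look like, here is a minimal sketch that serializes a problem and its step-by-step solution into an (input, target) pair. The prompt/target template and the helper `make_ssft_example` are assumptions for illustration, not the paper's exact format.

```python
# Minimal sketch of formatting step-by-step solutions into (input, target)
# pairs for supervised fine-tuning (SSFT). The template is an assumption.

def make_ssft_example(problem: str, solution_steps: list[str], final_answer: str) -> dict:
    """Serialize one training example: the model is fine-tuned to emit the
    full step-by-step solution followed by the final answer."""
    target = "\n".join(solution_steps) + f"\nFinal answer: {final_answer}"
    return {"input": f"Problem: {problem}\nSolution:", "target": target}

example = make_ssft_example(
    problem="What is 2 + 3 * 4?",
    solution_steps=["Multiplication first: 3 * 4 = 12.", "Then add: 2 + 12 = 14."],
    final_answer="14",
)
print(example["input"])
print(example["target"])
```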
(2) Solution-cluster re-ranking (SCR). They further fine-tune the generator as a solution evaluator for re-ranking candidate solutions, to improve the LLM's ability to evaluate solutions. While previous research has explored sample ranking or re-ranking of solutions, they offer a novel method that combines the advantages of majority voting with re-ranking while reducing ranking costs. More precisely, as in the first step of majority voting, they first group the candidate solutions into clusters according to the mathematical equivalence of their final answers. Then, to further improve on the majority-vote result, they apply the solution evaluator only to the solutions in the most frequent clusters (see the sketch below).
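The following minimal sketch illustrates the SCR pipeline just described, under simplifying assumptions: answer equivalence is reduced to normalized string comparison, and `evaluator_score` is a stand-in for the fine-tuned LLM evaluator rather than the paper's actual model.

```python
# Minimal sketch of solution-cluster re-ranking (SCR).
from collections import defaultdict

def normalize(answer: str) -> str:
    # Placeholder for real mathematical-equivalence checking (e.g. via SymPy).
    return answer.strip().lower()

def evaluator_score(problem: str, solution: str) -> float:
    # Placeholder: in the paper this is the fine-tuned solution evaluator.
    return float(len(solution))  # dummy heuristic for the sketch

def solution_cluster_rerank(problem, candidates, top_k_clusters=2):
    """candidates: list of (solution_text, final_answer) pairs from the generator."""
    # 1) Cluster candidates by equivalent final answer (the majority-voting step).
    clusters = defaultdict(list)
    for solution, answer in candidates:
        clusters[normalize(answer)].append(solution)
    # 2) Keep only the most frequent clusters to cut re-ranking cost.
    frequent = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_k_clusters]
    # 3) Re-rank solutions within those clusters with the evaluator.
    best_answer, best_score = None, float("-inf")
    for answer, solutions in frequent:
        score = max(evaluator_score(problem, s) for s in solutions)
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

candidates = [("2 + 2 = 4, so the answer is 4.", "4"), ("The answer is 4.", "4"), ("Answer: 5", "5")]
print(solution_cluster_rerank("What is 2 + 2?", candidates))
```

Restricting re-ranking to the most frequent clusters keeps the number of evaluator calls small while still letting the evaluator break ties among the top-voted answers.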
(3) Multi-task sequential fine-tuning. Beyond the solution evaluation task, they are also interested in improving the LLM's performance on the solution generation task, and in whether the training objective of the solution evaluation task can help the model generate solutions.
To achieve this, they design a multi-task sequential fine-tuning setting in which the solution evaluation task is framed as a natural language generation problem, so that its training objective can offer a valuable supervisory signal to the solution generation model. In more detail, they fine-tune the model in three stages: (1) as a generator (SSFT), (2) as a solution evaluator (SCR), and (3) again as a generator (SSFT), as sketched below.
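To make the three-stage recipe concrete, here is a minimal sketch in which solution evaluation is framed as text generation so that the same supervised objective can be reused at every stage. The `fine_tune` helper, the templates, and the checkpoint name are placeholders, not the paper's implementation.

```python
# Minimal sketch of the three-stage multi-task sequential fine-tuning recipe.

def as_generation_example(problem: str, solution: str) -> dict:
    """Stages 1 and 3: standard SSFT (problem -> step-by-step solution)."""
    return {"input": f"Problem: {problem}\nSolution:", "target": solution}

def as_evaluation_example(problem: str, candidate: str, is_correct: bool) -> dict:
    """Stage 2: solution evaluation framed as natural language generation,
    so the same seq2seq objective supplies the learning signal."""
    verdict = "correct" if is_correct else "incorrect"
    return {
        "input": f"Problem: {problem}\nCandidate solution: {candidate}\nIs this solution correct?",
        "target": verdict,
    }

def fine_tune(model, dataset):
    # Placeholder: run supervised fine-tuning of `model` on (input, target) pairs.
    print(f"fine-tuning on {len(dataset)} examples")
    return model

# Three sequential stages: generator -> evaluator -> generator.
model = "palm-2-checkpoint"  # hypothetical model identifier
model = fine_tune(model, [as_generation_example("1+1?", "1 + 1 = 2. Final answer: 2")])
model = fine_tune(model, [as_evaluation_example("1+1?", "1 + 1 = 3. Final answer: 3", False)])
model = fine_tune(model, [as_generation_example("1+1?", "1 + 1 = 2. Final answer: 2")])
```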
They conduct extensive experiments with PaLM 2-S* and PaLM 2-L, the small and large variants of PaLM 2, on the challenging MATH dataset, and reach the following conclusions:
• The quality and style of the step-by-step solutions can significantly influence the fine-tuned model, since SSFT benefits most from detailed, well-formatted solutions.
• Re-ranking only the most frequent clusters of solutions can yield better performance than re-ranking all solutions, while also improving computational efficiency, which is why they believe it should become a standard practice for future work.
• They demonstrate the benefit of training the model for both the solution generation and solution evaluation tasks, and present a successful attempt to leverage the learning signal from a binary evaluation task for a generation model. The proposed multi-task sequential fine-tuning improves the solution generation model more effectively than supervised solution fine-tuning alone.
Check out the paper. All credit for this research goes to the researchers of this project.
Aneesh Tickoo is a consulting intern at MarktechPost. She is currently pursuing her bachelor's degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT) Bhilai. She spends most of her time working on projects aimed at harnessing the power of machine learning. Her research interest is image processing, and she is passionate about building solutions around it. She loves connecting with people and collaborating on interesting projects.