ByteDance AI Research presents reinforced fine-tuning (ReFT), a method to improve how well LLMs generalize when learning to reason, using mathematical problem solving as an example
An effective method to improve LLMs' reasoning skills is to employ supervised fine-tuning (SFT) with chain-of-thought (CoT) annotations. However, this ...