LLMs have demonstrated impressive capabilities in handling complex question-answering tasks, supported by advances in model architectures and training methods. Techniques such as chain-of-thought (CoT) prompting have gained popularity for improving the explainability and accuracy of answers by guiding the model through intermediate reasoning steps. However, CoT prompting can produce longer outputs, increasing the time needed for answer generation due to the token-by-token decoding process of autoregressive transformers. This creates challenges in maintaining interactive conversations, highlighting the need for metrics that assess the conciseness of outputs and for strategies that reduce overly long reasoning chains.
Researchers from the Department of Excellence in Robotics and AI at Scuola Superiore Sant'Anna and Mediavoice Srl analyzed how output length affects LLM inference time. They proposed new metrics to assess conciseness and correctness, and introduced a refined prompt engineering strategy, Constrained-Chain-of-Thought (CCoT), which limits output length to improve both accuracy and response time. Experiments with Llama2-70b on the GSM8K dataset showed that restricting reasoning to 100 words improved accuracy and reduced output length. The study emphasizes the need for brevity in LLM reasoning and highlights the varying effectiveness of CCoT across model sizes.
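To make the idea concrete, here is a minimal sketch of how a CCoT-style prompt differs from a plain CoT prompt: the only change is an explicit word limit appended to the reasoning instruction. The exact wording and the `ccot_prompt` helper are illustrative assumptions, not the paper's verbatim prompt.

```python
def cot_prompt(question: str) -> str:
    """Plain chain-of-thought prompt: ask for step-by-step reasoning."""
    return f"Q: {question}\nA: Let's think step by step."


def ccot_prompt(question: str, word_limit: int = 100) -> str:
    """Constrained-CoT sketch: same reasoning request, but with an
    explicit cap on the length of the answer (100 words in the paper's
    best-performing setting for Llama2-70b)."""
    return (
        f"Q: {question}\n"
        f"A: Let's think step by step and limit the answer to {word_limit} words."
    )
```

In use, the constrained variant simply replaces the plain one when querying the model; no fine-tuning or decoding changes are required.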
Recent research on LLMs has focused on improving accuracy, which often leads to longer and more detailed responses. These extended outputs can introduce hallucinations, where the model generates plausible but incorrect information, as well as overly long explanations that obscure key information. Several prompt engineering techniques have been developed to address this, including CoT prompting, which improves reasoning but increases response time. The study introduces metrics to assess both conciseness and correctness and proposes a refined CoT approach, CCoT, to control output length while maintaining quality.
The output generation time of LLMs is influenced by factors such as model architecture, preprocessing, decoding, and the prompt used. Longer outputs generally increase response time due to the iterative nature of autoregressive models. Tests on several models (Falcon-7b/40b, Llama2-7b/70b) showed that as output length increases, so does generation time. The CoT prompt, which improves response accuracy, also lengthens outputs and generation times. To address this, the CCoT approach limits output length while maintaining accuracy, thereby reducing generation time.
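The roughly linear relationship between output length and latency can be sketched with a toy cost model: an autoregressive decoder emits one token per forward pass, so total latency is approximately a fixed prefill cost plus a per-token cost times the number of generated tokens. The constants below are made-up placeholders, not measurements from the paper.

```python
def estimated_generation_time(n_output_tokens: int,
                              prefill_s: float = 0.5,
                              per_token_s: float = 0.05) -> float:
    """Toy latency model for autoregressive decoding.

    Each output token requires one forward pass, so latency grows
    roughly linearly with output length. The default constants are
    illustrative assumptions only.
    """
    return prefill_s + n_output_tokens * per_token_s
```

Under this model, halving a 200-token CoT answer saves about as much time as the entire remaining generation, which is why constraining reasoning length translates directly into faster responses.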
The experiments evaluate the effectiveness of the CCoT approach compared to classical CoT, focusing on efficiency, accuracy, and the ability to control the output length. Using the GSM8K dataset, several LLMs (e.g., Llama2-70b, Falcon-40b) were tested. The results show that CCoT reduces the generation time and can improve or maintain accuracy. The study also introduces new metrics (HCA, SCA, CCA) to evaluate model performance, considering correctness and conciseness. Larger models such as Llama2-70b benefit more from CCoT, while smaller models struggle. CCoT demonstrates improved efficiency and concise accuracy, especially for larger LLMs.
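As one possible reading of the conciseness-aware metrics, a hard-constrained accuracy can be sketched as the fraction of answers that are both correct and within the word limit. This is a hypothetical reconstruction for illustration; the paper's exact definitions of HCA, SCA, and CCA may differ.

```python
def hard_constrained_accuracy(results, word_limit: int = 100) -> float:
    """Hedged sketch of a hard-constrained accuracy metric.

    `results` is a list of (is_correct, n_words) pairs, one per test
    question. An answer counts only if it is correct AND its reasoning
    stays within `word_limit` words (our assumed reading of HCA).
    """
    hits = sum(1 for is_correct, n_words in results
               if is_correct and n_words <= word_limit)
    return hits / len(results)
```

A soft-constrained variant would presumably apply a graded penalty for exceeding the limit rather than discarding the answer outright.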
The study highlights the importance of conciseness in text generation by LLMs and presents CCoT as a prompt engineering technique to control output length. Experiments show that larger models such as Llama2-70b and Falcon-40b benefit from CCoT, while smaller models struggle to meet length constraints. The study also proposes new metrics to assess the trade-off between conciseness and correctness. Future research will explore integrating these metrics into model fine-tuning and examine how conciseness affects phenomena such as hallucinations or incorrect reasoning in LLMs.
Review the Paper. All credit for this research goes to the researchers of this project.
Sana Hassan, a Consulting Intern at Marktechpost and a dual degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, she brings a fresh perspective to the intersection of AI and real-life solutions.