Large language models (LLMs) have had a significant impact on software engineering, primarily in code generation and bug fixing. These models draw on large amounts of training data to understand and complete code based on user input. However, their application in requirements engineering, a crucial aspect of software development, remains underexplored. Software engineers have been reluctant to use LLMs for high-level design tasks because of concerns about whether the models can understand complex requirements. Despite this, the use of LLMs in requirements engineering has gradually increased, driven by advances in contextual analysis and reasoning through prompt engineering and chain-of-thought techniques.
The field of LLM-based agents lacks standardized benchmarks, which impedes effective performance evaluation. Previous research did not adequately differentiate between the contributions of LLMs and those of LLM-based agents. This study addresses these shortcomings by providing a comparative analysis of their applications and effectiveness in various software engineering domains, with the goal of elucidating the potential of LLM-based agents in software engineering practices.
LLMs have demonstrated remarkable success in software engineering tasks such as code generation and vulnerability detection, but they exhibit limitations regarding autonomy and self-improvement. LLM-based agents address these limitations by combining LLMs with decision-making and action-taking capabilities. This study emphasizes the need to distinguish between LLMs and LLM-based agents, investigating their applications in requirements engineering, code generation, autonomous decision making, software design, test generation, and software maintenance. It provides a comprehensive analysis of tasks, benchmarks, and evaluation metrics for both technologies, aiming to elucidate their potential to advance software engineering practices and potentially progress towards Artificial General Intelligence.
The study addresses crucial challenges in applying large language models (LLMs) to software engineering tasks, focusing on their inherent limitations in autonomy and self-improvement. It identifies a significant gap in the existing literature regarding clear distinctions between LLMs and LLM-based agents. The research highlights the absence of unified standards and benchmarks for evaluating LLM solutions as agents. It provides a comprehensive analysis of LLMs and LLM-based agents across six key software engineering topics: requirements engineering, code generation, autonomous decision making, software design, test generation, and software maintenance. The paper aims to highlight gaps and propose future directions for LLM-based agents in software engineering, pushing the boundaries of these technologies.
The study used a systematic literature review to examine LLMs and LLM-based agents in software engineering. Studies were searched in the DBLP and arXiv databases, covering late 2023 to May 2024. Articles were filtered by relevance and length using software engineering-specific keywords, and a snowballing technique was applied to improve comprehensiveness. The final selection included 117 relevant articles, some categorized into multiple topics. The analysis focused on experimental models and frameworks, examining performance across multiple domains, and visual representations illustrated the frequency of model usage. This structured approach supported a robust analysis of LLMs and LLM-based agents in software engineering, highlighting applications and challenges.
The study examined LLMs and LLM-based agents in six key areas of software engineering: requirements engineering, code generation, autonomous decision making, software design and evaluation, test generation, and maintenance. Performance metrics included Pass@k, BLEU scores, and success rates. HumanEval and MBPP served as the primary benchmark datasets for the code generation tasks, while many studies used custom datasets. The research identified 79 unique LLMs across 117 papers.
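Since Pass@k appears among the reported metrics, a small worked example may help readers interpret it. The sketch below uses the unbiased estimator that was introduced alongside the HumanEval benchmark: for a problem with n generated samples of which c pass the unit tests, pass@k = 1 - C(n-c, k) / C(n, k). The article does not say how individual studies computed the metric, so treating this estimator as the one in use is an assumption, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated for the problem
    c: number of those samples that passed all tests
    k: number of samples the user is assumed to draw
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k subset
    # must contain at least one passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, given (n, c) pairs per problem."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For instance, `pass_at_k(10, 3, 1)` evaluates to roughly 0.3, matching the intuition that pass@1 is the fraction of samples that pass. Averaging per-problem estimates, as `mean_pass_at_k` does, is the usual way a single benchmark score is reported.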
The results indicated a growing interest in LLM-based agents, which combine LLMs with decision-making capabilities to enhance autonomy and self-improvement in software development. User feedback from both developers and requirements engineers was crucial in assessing the accuracy, usability, and completeness of the generated results. The findings highlight significant advances in AI for software engineering, while identifying areas for further research and development.
In conclusion, the study provides a comprehensive analysis of large language models and LLM-based agents in software engineering. It categorizes software engineering into six key themes, offering insights into various applications of LLMs. The research draws a clear distinction between traditional LLMs and LLM-based agents, emphasizing their different capabilities and performance metrics. LLM-based agents demonstrate potential improvements to existing processes in several software engineering domains. Statistical analysis of datasets and evaluation metrics highlights performance differences between LLMs and LLM-based agents. The paper suggests future research directions, emphasizing the need for unified standards and benchmarking. It concludes that LLM-based agents represent a promising evolution that addresses the limitations of traditional models, potentially leading to more autonomous and effective software engineering solutions.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Shoaib Nazir is a Consulting Intern at MarktechPost and has completed his dual M.Tech degree from the Indian Institute of Technology (IIT), Kharagpur. Passionate about data science, he is particularly interested in the applications of artificial intelligence across various domains. Shoaib is driven by the desire to explore the latest technological advancements and their practical implications in everyday life. His enthusiasm for innovation and solving real-world problems fuels his continuous learning and contribution to the field of AI.