Large language models (LLMs) have transformed artificial intelligence, particularly in the development of agent-based systems. These systems must interact with various environments and execute actions to achieve specific goals. Improving the planning capabilities of LLM-based agents has therefore become a critical research area, given the complexity of these tasks and the need to complete them accurately across numerous applications.
A major challenge in this field of research is the intensive manual labor required to create diverse and extensive planning environments and tasks. Current methodologies predominantly rely on manually designed scenarios, which limits the diversity and amount of available training data. This limitation hampers the potential of LLMs to generalize and perform well in a wide range of situations. To address this issue, researchers have introduced automated techniques to generate a broad spectrum of planning environments and tasks, thereby enriching the training data sets for LLM-based agents.
The research team from the University of Hong Kong and Microsoft Corporation has proposed a new framework called AGENTGEN. AGENTGEN uses LLMs to automate the generation of environments and their corresponding planning tasks. This approach involves two main stages: environment generation and task generation. Initially, the framework uses an inspiration corpus comprising diverse text segments to create detailed and varied environment specifications. AGENTGEN then generates related planning tasks ranging from simple to complex, ensuring a smooth progression of difficulty and facilitating effective learning for LLMs.
AGENTGEN distinguishes itself by employing a sophisticated environment generation process. The researchers designed an inspiration corpus to serve as a context for synthesizing the environment specifications, which include a complete overview of the environment, descriptions of state and action spaces, and definitions of transition functions. For example, a sample text segment could drive the creation of an environment in which the agent is a nutritionist tasked with developing a new recipe book that includes powdered peanut butter. This method ensures a high level of diversity in the generated environments, creating numerous unique and challenging scenarios for agent training.
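To make the environment generation stage concrete, the following is a minimal sketch of what such a specification might look like. The class name, fields, and the hard-coded nutritionist example stand in for the LLM call that AGENTGEN would make; they are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass

# Hypothetical structure of an environment specification as described above:
# an overview plus state space, action space, and transition rules.
@dataclass
class EnvironmentSpec:
    overview: str
    states: list
    actions: list
    transitions: dict  # maps (state, action) -> next state

def build_spec_from_inspiration(snippet: str) -> EnvironmentSpec:
    """Stand-in for the LLM call that turns an inspiration-corpus snippet
    into an environment spec; here we hard-code the article's nutritionist
    example instead of querying a model."""
    return EnvironmentSpec(
        overview=f"Derived from inspiration text: {snippet!r}",
        states=["no_recipe", "draft_recipe", "finished_recipe"],
        actions=["add_ingredient", "finalize"],
        transitions={
            ("no_recipe", "add_ingredient"): "draft_recipe",
            ("draft_recipe", "finalize"): "finished_recipe",
        },
    )

spec = build_spec_from_inspiration("powdered peanut butter recipe book")

# Walk the transition function to verify the spec is internally consistent.
state = "no_recipe"
for action in ["add_ingredient", "finalize"]:
    state = spec.transitions[(state, action)]
print(state)  # finished_recipe
```

The key point the sketch illustrates is that each generated environment is a self-contained specification (overview, state space, action space, transition function) that an agent can be trained against without any manual scenario design.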
The task generation process within AGENTGEN further improves the training data by applying a bi-directional evolution method known as BI-EVOL. This method evolves tasks in two directions: simplifying goal conditions to create easier tasks and increasing complexity to develop more challenging ones. This bi-directional approach yields a comprehensive set of planning tasks that supports a gradual and effective learning curve for LLMs. By implementing BI-EVOL, the research team generated 592 unique environments, each containing 20 tasks, resulting in 7,246 high-quality trajectories for training.
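The bi-directional idea can be sketched as follows, under the simplifying assumption that a planning task is just a set of goal conditions: the "easy" direction drops a condition, the "hard" direction adds one. The helper names and candidate goals are illustrative, not taken from the paper.

```python
def evolve_easier(goals: frozenset) -> frozenset:
    """Drop one goal condition to create a simpler task."""
    if len(goals) <= 1:
        return goals
    dropped = sorted(goals)[0]  # deterministic choice for this sketch
    return goals - {dropped}

def evolve_harder(goals: frozenset, candidates: list) -> frozenset:
    """Add one new goal condition to create a harder task."""
    for extra in candidates:
        if extra not in goals:
            return goals | {extra}
    return goals  # no new condition available

# Seed task from a generated environment (illustrative goal names).
seed_task = frozenset({"include_peanut_butter", "serve_four"})

easier = evolve_easier(seed_task)
harder = evolve_harder(seed_task, ["low_sugar", "gluten_free"])

print(len(easier), len(harder))  # 1 3
```

Evolving every seed task in both directions in this way is what produces the smooth difficulty gradient the article describes, from near-trivial tasks up to ones with many simultaneous goal conditions.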

The effectiveness of AGENTGEN was rigorously evaluated using the AgentBoard platform. The results were impressive, demonstrating significant improvements in the planning capabilities of LLM-based agents. The AGENTGEN-optimized Llama-3 8B model outperformed GPT-3.5 in overall performance, and on certain tasks, even outperformed GPT-4. Specifically, AGENTGEN achieved a five-fold improvement compared to the raw Llama-3 8B on in-domain tasks, with success rates increasing from 1.67 to 11.67. Furthermore, AGENTGEN showed substantial performance improvement on out-of-domain tasks, achieving a success rate of 29.1 on Alfworld, compared to 17.2 for GPT-3.5.

AGENTGEN demonstrated robust generalization capabilities across multiple models and tasks. The success of the framework was evident in its ability to improve the planning performance of multiple LLMs, including the smaller 7-8B models. For example, Llama-3 8B, after training with AGENTGEN, showed a success rate increase of 10.0 and a progress rate increase of 9.95. These results underscore the effectiveness of AGENTGEN in improving the capabilities of LLM-based agents, regardless of the specific model used.

In conclusion, AGENTGEN, by automating the generation of diverse planning environments and tasks, addresses the limitations of manual design and offers a scalable and efficient approach to improving agent performance. The framework’s ability to generate high-quality trajectory data and its demonstrated success on in-domain and out-of-domain tasks highlight its potential to revolutionize the training and application of LLM-based agents. AGENTGEN’s contributions to agent training methodologies are poised to enhance the development of intelligent systems capable of performing complex planning tasks with greater accuracy and efficiency.
Review the Paper. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary engineer and entrepreneur, Asif is committed to harnessing the potential of AI for social good. His most recent initiative is the launch of an AI media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is technically sound and easily understandable to a wide audience. The platform has over 2 million monthly views, illustrating its popularity among readers.