Web browsing agents are autonomous systems designed to perform tasks such as searching, purchasing, and retrieving information from the Internet. These agents use advanced language models to interpret instructions and navigate digital environments, making decisions to execute tasks that normally require human intervention. Despite significant advances in this area, agents still struggle with complex, long-horizon tasks that involve a sequence of interdependent actions. These tasks demand a level of adaptability and learning that current systems have not yet achieved.
One of the main challenges in developing these agents is their inability to learn from previous tasks. While they can perform well on examples they have been specifically trained on, they are often inefficient when faced with unfamiliar tasks. Agents operate in isolation, solving each task individually without reusing past experiences to inform future decisions. This limitation reduces their efficiency and adaptability, particularly in environments that require them to handle multiple tasks across multiple domains.
Traditionally, methods for addressing these problems have relied on fixed training examples or in-context learning. These methods enable agents to perform well on predefined sequences of actions, but they fall short when agents face new situations or tasks that differ from their training data. For example, agents trained on specific shopping tasks may fail when asked to navigate a new website or complete a different task, such as booking a flight or retrieving information from social media. The rigidity of these approaches limits how well agents generalize across diverse tasks and environments.
A research team from Carnegie Mellon University and the Massachusetts Institute of Technology (MIT) has introduced a new method called Agent Workflow Memory (AWM) to address these challenges. AWM helps agents learn reusable task workflows from their past experiences, which they can apply to future tasks. This method allows agents to generate and store workflows (common sequences of actions) from previously solved tasks, making it possible to reuse them in different contexts. AWM can be applied in both offline and online settings: workflows are either induced in advance from training examples or induced in real time from test queries, offering a versatile solution for web browsing tasks.
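The difference between the two settings can be illustrated with a toy sketch. This is not the authors' implementation: the names `run_offline`, `run_online`, and `toy_solve` are hypothetical, a workflow is simplified to a plain list of action strings, and the success flag returned by the stand-in solver is an assumption about how a successful attempt would be recognized.

```python
from typing import Callable

# Hypothetical simplification: a trajectory (and a workflow) is a list of action strings.
Trajectory = list[str]
# A solver takes a query plus the current memory and returns (trajectory, succeeded).
Solver = Callable[[str, list[Trajectory]], tuple[Trajectory, bool]]


def run_offline(train: list[tuple[str, Trajectory]], tests: list[str], solve: Solver):
    """Offline: build memory once from training examples, then keep it fixed."""
    memory = [traj for _, traj in train]
    results = [solve(query, memory) for query in tests]
    return memory, results


def run_online(tests: list[str], solve: Solver):
    """Online: start with empty memory and grow it as test queries arrive."""
    memory: list[Trajectory] = []
    results = []
    for query in tests:
        traj, ok = solve(query, memory)
        results.append((traj, ok))
        if ok:  # only successful attempts are kept for later queries
            memory.append(traj)
    return memory, results


def toy_solve(query: str, memory: list[Trajectory]) -> tuple[Trajectory, bool]:
    """Stand-in agent (assumption): succeeds unless the query contains 'fail'."""
    return [f"do({query})"], "fail" not in query


online_memory, online_results = run_online(["a", "fail b", "c"], toy_solve)
offline_memory, offline_results = run_offline([("x", ["do(x)"])], ["a"], toy_solve)
```

In the online run, the failed second query leaves no trace in memory, so later queries only ever see workflows from attempts that succeeded.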
In detail, AWM works by analyzing the agent’s past experiences and extracting workflows from successfully completed tasks. These workflows consist of goal-oriented routines that are stored in the agent’s memory for future use. For example, an agent can learn a basic workflow to look up a place by name on a map. It can then build on this and learn more complex workflows, such as retrieving the location’s zip code. This memory-based approach allows the agent to adapt to increasingly complex tasks by leveraging previously learned workflows to inform future actions.
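The map example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Workflow` and `WorkflowMemory` classes, the action strings, and the keyword-overlap retrieval are all hypothetical, and "induction" here simply stores the action sequence of a successful task rather than reproducing how AWM itself extracts workflows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A reusable, goal-oriented routine (hypothetical structure)."""
    goal: str
    steps: list[str]


@dataclass
class WorkflowMemory:
    """Holds workflows extracted from successfully completed tasks."""
    workflows: list[Workflow] = field(default_factory=list)

    def induce(self, goal: str, actions: list[str], success: bool) -> bool:
        """Store the action sequence only if the task succeeded."""
        if not success:
            return False
        self.workflows.append(Workflow(goal, list(actions)))
        return True

    def retrieve(self, query: str) -> list[Workflow]:
        """Return workflows whose goal shares at least one word with the query."""
        words = set(query.lower().split())
        return [w for w in self.workflows if words & set(w.goal.lower().split())]


memory = WorkflowMemory()

# Basic workflow: look up a place by name on a map (action strings are made up).
memory.induce(
    "find place by name",
    ["click(search_box)", "type(search_box, {place_name})", "press(Enter)"],
    success=True,
)
# A failed attempt is discarded rather than stored.
memory.induce("find place by name", ["click(wrong_button)"], success=False)

# More complex workflow built on top of the basic one: get the place's zip code.
base = memory.retrieve("find place")[0]
memory.induce(
    "get zip code of place",
    base.steps + ["click(first_result)", "read(address_panel)"],
    success=True,
)
```

The second workflow reuses the three steps of the first and appends two more, which mirrors the idea of building complex routines from simpler ones already in memory.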
In terms of performance, AWM was tested on two major benchmarks, Mind2Web and WebArena, which together comprise over 1,000 tasks spanning over 200 domains, including travel, shopping, and social media. AWM significantly improved on the baseline. Relative to the baseline, the success rate rose by 24.6% on Mind2Web and by 51.1% on WebArena. In addition, AWM reduced the number of steps required to complete tasks in WebArena, and it achieved an improvement of up to 22.5 points over traditional methods after processing only dozens of examples. These results demonstrate AWM's ability to improve the efficiency and adaptability of agents in various digital tasks.
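Because the figures above mix relative improvements (percent) with absolute ones (points), a short sketch of the arithmetic may help. The function names are hypothetical and the input values are made up for illustration; they are not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Percentage change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline * 100.0


def absolute_gain(baseline: float, new: float) -> float:
    """Difference in percentage points between two success rates."""
    return new - baseline


# Hypothetical success rates (in percent), chosen only to show the difference.
baseline_rate = 20.0
improved_rate = 30.0

rel = relative_improvement(baseline_rate, improved_rate)  # 50.0 (% relative)
pts = absolute_gain(baseline_rate, improved_rate)         # 10.0 (absolute points)
```

The same change reads as a 50% relative improvement or a 10-point absolute gain, so the two kinds of figures should not be compared directly.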
The researchers also found that AWM improved generalization across tasks, websites, and domains. In cross-task, cross-website, and cross-domain evaluations, AWM outperformed baseline methods by 8.9 to 14.0 absolute percentage points. This generalization ability is particularly notable, as it demonstrates that AWM can adapt to tasks that differ significantly from those on which the agent was originally trained. For example, an agent trained on tasks involving shopping websites could generalize to other domains, such as social media or travel, without requiring additional domain-specific training data.
In conclusion, Agent Workflow Memory offers a promising solution to the limitations of existing web browsing agents. By allowing agents to learn and reuse workflows from previous experiences, AWM improves task efficiency and adaptability, making these systems more versatile in handling complex, long-horizon tasks. Test results on Mind2Web and WebArena show the method's potential to let agents handle a wider range of tasks with improved performance and fewer steps. This approach marks a significant advance in the development of more intelligent and flexible digital agents capable of generalizing across diverse tasks and domains.
Take a look at the Paper. All credit for this research goes to the researchers of this project.
Nikhil is a Consultant Intern at Marktechpost. He is pursuing an integrated dual degree in Materials from the Indian Institute of Technology, Kharagpur. Nikhil is an AI and Machine Learning enthusiast who is always researching applications in fields like Biomaterials and Biomedical Science. With a strong background in Materials Science, he is exploring new advancements and creating opportunities to contribute.