InfiGUIAgent: a novel multimodal generalist GUI agent with native reasoning and reflection
The development of graphical user interface (GUI) agents faces two key challenges that hinder their effectiveness. First, existing agents lack ...
GUI agents face three critical challenges in professional environments: (1) the increased complexity of professional applications compared to general-purpose software, ...
The development of large language models (LLMs) has significantly advanced artificial intelligence (AI) in several fields. Among these advances, mobile ...
Designing GUI agents that perform human-like tasks in graphical user interfaces faces a critical hurdle: collecting high-quality trajectory data for ...
Large Language Models (LLMs) and Vision Language Models (VLMs) have revolutionized the automation of mobile device control through natural language ...
Graphical user interfaces (GUIs) play a critical role in human-computer interaction, providing the means through which users perform tasks across ...
Graphical user interface (GUI) agents are crucial for automating interactions within digital environments, similar to how humans operate software using ...
The research has its roots in the field of visual language models (VLMs), focusing particularly on their application in graphical ...