In the field of artificial intelligence, a persistent challenge has been the development of interactive AI assistants that can effectively navigate and assist in real-world tasks. While significant advances have been made in the digital realm, such as language models, the physical world presents unique obstacles for AI systems.
The main obstacle researchers face is AI assistants' lack of first-hand experience in the physical world, which prevents them from actively perceiving, reasoning about, and helping in real-world scenarios. This limitation stems from the scarcity of data suitable for training AI models on physical tasks.
To address this problem, a team of researchers from Microsoft and ETH Zurich has introduced an innovative dataset called “HoloAssist.” The dataset captures egocentric (first-person) human interaction in real-world scenarios: two participants collaborate on physical manipulation tasks, with a task performer wearing a mixed-reality headset while a task instructor observes and provides verbal instructions in real time.
HoloAssist comprises an extensive collection of 166 hours of recordings from 222 diverse participants, forming 350 unique instructor-performer pairs completing 20 object-focused manipulation tasks. These tasks cover a wide range of objects, from everyday electronic devices to specialized industrial items. The dataset captures seven synchronized sensor modalities: RGB, depth, head pose, 3D hand pose, gaze, audio, and IMU, providing a comprehensive record of human actions and intentions. Additionally, it provides third-person manual annotations, including text summaries, intervention types, error annotations, and action segments.
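To make the seven-modality structure concrete, here is a minimal sketch of how one synchronized frame of such a recording might be represented in code. The class and field names below are illustrative assumptions, not HoloAssist's actual schema or file format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of one synchronized frame in a HoloAssist-style
# recording. Field names and shapes are invented for illustration;
# they are not the dataset's real schema.
@dataclass
class MultimodalFrame:
    timestamp_s: float               # capture time in seconds
    rgb: List[List[int]]             # RGB image (tiny placeholder grid here)
    depth: List[List[float]]         # depth map aligned to the RGB frame
    head_pose: List[float]           # 6-DoF head pose (x, y, z, roll, pitch, yaw)
    hand_pose_3d: List[List[float]]  # per-joint 3D hand keypoints
    gaze: List[float]                # gaze direction vector
    audio_chunk: List[float]         # audio samples in this frame's window
    imu: List[float]                 # accelerometer + gyroscope readings

def modalities_present(frame: MultimodalFrame) -> int:
    """Count how many of the seven sensor modalities carry data."""
    streams = [frame.rgb, frame.depth, frame.head_pose,
               frame.hand_pose_3d, frame.gaze, frame.audio_chunk, frame.imu]
    return sum(1 for s in streams if len(s) > 0)

frame = MultimodalFrame(
    timestamp_s=0.033,
    rgb=[[128, 128, 128]],
    depth=[[1.2]],
    head_pose=[0.0, 1.6, 0.0, 0.0, 0.0, 0.0],
    hand_pose_3d=[[0.1, 0.2, 0.3]],
    gaze=[0.0, 0.0, 1.0],
    audio_chunk=[0.01, -0.02],
    imu=[0.0, 0.0, 9.8, 0.0, 0.0, 0.0],
)
print(modalities_present(frame))  # → 7
```

The point of the sketch is simply that each timestamp bundles all seven streams together, which is what makes cross-modal analysis of actions and intentions possible.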
Unlike previous datasets, HoloAssist’s distinctive feature lies in its interactive, multi-person task-execution setup, which enables the development of anticipatory and proactive AI assistants. Such assistants can offer timely instructions grounded in the environment, improving on the traditional “chat-based” AI assistant model.
The research team benchmarked models on the dataset for action classification and anticipation tasks, providing empirical results that shed light on the importance of different modalities in various tasks. Additionally, they introduced new benchmarks focused on error detection, intervention type prediction, and 3D hand pose prediction, essential elements for the development of intelligent assistants.
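For readers unfamiliar with how such benchmarks are scored, the sketch below shows a generic top-1 accuracy computation of the kind typically used for action classification and anticipation. The action labels are invented examples, not HoloAssist's actual action vocabulary, and this is not the authors' evaluation code.

```python
from typing import Sequence

def top1_accuracy(predictions: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Fraction of clips whose predicted action matches the annotated action."""
    assert len(predictions) == len(ground_truth), "one prediction per clip"
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)

# Hypothetical predictions vs. annotations for four video clips.
preds  = ["pick_up", "attach", "rotate",  "pick_up"]
labels = ["pick_up", "attach", "inspect", "pick_up"]
print(top1_accuracy(preds, labels))  # → 0.75
```

Anticipation benchmarks use the same scoring idea, except the model must output the label of the *next* action segment before it begins rather than classify the current one.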
In conclusion, this work represents an initial step toward exploring how intelligent agents can collaborate with humans on real-world tasks. The HoloAssist dataset, along with associated benchmarks and tools, is expected to drive research into creating powerful AI assistants for everyday real-world tasks, opening doors to numerous future research directions.
Review the Paper and the accompanying Microsoft article. All credit for this research goes to the researchers of this project.
Pragati Jhunjhunwala is a Consulting Intern at MarktechPost. She is currently pursuing a B.Tech at the Indian Institute of Technology (IIT) Kharagpur. She is a technology enthusiast and has a keen interest in the scope of data science software and applications. She is always reading about advancements in different fields of AI and ML.