Imitation learning (IL) is a robotics method in which robots learn to reproduce human actions from expert demonstrations. Because it is based on supervised machine learning, it requires substantial human-generated data to guide the robot's behavior. Although effective for complex tasks, imitation learning is limited by the scarcity of large-scale datasets and the difficulty of scaling up data collection, in contrast to language and vision models. Learning from human video demonstrations poses additional challenges because robot hands cannot match the dexterity and flexibility of human hands, a mismatch that makes it hard for imitation learning to work reliably or generalize to broader robot tasks.
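To make the supervised-learning core of IL concrete, here is a minimal behavior-cloning sketch in PyTorch: a policy network is trained to regress expert actions from observations. The network size, dimensions, and synthetic demonstration data are illustrative placeholders, not details from the paper.

```python
import torch
import torch.nn as nn

# Behavior cloning: supervised regression from observations to expert actions.
obs_dim, act_dim = 32, 7            # e.g., state features -> 7-DoF arm command
policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Synthetic stand-in for a dataset of expert (observation, action) pairs.
demos = [(torch.randn(64, obs_dim), torch.randn(64, act_dim)) for _ in range(100)]

for obs, expert_action in demos:
    optimizer.zero_grad()
    loss = loss_fn(policy(obs), expert_action)  # penalize deviation from expert
    loss.backward()
    optimizer.step()
```

The quality of a policy trained this way depends directly on the demonstration data, which is why scalable, high-quality data collection is the bottleneck the rest of this article addresses.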
Traditional imitation learning relied on humans teleoperating physical robots, which was effective but faced significant limitations. These systems used gloves, motion capture, or virtual reality devices and depended on complex setups and low-latency control loops. They also required physical robots and special-purpose hardware, which made them hard to scale. Although robots could perform tasks such as inserting batteries or tying shoelaces using expert data collected this way, the need for specialized equipment made these methods impractical for large-scale or more general use.
To address this, researchers from Apple and the University of Colorado Boulder proposed ARMADA, a system that integrates the Apple Vision Pro headset with external robot control through a combination of ROS and WebSockets. This setup allowed communication between devices, made the system plug-and-play, and kept it flexible across robot platforms such as the Franka and UR5, requiring only the 3D model and data-format files on the headset to be swapped. The ARMADA app handled robot visualization, data storage, and the user interface; it received transform frames for the robot links, captured camera image frames, and streamed tracked human skeleton data for processing. The robot node managed control, data storage, and constraint calculation, transforming skeleton data into robot commands and detecting workspace violations, singularities, and velocity issues for real-time feedback.
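As a rough illustration of the headset-to-robot bridge, the sketch below shows a WebSocket server that could receive hand-tracking frames from a headset app and hand them to the robot-side control node. The message fields ("wrist_pose", "finger_gap") and the port are assumptions for illustration, not ARMADA's actual schema.

```python
import asyncio
import json
import websockets

# Hypothetical bridge: receive hand-tracking frames from the headset app over
# a WebSocket. In the real system, the robot node would convert these into
# commands and run constraint checks (workspace, singularity, velocity).
async def handle_headset(websocket):
    async for message in websocket:
        frame = json.loads(message)
        wrist_pose = frame["wrist_pose"]   # e.g., [x, y, z, qx, qy, qz, qw]
        finger_gap = frame["finger_gap"]   # e.g., thumb-index distance in meters
        print(f"wrist={wrist_pose} gap={finger_gap:.3f}")

async def main():
    async with websockets.serve(handle_headset, "0.0.0.0", 8765):
        await asyncio.Future()  # run until cancelled

if __name__ == "__main__":
    asyncio.run(main())
```

Keeping the transport this generic is what makes the plug-and-play claim plausible: swapping robot platforms only changes what the robot node does with each frame, not how frames arrive.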
The robot's movements were aligned with the positions of the human wrist and fingers, tracked through ARKit on visionOS 2.0, using inverse kinematics to compute joint positions and controlling the gripper from the gap between the fingers. Constraints such as singularities, workspace boundaries, and velocity violations were visualized through color changes, virtual boundaries, or on-screen text. The researchers used the ARMADA system for three tasks: picking a tissue from a box, placing a toy into a cardboard box, and wiping a table with both hands. Each task had five initial states, and success was judged against task-specific criteria. Using the Apple Vision Pro running the ARMADA software on visionOS 2.0, participants provided 45 demonstrations under three feedback conditions: No Feedback, Feedback, and Post Feedback. Wrist and finger movements were tracked in real time with ARKit, the robot's motion was controlled via inverse kinematics, and joint trajectories were recorded for replay.
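The retargeting step described above can be sketched as follows: clamp the tracked finger gap into the gripper's range, and flag joint-velocity violations of the kind the feedback display would surface. The gripper range, velocity limit, and the hypothetical `ik_solver` are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

GRIPPER_MAX = 0.08   # assumed maximum gripper opening (m)
VEL_LIMIT = 1.0      # assumed per-joint velocity limit (rad/s)

def gripper_command(finger_gap: float) -> float:
    """Clamp the tracked thumb-index distance into the gripper's range."""
    return float(np.clip(finger_gap, 0.0, GRIPPER_MAX))

def velocity_violation(q_prev: np.ndarray, q_next: np.ndarray, dt: float) -> bool:
    """True if moving between consecutive IK solutions exceeds the joint velocity limit."""
    return bool(np.any(np.abs(q_next - q_prev) / dt > VEL_LIMIT))

# q_next would come from an IK solver given the tracked wrist pose, e.g.:
# q_next = ik_solver.solve(wrist_pose, seed=q_prev)   # hypothetical solver
q_prev = np.zeros(7)
q_next = np.array([0.02, -0.01, 0.0, 0.03, 0.0, -0.02, 0.01])

if velocity_violation(q_prev, q_next, dt=1 / 60):
    print("velocity limit exceeded: surface this to the user in AR")
print("gripper:", gripper_command(0.05))
```

Checks like these are what turn a raw human demonstration into feedback the user can act on mid-demonstration, rather than discovering infeasible motions only at replay time.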
In the evaluation, displaying feedback significantly improved replay success rates on the tissue-picking, toy-placement, and bimanual wiping tasks, with gains of up to 85% over no feedback. Post-feedback demonstrations also showed improvements, but were less effective than real-time feedback. Participants found the feedback intuitive and helpful for understanding the robot's motion, and the system worked well for users with varying levels of experience. Common failure modes without feedback included infeasible robot poses and gripper issues. Participants adapted their behavior during the demonstrations, slowing down and repositioning their hands, and could visualize the feedback after withdrawing their hands.
In summary, the proposed ARMADA system addressed the challenge of scalable data collection for robot imitation learning by using augmented reality to provide real-time feedback that improves data quality and compatibility with physical robots. The results showed how important feedback is for aligning robot-free human demonstrations with the real kinematics of robots. While the study focused on simpler tasks, future work can explore more complex ones and refine these techniques. The system can serve as a foundation for future robotics research, particularly in training robot control policies through imitation learning with visual observations.
Check out the Paper. All credit for this research goes to the researchers of this project.
Divyesh is a Consulting Intern at Marktechpost. He is pursuing a BTech in Agricultural and Food Engineering from the Indian Institute of Technology Kharagpur. He is a data science and machine learning enthusiast who wants to integrate these leading technologies into agriculture and solve its challenges.