At the top of many automation wish lists is one particularly time-consuming task: housework.
The goal of many roboticists is to create the right combination of hardware and software so that a machine can learn “generalist” policies—the rules and strategies that guide the robot’s behavior—that work everywhere and under all conditions. Realistically, though, if you have a robot at home, you probably don’t care much about it working for your neighbors. With that in mind, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) decided to try to find a solution for easily training robust robot policies for very specific environments.
“Our goal is for robots to perform exceptionally well under disturbances, distractions, varying lighting conditions, and changes in object positions, all within a single environment,” says Marcel Torne Villasevil, a research assistant at MIT CSAIL in the Improbable AI Lab and lead author of a recent paper about the work. “We propose a method to create digital twins on the fly using the latest advances in computer vision. Using just their phones, anyone can capture a digital replica of the real world, and robots can be trained in a simulated environment much faster than in the real world, thanks to GPU parallelization. Our approach eliminates the need for extensive reward engineering by leveraging a few real-world demonstrations to jumpstart the training process.”
Bringing your robot home
Of course, the team’s system, called RialTo, is a bit more complicated than a simple gesture with your phone and (boom!) a home robot at your service. It starts with scanning the target environment on your device, using tools like NeRFStudio, ARCode, or Polycam. Once the scene is reconstructed, users can load it into RialTo’s interface to make detailed adjustments, add necessary joints to the robots, and more.
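To make that pipeline concrete, here is a minimal sketch of what the scan-to-twin step could look like in code. The `DigitalTwin`, `SceneObject`, and `Joint` classes, the file paths, and the joint parameters are hypothetical illustrations, not RialTo’s actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Joint:
    """An articulation the user adds by hand, e.g. a drawer's slide."""
    name: str
    joint_type: str              # "prismatic" (slide) or "revolute" (hinge)
    limits: tuple[float, float]  # travel range in meters or radians

@dataclass
class SceneObject:
    """One segmented mesh from the phone scan."""
    name: str
    mesh_path: str
    joints: list[Joint] = field(default_factory=list)

@dataclass
class DigitalTwin:
    """The reconstructed scene, ready to hand off to a simulator."""
    objects: list[SceneObject] = field(default_factory=list)

    def add_joint(self, object_name: str, joint: Joint) -> None:
        # Attach an articulation so the simulator can move that part.
        for obj in self.objects:
            if obj.name == object_name:
                obj.joints.append(joint)
                return
        raise KeyError(object_name)

# Meshes reconstructed offline with a tool like Polycam or NeRFStudio
# (hypothetical file names).
twin = DigitalTwin(objects=[
    SceneObject("counter", "scan/counter.obj"),
    SceneObject("drawer", "scan/drawer.obj"),
])

# The user marks the drawer as articulated so it can be opened in simulation.
twin.add_joint("drawer", Joint("slide", "prismatic", (0.0, 0.35)))
```

Representing articulations explicitly is what lets the simulator open and randomize parts like drawers and cupboard doors during training.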
The refined scene is then exported and brought into the simulator. Here, the goal is to develop a policy based on real-world actions and observations, such as one for grabbing a cup from a counter. These real-world demonstrations are then replicated in the simulation, providing valuable data for reinforcement learning. “This helps create a robust policy that works well both in the simulation and in the real world. An improved algorithm using reinforcement learning helps guide this process, ensuring the policy is effective when applied outside of the simulator,” Torne says.
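In spirit, this stage is demonstration-bootstrapped reinforcement learning. The sketch below shows that general pattern under assumed `sim_env` and `policy` interfaces; it illustrates the idea rather than the paper’s exact algorithm:

```python
import random

def train_in_twin(sim_env, policy, demos, episodes=1000, batch_size=256):
    """Generic demonstration-bootstrapped RL loop (illustrative only).

    Assumed interfaces:
      sim_env.reset(randomize) -> obs
      sim_env.step(action)     -> (next_obs, reward, done)
      policy.act(obs)          -> action
      policy.update(batch)     -> None
    `demos` are real-world trajectories of (obs, action) pairs.
    """
    buffer = []

    # 1) Replay the few real-world demonstrations inside the digital twin
    #    to seed the buffer, sparing the user heavy reward engineering.
    for trajectory in demos:
        obs = sim_env.reset(randomize=False)
        for _, action in trajectory:
            next_obs, reward, done = sim_env.step(action)
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
            if done:
                break

    # 2) Fine-tune with RL under randomized conditions (object poses,
    #    distractors, perturbations) so the policy becomes robust to them.
    for _ in range(episodes):
        obs, done = sim_env.reset(randomize=True), False
        while not done:
            action = policy.act(obs)
            next_obs, reward, done = sim_env.step(action)
            buffer.append((obs, action, reward, next_obs, done))
            obs = next_obs
        policy.update(random.sample(buffer, min(len(buffer), batch_size)))
    return policy
```

Because many copies of the simulated twin can run in parallel on a GPU, this inner loop is where the speedup over real-world training comes from.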
The tests showed that RialTo created robust policies for a variety of tasks, whether in controlled lab settings or more unpredictable real-world environments, improving by 67 percent over imitation learning with the same number of demonstrations. The tasks involved opening a toaster, placing a book on a shelf, putting a plate on a rack, placing a cup on a shelf, opening a drawer, and opening a cupboard. For each task, the researchers tested the system’s performance at three increasing levels of difficulty: randomizing object poses, adding visual distractors, and applying physical perturbations during task execution. When paired with real-world data, the system outperformed traditional imitation learning methods, especially in situations with many visual distractions or physical interruptions.
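Those three difficulty levels can be read as an escalating evaluation configuration, sketched below; the class and field names and the parameter values are illustrative stand-ins, not the paper’s actual setup:

```python
from dataclasses import dataclass

@dataclass
class EvalLevel:
    """One rung of the escalating difficulty ladder used in the tests."""
    name: str
    randomize_object_poses: bool = False
    visual_distractors: int = 0           # clutter objects added to the scene
    physical_perturbations: bool = False  # e.g. nudging the object mid-task

# Three increasing levels of difficulty (parameter values are illustrative).
LEVELS = [
    EvalLevel("poses", randomize_object_poses=True),
    EvalLevel("poses + distractors", True, visual_distractors=5),
    EvalLevel("poses + distractors + perturbations", True, 5, True),
]
```

Making each level strictly add one disturbance on top of the previous one helps isolate which kind of disturbance a policy fails under.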
“These experiments show that if we are concerned about being highly robust in a particular environment, the best idea is to leverage digital twins rather than trying to achieve robustness by collecting large-scale data across multiple environments,” says Pulkit Agrawal, director of the Improbable AI Lab, associate professor of electrical engineering and computer science (EECS) at MIT, a principal investigator at MIT CSAIL, and senior author of the paper.
As for limitations, RialTo currently takes three days to fully train a policy. To speed up the process, the team mentions improving the underlying algorithms and using foundation models. Training in simulation also has its limitations; for now, effortless sim-to-real transfer and the simulation of deformable objects or liquids remain difficult.
The next level
What’s next for RialTo? Building on previous efforts, the scientists are working to preserve robustness against various perturbations while also improving the model’s adaptability to new environments. “Our next effort is to use pre-trained models, speed up the learning process, minimize human intervention, and achieve broader generalization capabilities,” Torne says.
“We are very excited about our concept of ‘on-the-fly’ robot programming, where robots can autonomously scan their environment and learn to solve specific tasks in a simulation. While our current method has limitations, such as requiring a few initial demonstrations from a human and significant computing time to train these policies (up to three days), we see it as a significant step toward achieving ‘on-the-fly’ robot learning and deployment,” says Torne. “This approach brings us closer to a future where robots will not need a pre-existing policy that covers all scenarios. Instead, they can quickly learn new tasks without extensive real-world interaction. In my opinion, this breakthrough could accelerate the practical application of robotics much sooner than if we relied solely on a universal, all-encompassing policy.”
“To deploy robots in the real world, researchers have traditionally turned to methods such as imitation learning from expert data, which can be expensive, or reinforcement learning (RL), which can be unsafe,” says Zoey Chen, a PhD student in computer science at the University of Washington who was not involved in the paper. “RialTo directly addresses both the safety limitations of real-world RL and the data-efficiency limitations of data-driven learning methods, with its novel real-to-sim-to-real learning pipeline. This pipeline not only ensures safe and robust training in simulation before real-world deployment, but also significantly improves the efficiency of data collection. RialTo has the potential to significantly scale up robot learning and enable robots to adapt to complex real-world scenarios much more effectively.”
“Simulation has demonstrated impressive capabilities on real robots by providing inexpensive, possibly infinite, data for policy learning,” adds Marius Memmel, a PhD student in computer science at the University of Washington who was not involved in the work. “However, these methods are limited to a few specific scenarios, and building the corresponding simulations is expensive and laborious. RialTo provides an easy-to-use tool for reconstructing real-world environments in minutes rather than hours. Furthermore, it makes extensive use of demonstrations collected during policy learning, minimizing operator burden and bridging the gap between simulation and reality. RialTo demonstrates robustness to object poses and perturbations, showing incredible real-world performance without requiring extensive simulator construction and data collection.”
Torne co-wrote this paper with senior authors Abhishek Gupta, an assistant professor at the University of Washington, and Agrawal. Four other CSAIL members are also credited: EECS PhD student Anthony Simeonov SM ’22, research assistant Zechu Li, undergraduate student April Chan, and PhD student Tao Chen ’24. Members of the Improbable AI Lab and the WEIRD Lab also provided valuable feedback and support in developing this project.
This work was funded in part by a Sony Research Award, the U.S. government, and Hyundai Motor Co., with support from the Washington Embodied Intelligence and Robotics Development (WEIRD) Lab. The researchers presented their work at the Robotics: Science and Systems (RSS) conference earlier this month.