In recent years, the capabilities of robotic systems have improved dramatically. As technology continues to improve and robotic agents are more routinely deployed in real-world settings, their ability to assist in everyday activities will become increasingly important. Repetitive tasks like wiping surfaces, folding clothes, and cleaning up a room seem well suited for robots, but remain challenging for robotic systems designed for structured environments like factories. Performing these types of tasks in more complex environments, such as offices or homes, requires dealing with greater levels of environmental variability captured by high-dimensional sensory inputs, from images combined with depth and force sensing.
For example, consider the task of wiping a table to clean a spill or brush away crumbs. While this task may seem simple, in practice it encompasses many interesting challenges that are ubiquitous in robotics. Indeed, at a high level, deciding how best to clean a spill from an image observation requires solving a challenging planning problem with stochastic dynamics: how should the robot wipe to avoid dispersing the spill perceived by a camera? At a low level, successfully executing a wiping motion also requires the robot to position itself to reach the problem area while avoiding nearby obstacles, such as chairs, and to coordinate its motions to wipe the surface while maintaining contact with the table. Solving this table wiping problem would help researchers tackle a broader range of robotic tasks, such as cleaning windows and opening doors, which require both high-level planning from visual observations and precise contact-rich control.
Learning-based techniques, such as reinforcement learning (RL), hold the promise of solving these complex visuomotor tasks from high-dimensional observations. However, applying end-to-end learning methods to mobile manipulation tasks remains challenging due to the increased dimensionality and the need for precise low-level control. Additionally, on-robot deployment either requires collecting large amounts of data, using accurate but computationally expensive models, or on-hardware fine-tuning.
In “Robotic Table Wiping via Reinforcement Learning and Whole-body Trajectory Optimization”, we present a novel approach to enable a robot to reliably wipe tables. By carefully decomposing the task, our approach combines the strengths of RL, namely the capability of planning in high-dimensional observation spaces with complex stochastic dynamics, and of trajectory optimization, namely the ability to efficiently compute whole-body robot commands that satisfy constraints, such as physical limits and collision avoidance. Given visual observations of a surface to be cleaned, the RL policy selects wiping actions that are then executed using trajectory optimization. By leveraging a new stochastic differential equation (SDE) simulator of the wiping task to train the RL policy for high-level planning, the proposed end-to-end approach avoids the need for task-specific training data and is able to transfer zero-shot to hardware.
Combining the strengths of RL and optimal control
We propose an end-to-end approach for table wiping that consists of four components: (1) sensing the environment, (2) planning high-level wiping waypoints with RL, (3) computing trajectories for the whole-body system (i.e., for each joint) with optimal control methods, and (4) executing the planned wiping trajectories with a low-level controller.
System architecture |
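The four-stage loop above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the stub functions, the dirt-fraction "observation", and the toy dynamics (each wipe removes half the dirt) are all stand-ins for real perception, planning, and control.

```python
def perceive(robot):
    # (1) Sense the environment. Stub: return remaining dirt fraction.
    return robot["dirt"]

def is_clean(obs, tol=0.05):
    # Termination check on the sensed cleanliness state.
    return obs < tol

def plan_waypoint(policy, obs):
    # (2) High-level wiping waypoint from the (stand-in) RL policy.
    return policy(obs)

def optimize_trajectory(robot, waypoint):
    # (3) Whole-body trajectory optimization. Stub: one-segment trajectory.
    return [waypoint]

def execute(robot, traj):
    # (4) Low-level control. Toy dynamics: each wipe removes half the dirt.
    robot["dirt"] *= 0.5

def wiping_loop(robot, policy, max_wipes=10):
    """Run sense -> plan -> optimize -> execute until the table is clean."""
    wipes = 0
    for _ in range(max_wipes):
        obs = perceive(robot)
        if is_clean(obs):
            break
        wp = plan_waypoint(policy, obs)
        traj = optimize_trajectory(robot, wp)
        execute(robot, traj)
        wipes += 1
    return wipes
```

Under the toy dynamics, a fully dirty table (`dirt = 1.0`) is cleaned below the 5% threshold after five wipes.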
The novel component of this approach is an RL policy that efficiently plans high-level wiping waypoints given image observations of spills and crumbs. To train the RL policy, we bypass the problem of collecting large amounts of data on the robotic system and avoid using an accurate but computationally expensive physics simulator. Instead, our approach relies on a stochastic differential equation (SDE) to model the latent dynamics of crumbs and spills, which yields an SDE simulator with four key features:
- It can describe both dry objects pushed by the wiper and liquids absorbed during wiping.
- It can simultaneously capture multiple isolated spill areas.
- It models the uncertainty of changes in the distribution of spills and crumbs as the robot interacts with them.
- It’s faster than real time: simulating a wipe only takes a few milliseconds.
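A minimal version of such a simulator can be written as a single stochastic update per wipe, in the style of an Euler–Maruyama step: particles in the wiper's path are either absorbed (spills) or pushed along the wipe direction with Gaussian noise (crumbs). All parameter values, the contact half-width, and the function interface below are illustrative assumptions, not the model from the paper.

```python
import numpy as np

def sde_wipe_step(positions, wipe_start, wipe_dir, wipe_len,
                  absorbed, absorb_prob=0.3, sigma=0.01, rng=None):
    """One stochastic wipe update (illustrative, not the paper's model).

    positions: (N, 2) array of particle xy positions on the table.
    absorbed:  (N,) bool mask, True for already-absorbed particles.
    absorb_prob: per-contact absorption probability (liquid behavior).
    sigma: diffusion scale of the push (uncertainty in crumb motion).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.asarray(wipe_dir, float)
    d /= np.linalg.norm(d)
    rel = positions - np.asarray(wipe_start, float)
    along = rel @ d                                   # progress along the wipe
    across = np.abs(rel @ np.array([-d[1], d[0]]))    # distance from wipe line
    in_path = (~absorbed) & (along >= 0) & (along <= wipe_len) & (across < 0.05)

    # Absorption (spills): Bernoulli draw per contacted particle.
    absorb = in_path & (rng.random(len(positions)) < absorb_prob)
    absorbed = absorbed | absorb

    # Drift + diffusion (crumbs): push contacted, unabsorbed particles
    # to the end of the wipe, with Gaussian noise on the landing point.
    push = in_path & ~absorbed
    drift = (wipe_len - along[push])[:, None] * d
    noise = sigma * rng.standard_normal((push.sum(), 2))
    positions = positions.copy()
    positions[push] += drift + noise
    return positions, absorbed
```

Because each wipe is a single vectorized array update rather than a physics step, simulating a wipe over thousands of particles takes well under a millisecond, which is what makes large-scale RL training data cheap to generate.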
The SDE simulator allows simulating dry crumbs (left), which are pushed during each wipe, and spills (right), which are absorbed while wiping. The simulator allows modeling particles with different properties, such as different absorption and adhesion coefficients and different uncertainty levels. |
This SDE simulator can quickly generate large amounts of data for RL training. We validate the SDE simulator using observations from the robot by predicting the evolution of the perceived particles for a given wipe and comparing the prediction with the particles perceived after executing that wipe. We find that the model correctly predicts the overall trend of the particle dynamics, suggesting that a policy trained with this SDE model should perform well in the real world.
Using this SDE model, we formulate a high-level wiping planning problem and train a vision-based wiping policy using RL. We train entirely in simulation without collecting a dataset on the robot; we simply randomize the initial state of the SDE to cover the wide range of particle dynamics and spill shapes we may observe in the real world.
At deployment time, we first convert the robot's image observations into black and white to better isolate the spills and crumb particles. We then use these “thresholded” images as input to the RL policy. With this approach we do not require a visually realistic simulator, which would be complex and potentially difficult to develop, and we minimize the simulation-to-reality gap.
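The exact thresholding procedure is not specified here, but a minimal stand-in is a luminance cutoff: dark pixels, assumed to be spills or crumbs against a light table surface, become 1 and everything else becomes 0. The cutoff value is an illustrative assumption.

```python
import numpy as np

def threshold_observation(rgb, dark_cutoff=0.4):
    """Convert an RGB image (H, W, 3, floats in [0, 1]) into a binary
    spill/crumb mask. Dark pixels (below dark_cutoff in mean luminance)
    are marked 1; the light table surface is marked 0."""
    gray = rgb.mean(axis=-1)              # simple luminance proxy
    return (gray < dark_cutoff).astype(np.uint8)
```

Because the policy only ever sees these binary masks, the simulator never needs to render photorealistic images, only particle positions that can be rasterized into the same mask format.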
The RL policy's inputs are thresholded image observations of the cleanliness state of the table. Its outputs are the desired wiping actions. The policy uses a ResNet-50 neural network architecture followed by two fully connected (FC) layers. |
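The policy architecture in the caption can be sketched as a convolutional backbone feeding two FC layers. In this sketch a tiny CNN stands in for the ResNet-50 backbone for brevity, and the action is assumed to be four numbers (e.g., a 2-D wipe start and end point); both are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class WipingPolicy(nn.Module):
    """Backbone + two FC layers mapping a thresholded image to a wiping
    action. The blog's backbone is ResNet-50; a small CNN stands in here."""

    def __init__(self, action_dim=4):
        super().__init__()
        self.backbone = nn.Sequential(          # stand-in for ResNet-50
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),       # FC layer 1
            nn.Linear(64, action_dim),          # FC layer 2 -> wiping action
        )

    def forward(self, x):
        # x: (batch, 1, H, W) thresholded image observations.
        return self.head(self.backbone(x))
```

A single-channel input suffices because the thresholded observation is binary; no color information reaches the policy.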
The desired wiping actions from the RL policy are executed with a whole-body trajectory optimizer that efficiently computes trajectories for the base and arm joints. This approach allows satisfying constraints, such as collision avoidance, and enables simulation-to-reality deployment without additional fine-tuning.
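As a toy stand-in for this step, the following sketch optimizes only a 2-D base path rather than all joints: it finds intermediate base positions that minimize path length while keeping a clearance from a circular obstacle (e.g., a chair). The solver choice, waypoint parameterization, and clearance model are illustrative assumptions, not the paper's formulation.

```python
import numpy as np
from scipy.optimize import minimize

def plan_base_path(start, goal, obstacle, radius, n_pts=8):
    """Minimize path length from start to goal subject to every waypoint
    staying at least `radius` away from a circular obstacle."""
    start, goal = np.asarray(start, float), np.asarray(goal, float)
    obstacle = np.asarray(obstacle, float)

    def unpack(x):
        # Decision variables are the interior waypoints only.
        return np.vstack([start, x.reshape(n_pts, 2), goal])

    def path_length(x):
        p = unpack(x)
        return np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1))

    def clearance(x):
        # Inequality constraint: >= 0 when each waypoint is clear.
        p = unpack(x)
        return np.linalg.norm(p - obstacle, axis=1) - radius

    # Initialize with a straight line, then let SLSQP push it clear.
    x0 = np.linspace(start, goal, n_pts + 2)[1:-1].ravel()
    res = minimize(path_length, x0, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": clearance}])
    return unpack(res.x)
```

The real optimizer additionally handles joint limits, arm kinematics, and table-contact constraints, but the structure is the same: a smoothness/length objective subject to feasibility constraints.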
Experimental results
We extensively validate our approach in simulation and on hardware. In simulation, our RL policies outperform heuristics-based baselines, requiring significantly fewer wipes to clean spills and crumbs. We also tested our policies on problems not seen at training time, such as multiple isolated spill areas on the table, and found that the RL policies generalize well to these novel problems.
Example of wiping actions selected by the RL policy (left) and wiping performance compared with a baseline (middle, right). The baseline wipes towards the center of the table, rotating after each wipe. We report the total dirty surface of the table (middle) and the spread of crumb particles (right) after each additional wipe. |
Our approach allows the robot to reliably clean up spills and crumbs (without accidentally pushing debris off the table) while avoiding collisions with obstacles such as chairs.
To see more results, please watch the video below:
Conclusion
The results of this work demonstrate that complex visuomotor tasks, such as table wiping, can be performed reliably without costly end-to-end training and on-robot data collection. The key is to decompose the task and combine the strengths of RL, trained using an SDE model of spill and crumb dynamics, with the strengths of trajectory optimization. We see this work as an important step towards general-purpose home-assistive robots. For more details, see the original paper.
Acknowledgements
We would like to thank our co-authors Sumeet Singh, Mario Prats, Jeffrey Bingham, Jonathan Weisz, Benjie Holson, Xiaohan Zhang, Vikas Sindhwani, Yao Lu, Fei Xia, Peng Xu, Tingnan Zhang, and Jie Tan. We would also like to thank Benjie Holson, Jake Lee, April Zitkovich, and Linda Luu for their help and support in various aspects of the project. We are especially grateful to the entire team at Everyday Robots for their collaboration on this work and for developing the platform on which these experiments were conducted.