Despite decades of research, we don’t see many mobile robots roaming our homes, offices, and streets. Real-world robot navigation in human-centric environments remains an unsolved problem. These challenging settings require safe and efficient navigation through tight spaces, such as squeezing between coffee tables and sofas, maneuvering around tight corners, doorways, cluttered rooms, and more. An equally critical requirement is to navigate in a way that complies with unwritten social norms around people, for example, yielding at blind corners or keeping a comfortable distance. Google Research is committed to examining how advances in ML can enable us to overcome these obstacles.
In particular, Transformer models have made impressive advances across data modalities in real-world machine learning (ML) problems. For example, multimodal architectures have allowed robots to take advantage of Transformer-based language models for high-level planning. Recent work using Transformers to encode robot policies opens up an exciting opportunity to apply these architectures to real-world navigation. However, on-robot deployment of massive Transformer-based controllers can be challenging due to the tight latency constraints of safety-critical mobile robots. The quadratic space and time complexity of the attention mechanism with respect to input length is often prohibitively expensive, forcing researchers to trim Transformer stacks at the cost of expressiveness.
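To make the scaling issue concrete, the minimal JAX sketch below (purely illustrative, not the implementation used in this work) contrasts standard softmax attention, which materializes an L x L attention matrix, with a Performer-style linear attention built from positive random features; the function names and the num_features parameter are assumptions made for this example.

```python
import jax
import jax.numpy as jnp

def softmax_attention(q, k, v):
    # Standard attention: materializes an (L x L) matrix, O(L^2) time and space.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

def performer_attention(q, k, v, key, num_features=64):
    # Performer-style attention: approximates the softmax kernel with positive
    # random features and never forms the (L x L) matrix, so cost is linear in L.
    d = q.shape[-1]
    omega = jax.random.normal(key, (num_features, d))  # random projections

    def phi(x):
        # Positive random-feature map for the softmax kernel.
        proj = (x / d**0.25) @ omega.T
        return jnp.exp(proj - jnp.sum(x**2, axis=-1, keepdims=True) / (2.0 * jnp.sqrt(d))) / jnp.sqrt(num_features)

    q_f, k_f = phi(q), phi(k)                 # (L, m) feature maps
    kv = k_f.T @ v                            # (m, d): aggregate keys and values first
    normalizer = q_f @ jnp.sum(k_f, axis=0)   # (L,) row normalization
    return (q_f @ kv) / normalizer[:, None]

# Both return an (L, d) output; only the first scales quadratically with L.
key = jax.random.PRNGKey(0)
q = k = v = jax.random.normal(key, (128, 32))
out_exact = softmax_attention(q, k, v)
out_linear = performer_attention(q, k, v, key)
```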
As part of our continued exploration of ML advances for robotic products, we partnered with Robotics at Google and Everyday Robots to introduce “Learning Model Predictive Controllers with Real-Time Attention for Real-World Navigation” at the Conference on Robot Learning (CoRL 2022). Here we introduce Performer-MPC, an end-to-end learnable robotic system that combines (1) a JAX-based differentiable model predictive controller (MPC) that back-propagates gradients to its cost function parameters, (2) Transformer-based encodings of context (e.g., occupancy grids for navigation tasks) that represent the MPC cost function and adapt the MPC to complex social scenarios without hand-coded rules, and (3) Performer architectures: scalable, low-rank implicit-attention Transformers with linear space- and time-complexity attention modules for efficient on-robot deployment (providing an on-robot latency of 8 ms). We show that Performer-MPC generalizes across different environments, helping robots navigate tight spaces while demonstrating socially acceptable behaviors.
Performer-MPC
Performer-MPC aims to combine classic MPC with ML via a learnable cost function. Hence, Performer-MPC can be viewed as an instance of inverse reinforcement learning, where the cost function is inferred by learning from expert demonstrations. Critically, the learnable component of the cost function is parameterized by latent embeddings produced by a Performer-Transformer. The linear-time inference provided by Performers is the gateway to real-time on-robot deployment.
In practice, an occupancy grid produced by fusing the robot’s sensors serves as the input to a Vision Performer model. This model never explicitly materializes the attention matrix; instead, it leverages its low-rank decomposition to compute attention in linear time, resulting in scalable attention. The embedding of a particular fixed input patch token from the model’s last layer then parameterizes the quadratic, learnable part of the MPC’s cost function. That part is added to the regular hand-engineered cost terms (distance from obstacles, penalties for sudden velocity changes, etc.). The system is trained end to end via imitation learning to mimic expert demonstrations.
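The following is a highly simplified, hypothetical JAX sketch of that cost structure, not the released system: an embedding z stands in for the Performer patch-token embedding, the learnable quadratic term it parameterizes is added to hand-engineered obstacle and smoothness costs, and a short unrolled gradient-descent planner plays the role of the differentiable MPC solver so that an imitation loss can back-propagate into the cost parameters. All names (plan, learned_quadratic_cost, params, etc.) are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def learned_quadratic_cost(traj, z, params):
    # Map the context embedding z to a positive semi-definite matrix Q(z)
    # and evaluate a quadratic cost over the planned trajectory.
    A = (params["w"] @ z).reshape(traj.shape[-1], traj.shape[-1])
    Q = A @ A.T  # PSD by construction
    return jnp.sum(jnp.einsum("td,de,te->t", traj, Q, traj))

def hand_engineered_cost(traj, obstacles):
    # E.g., penalize proximity to obstacles and sudden velocity changes.
    dists = jnp.linalg.norm(traj[:, None, :2] - obstacles[None, :, :2], axis=-1)
    return jnp.sum(jnp.exp(-dists)) + jnp.sum(jnp.diff(traj, axis=0) ** 2)

def plan(z, obstacles, params, traj_init, steps=50, lr=1e-2):
    # Stand-in for the differentiable MPC solver: unrolled gradient descent on
    # the total cost, so the planned trajectory is differentiable w.r.t. params.
    def total_cost(traj):
        return learned_quadratic_cost(traj, z, params) + hand_engineered_cost(traj, obstacles)
    def step(traj, _):
        return traj - lr * jax.grad(total_cost)(traj), None
    traj, _ = jax.lax.scan(step, traj_init, None, length=steps)
    return traj

def imitation_loss(params, z, obstacles, expert_traj):
    # Behavior cloning: match the planner's output to the expert demonstration.
    traj = plan(z, obstacles, params, jnp.zeros_like(expert_traj))
    return jnp.mean((traj - expert_traj) ** 2)

# Gradients reach the learnable cost parameters through the unrolled planner.
key = jax.random.PRNGKey(0)
state_dim, horizon, emb_dim = 4, 20, 16
params = {"w": 0.01 * jax.random.normal(key, (state_dim * state_dim, emb_dim))}
z = jax.random.normal(key, (emb_dim,))
obstacles = jax.random.normal(key, (5, 2))
expert_traj = jax.random.normal(key, (horizon, state_dim))
grads = jax.grad(imitation_loss)(params, z, obstacles, expert_traj)
```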
Robot navigation in the real world
Although Performer-MPC can in principle be applied in various robotic settings, we evaluated its performance on navigation in confined spaces with the potential presence of people. We deployed Performer-MPC on a differential-drive robot with a 3D LiDAR camera on its front and depth sensors mounted on its head. Our 8 ms-latency, on-robot deployable Performer-MPC has 8.3 million Performer parameters. A single Performer run takes only about 1 ms, and we use the fastest Performer-ReLU variant.
We compare Performer-MPC with two baselines: a regular MPC policy (RMPC) without the learned cost components, and an explicit policy (EP) that predicts a reference and goal state using the same Performer architecture, but without being coupled to the MPC framework. We evaluate Performer-MPC in simulation and in three real-world scenarios. For each scenario, the learned policies (EP and Performer-MPC) are trained with scenario-specific demonstrations.
Our policies are trained via behavior cloning on a few hours of human-operated robot navigation data collected in the real world. For more details on data collection, see the paper. We visualize the planning results of Performer-MPC (green) and RMPC (red) together with expert demonstrations (gray) in the top half, and the training and test curves in the bottom half, of the next two figures. To measure the distance between the robot trajectory and the expert trajectory, we use the Hausdorff distance.
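As a quick illustration (not the evaluation code used in the paper), the symmetric Hausdorff distance between two trajectories, each treated as a set of 2D waypoints, can be computed as follows.

```python
import jax.numpy as jnp

def hausdorff_distance(traj_a, traj_b):
    # Pairwise distances between all waypoints of the two trajectories.
    d = jnp.linalg.norm(traj_a[:, None, :] - traj_b[None, :, :], axis=-1)
    # For each trajectory, take the largest distance to the closest point on
    # the other trajectory, then take the max over both directions.
    return jnp.maximum(d.min(axis=1).max(), d.min(axis=0).max())

# Example: two slightly offset straight-line trajectories (distance = 0.1).
robot = jnp.stack([jnp.linspace(0.0, 1.0, 50), jnp.zeros(50)], axis=-1)
expert = jnp.stack([jnp.linspace(0.0, 1.0, 50), 0.1 * jnp.ones(50)], axis=-1)
print(hausdorff_distance(robot, expert))
```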
Learning to avoid local minima
We evaluate Performer-MPC in a simulated gate-crossing scenario, in which 100 start-goal pairs are randomly sampled from opposite sides of a wall. A planner guided by a greedy cost function often drives the robot into a local minimum (i.e., getting stuck at the point closest to the goal on the other side of the wall). Performer-MPC learns a cost function that steers the robot through the gate, even if it must deviate from the goal and travel farther. Performer-MPC achieves a success rate of 86%, compared with 24% for RMPC.
Comparison of Performer-MPC with regular MPC in the gate-passing task.
Learning highly constrained maneuvers
Next, we test Performer-MPC in a challenging real-world scenario in which the robot must perform sharp, near-collision maneuvers in a cluttered home or office. A global planner provides coarse waypoints (a skeletal navigation path) that the robot follows. Each policy is run ten times, and we report the success rate (SR) and the average completion percentage (CP), with variance (VAR), of the obstacle course that the robot can traverse before failing (colliding or getting stuck). Performer-MPC outperforms both RMPC and EP in SR and CP.
An obstacle course with policy trajectories and failure locations (indicated by crosses) for RMPC, EP, and Performer-MPC.
An Everyday Robots helper robot maneuvering through highly constrained spaces using regular MPC, explicit policy, and Performer-MPC.
Learning to navigate in spaces with people
Beyond static obstacles, we apply Performer-MPC to social robot navigation, where robots must navigate in a socially acceptable manner for which cost functions are difficult to design. We consider two scenarios: (1) blind corners, where robots should avoid the inner side of a hallway corner in case a person suddenly appears, and (2) pedestrian obstruction, where a person unexpectedly blocks the robot’s prescribed path.
Comparison of an Everyday Robots helper robot using regular MPC, explicit policy, and Performer-MPC in unseen blind corner scenarios.
Comparison of an Everyday Robots helper robot using regular MPC, explicit policy, and Performer-MPC in unseen pedestrian obstruction scenarios.
Conclusion
We introduced Performer-MPC, an end-to-end learnable robotic system that combines several mechanisms to enable real-world, robust, and adaptive navigation with real-time, on-robot Transformers. This work shows that scalable Transformer architectures play a critical role in designing expressive attention-based robotic controllers. We demonstrate that real-time, millisecond-latency inference is feasible for policies leveraging Transformers with a few million parameters. Furthermore, we show that such policies enable robots to learn efficient and socially acceptable behaviors that generalize well. We believe this opens an exciting new chapter in applying Transformers to real-world robotics, and we look forward to continuing our research with Everyday Robots helper robots.
Acknowledgments
Special thanks to Xuesu Xiao for co-leading this effort at Everyday Robots as a visiting researcher. This research was conducted by Xuesu Xiao, Tingnan Zhang, Krzysztof Choromanski, Edward Lee, Anthony Francis, Jake Varley, Stephen Tu, Sumeet Singh, Peng Xu, Fei Xia, Sven Mikael Persson, Dmitry Kalashnikov, Leila Takayama, Roy Frostig, Jie Tan, Carolina Parada, and Vikas Sindhwani. Special thanks to Vincent Vanhoucke for his feedback on the manuscript.