Creating robots that exhibit robust and dynamic locomotion capabilities, similar to those of animals or humans, has been a long-standing goal in the robotics community. In addition to completing tasks quickly and efficiently, agility allows legged robots to move through complex environments that would otherwise be difficult to traverse. Researchers at Google have been pursuing agility for multiple years and across various form factors. However, while researchers have enabled robots to hike or jump over some obstacles, there is still no generally accepted benchmark that comprehensively measures robot agility or mobility. In contrast, benchmarks have been driving forces behind the development of machine learning, such as ImageNet for computer vision and OpenAI Gym for reinforcement learning (RL).
In “Barkour: Benchmarking Animal-level Agility with Quadruped Robots”, we introduce the Barkour agility benchmark for quadruped robots, along with a Transformer-based generalist locomotion policy. Inspired by dog agility competitions, a legged robot must sequentially display a variety of skills, including moving in different directions, traversing uneven terrain, and jumping over obstacles within a limited timeframe to successfully complete the course. By providing a diverse and challenging obstacle course, the Barkour benchmark encourages researchers to develop locomotion controllers that move fast in a controllable and versatile way. Furthermore, by tying the performance metric to the performance of real dogs, we provide an intuitive metric for understanding robot performance relative to its animal counterparts.
We invited a handful of Dooglers to test the obstacle course and ensure that our agility objectives were realistic and challenging. Small dogs complete the obstacle course in approximately 10 s, whereas our robot's typical performance is around 20 s. |
Barkour benchmark
The Barkour scoring system uses a per-obstacle target time and an overall course target time based on the target speed of small dogs in novice agility competitions (about 1.7 m/s). Barkour scores range from 0 to 1, with 1 corresponding to the robot successfully traversing all the obstacles along the course within the allotted time of approximately 10 seconds, the average time needed for a similar-sized dog to traverse the course. The robot receives penalties for skipping or failing obstacles, or for moving too slowly.
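To make the scoring scheme concrete, here is a minimal Python sketch of a score with this shape. The penalty constants (`skip_penalty`, `time_penalty_rate`) are illustrative assumptions for the sketch, not the benchmark's actual values.

```python
# A minimal sketch of a Barkour-style score. Penalty constants are
# illustrative assumptions, not the benchmark's exact values.

COURSE_TARGET_TIME = 10.0  # seconds, based on small dogs at ~1.7 m/s


def barkour_score(obstacles_cleared, obstacles_total, course_time,
                  skip_penalty=0.1, time_penalty_rate=0.01):
    """Score in [0, 1]; 1.0 means all obstacles cleared within the target time."""
    score = 1.0
    # Penalty for each skipped or failed obstacle (assumed flat per obstacle).
    score -= skip_penalty * (obstacles_total - obstacles_cleared)
    # Penalty for moving too slowly, proportional to time over the target.
    score -= time_penalty_rate * max(0.0, course_time - COURSE_TARGET_TIME)
    return max(0.0, min(1.0, score))


# Example: the robot clears 3 of 4 obstacles in 14 s.
print(barkour_score(3, 4, 14.0))  # 0.86 under these assumed penalties
```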
Our standard course consists of four unique obstacles in a 5 m x 5 m area. This is a denser and smaller setup than a typical dog competition to allow for easy deployment in a robot lab. Beginning at the start table, the robot needs to weave through a set of poles, climb an A-frame, clear a 0.5 m broad jump, and then step onto the end table. We chose this subset of obstacles because they test a diverse set of skills while keeping the setup within a small footprint. As is the case for real dog agility competitions, the Barkour benchmark can be easily adapted to a larger course area and may incorporate a variable number of obstacles and course configurations.
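As an illustration, the standard course could be captured in a simple configuration like the one below. The obstacle order matches the description above, but the coordinates within the 5 m x 5 m area are invented for the sketch, not measured from the real course.

```python
# Illustrative configuration of the standard Barkour course.
# Positions (in metres) are made up for this sketch.
STANDARD_COURSE = [
    {"name": "start_table", "type": "table", "position": (0.5, 0.5)},
    {"name": "weave_poles", "type": "poles", "position": (2.0, 1.0)},
    {"name": "a_frame",     "type": "climb", "position": (3.5, 2.5)},
    {"name": "broad_jump",  "type": "jump",  "position": (2.0, 4.0), "length_m": 0.5},
    {"name": "end_table",   "type": "table", "position": (0.5, 4.5)},
]
```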
Learning agile locomotion skills
The Barkour benchmark contains a diverse set of obstacles and a delayed reward system, which pose a significant challenge when training a single policy that can complete the entire obstacle course. So, to provide a strong performance baseline and demonstrate the effectiveness of the benchmark for robotic agility research, we adopt a student-teacher framework combined with a zero-shot sim-to-real approach. First, we train individual specialist locomotion skills (teachers) for different obstacles using on-policy RL methods. In particular, we leverage recent advances in large-scale parallel simulation to equip the robot with individual skills, including policies for walking, slope climbing, and jumping.
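The sketch below shows the shape of this specialist stage: one on-policy RL run per skill across many parallel simulation instances. The toy environment, linear policy, and REINFORCE-style update are stand-ins chosen to keep the example self-contained; the real system uses a full physics simulator and a stronger on-policy algorithm such as PPO.

```python
import numpy as np

# Schematic of the specialist stage: one on-policy RL run per skill.
# ToyParallelEnv and the linear Gaussian policy are stand-ins for a
# massively parallel physics simulator and a PPO-trained network.

SKILLS = ["walk", "slope_climb", "jump"]


class ToyParallelEnv:
    """Stand-in for a massively parallel simulator (one row per env)."""

    def __init__(self, skill, num_envs):
        self.skill, self.num_envs = skill, num_envs

    def reset(self):
        return np.zeros((self.num_envs, 8))  # observations

    def step(self, actions):
        obs = np.random.randn(self.num_envs, 8)
        reward = -np.sum(actions ** 2, axis=1)  # placeholder reward
        return obs, reward


def train_specialist(skill, num_envs=4096, iters=10, lr=1e-3, sigma=0.1):
    env = ToyParallelEnv(skill, num_envs)
    w = np.zeros((8, 12))  # linear policy stand-in
    obs = env.reset()
    for _ in range(iters):
        noise = sigma * np.random.randn(num_envs, 12)
        actions = obs @ w + noise  # Gaussian exploration
        next_obs, reward = env.step(actions)
        # REINFORCE-style on-policy update (no baseline, for brevity).
        w += lr * obs.T @ (noise * reward[:, None]) / num_envs
        obs = next_obs
    return w


specialists = {skill: train_specialist(skill) for skill in SKILLS}
```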
We then train a single policy (student) that performs all the skills and the transitions in between, using the student-teacher framework built on top of the specialist skills we previously trained. We roll out the specialist policies in simulation to create datasets of state-action pairs for each of the specialist skills. These datasets are then distilled into a single Transformer-based generalist locomotion policy, which can handle various terrains and adjust the robot's gait based on the perceived environment and the robot's state.
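The distillation step amounts to behavior cloning the specialist rollouts into a sequence model. Below is a minimal PyTorch sketch of that idea; the dimensions, context length, network size, and random stand-in dataset are all assumptions for the sketch, not the actual architecture details.

```python
import torch
import torch.nn as nn

# Minimal distillation sketch: fit a small Transformer to state-action
# pairs from the specialists via behavior cloning. All sizes assumed.
OBS_DIM, ACT_DIM, CONTEXT = 64, 12, 16


class LocomotionTransformer(nn.Module):
    def __init__(self, d_model=128):
        super().__init__()
        self.embed = nn.Linear(OBS_DIM, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, ACT_DIM)

    def forward(self, obs_seq):  # (batch, time, obs)
        h = self.encoder(self.embed(obs_seq))
        return self.head(h[:, -1])  # action for the last timestep


# Random stand-in for rollouts collected from the specialist policies.
obs = torch.randn(256, CONTEXT, OBS_DIM)
teacher_actions = torch.randn(256, ACT_DIM)

model = LocomotionTransformer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(5):  # behavior-cloning steps
    loss = nn.functional.mse_loss(model(obs), teacher_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
```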
During deployment, we pair the multi-skill locomotion Transformer policy with a navigation controller that provides velocity commands based on the robot's position. Our trained policy controls the robot based on the robot's surroundings, represented as an elevation map, the velocity commands, and onboard sensory information provided by the robot.
Deployment pipeline for the locomotion transformer architecture. At the time of deployment, a high-level navigation controller guides the real robot through the obstacle course by sending commands to the locomotion transformer policy. |
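As a rough sketch of the high-level controller in this pipeline, the following computes a velocity command pointing at the next unreached waypoint; the waypoint coordinates, target speed, and reach radius are illustrative assumptions. The locomotion Transformer policy then consumes this command together with the elevation map and onboard proprioception to produce low-level motor targets.

```python
import numpy as np

# Sketch of a waypoint-following navigation controller. Waypoints,
# target speed, and reach radius are illustrative, not the real values.
WAYPOINTS = [np.array([1.0, 0.5]), np.array([3.0, 2.5]), np.array([4.5, 4.5])]


def velocity_command(position, waypoints=WAYPOINTS, speed=1.7, reach_radius=0.3):
    """Command the benchmark's target speed toward the next waypoint."""
    for wp in waypoints:
        offset = wp - position
        if np.linalg.norm(offset) > reach_radius:
            return speed * offset / np.linalg.norm(offset)
    return np.zeros(2)  # all waypoints reached: stop on the end table


print(velocity_command(np.array([0.0, 0.0])))
```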
Robustness and repeatability are difficult to achieve when pushing for maximum performance and maximum speed. Sometimes, the robot might fail when overcoming an obstacle in an agile way. To handle failures, we train a recovery policy that quickly gets the robot back on its feet, allowing it to continue the episode.
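One simple way to wire in such a recovery policy is a tilt-based fall check that hands over control until the robot is upright again. The threshold and policy interfaces below are assumptions for the sketch, not our exact implementation.

```python
# Sketch of routing control to a recovery policy after a fall.
# The tilt threshold is an assumed value.
FALL_TILT_RAD = 1.0  # assumed: roll/pitch beyond ~60 degrees counts as a fall


def select_policy(roll, pitch, locomotion_policy, recovery_policy):
    """Hand control to the recovery policy whenever the base has tipped over."""
    if max(abs(roll), abs(pitch)) > FALL_TILT_RAD:
        return recovery_policy  # get back on four feet first
    return locomotion_policy    # otherwise continue the course
```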
Evaluation
We evaluate the Transformer-based generalist locomotion policy using custom-built quadruped robots and show that by optimizing for the proposed benchmark, we obtain agile, robust, and versatile skills for our robot in the real world. In addition, we provide analysis of the various design choices in our system and their impact on the system's performance.
Model of the custom robots used for the evaluation. |
We deploy both the specialist policies and the generalist locomotion Transformer policy to hardware (zero-shot sim-to-real). The robot's target trajectory is provided by a set of waypoints along the various obstacles. In the case of the specialist policies, we switch between them using a hand-tuned policy-switching mechanism that selects the most suitable policy given the robot's location.
Typical performance of our agile locomotion policies on the Barkour benchmark. Our custom-built quadruped robot robustly navigates the terrain's obstacles by leveraging various skills learned using RL in simulation. |
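A hand-tuned switching mechanism of this kind can be as simple as a lookup over the robot's progress along the course. The region boundaries and skill names below are made-up examples, not our tuned values.

```python
# Sketch of a hand-tuned specialist-switching mechanism keyed on
# progress along the course. Boundaries (in metres) are illustrative.
SWITCH_TABLE = [
    (0.0, "walk"),         # start table and weave poles
    (2.0, "slope_climb"),  # A-frame
    (3.5, "jump"),         # broad jump
    (4.0, "walk"),         # end table
]


def select_specialist(course_progress):
    """Return the specialist for the region containing the robot."""
    active = SWITCH_TABLE[0][1]
    for boundary, skill in SWITCH_TABLE:
        if course_progress >= boundary:
            active = skill
    return active


print(select_specialist(3.7))  # -> "jump"
```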
We find that very often our policies can handle unexpected events or even hardware degradation, resulting in good average performance, but failures are still possible. As illustrated in the image below, in case of failures, our recovery policy quickly gets the robot back on its feet, allowing it to continue the episode. By combining the recovery policy with a simple return-to-start policy, we are able to run experiments repeatedly with minimal human intervention to measure the robot's robustness, as sketched below.
Qualitative example of robustness and recovery behaviors. The robot stumbles and rolls over after descending the A-frame. This triggers the recovery policy, which allows the robot to get back up and continue the course. |
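The resulting evaluation loop might look like the following sketch, where `run_episode`, `recover`, and `return_to_start` are stubbed-out stand-ins for the real robot behaviors described above.

```python
import random
from dataclasses import dataclass

# Sketch of the automated robustness-evaluation loop: recovery and
# return-to-start policies let episodes run back to back. The helpers
# below are stubs standing in for the real robot behaviors.


@dataclass
class EpisodeResult:
    score: float
    fell: bool


def run_episode() -> EpisodeResult:  # stand-in for one course attempt
    return EpisodeResult(score=random.uniform(0.6, 0.9),
                         fell=random.random() < 0.2)


def recover():  # recovery-policy stand-in
    pass


def return_to_start():  # return-to-start stand-in
    pass


def evaluate(num_runs=20):
    scores = []
    for _ in range(num_runs):
        result = run_episode()
        if result.fell:
            recover()        # get back on four feet
        return_to_start()    # walk back for the next run
        scores.append(result.score)
    return sum(scores) / len(scores)


print(f"average Barkour score over 20 runs: {evaluate():.2f}")
```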
We find that across a large number of evaluations, the single generalist locomotion Transformer policy and the specialist policies with the policy-switching mechanism achieve similar performance. The locomotion Transformer has a slightly lower average Barkour score, but exhibits smoother transitions between behaviors and gaits.
A measure of the robustness of different policies over a large number of runs on the Barkour benchmark. |
Histogram of the agility scores for the locomotion transformer policy. The highest scores, shown in blue (0.75 – 0.9), represent the runs in which the robot successfully overcomes all obstacles. |
Conclusion
We believe that developing a benchmark for legged robotics is an important first step in quantifying progress toward animal-level agility. To establish a strong baseline, we investigated a zero-shot sim-to-real approach, taking advantage of large-scale parallel simulation and recent advances in training Transformer-based architectures. Our findings demonstrate that Barkour is a challenging benchmark that can be easily customized, and that our learning-based method for solving the benchmark provides a quadruped robot with a single low-level policy that can perform a variety of agile skills.
Acknowledgements
The authors of this post are now part of Google DeepMind. We would like to thank our co-authors at Google DeepMind and our collaborators at Google Research: Wenhao Yu, J. Chase Kew, Tingnan Zhang, Daniel Freeman, Kuang-Huei Lee, Lisa Lee, Stefano Saliceti, Vincent Zhuang, Nathan Batchelor, Steven Bohez, Federico Casarini, Jose Enrique Chen, Omar Cortes, Erwin Coumans, Adil Dostmohamed, Gabriel Dulac-Arnold, Alejandro Escontrela, Erik Frey, Roland Hafner, Deepali Jain, Yuheng Kuang, Edward Lee, Linda Luu, Ofir Nachum, Ken Oslund, Jason Powell, Diego Reyes, Francesco Romano, Fereshteh Sadeghi, Ron Sloat, Baruch Tabanpour, Daniel Zheng, Michael Neunert, Raia Hadsell, Nicolas Heess, Francesco Nori, Jeff Seto, Carolina Parada, Vikas Sindhwani, Vincent Vanhoucke, and Jie Tan. We would also like to thank Marissa Giustina, Ben Jyenis, Gus Kouretas, Nubby Lee, James Lubin, Sherry Moore, Thinh Nguyen, Krista Reymann, Satoshi Kataoka, Trish Blazina, and members of the Google DeepMind robotics team for their contributions to the project. Thanks to John Guilyard for creating the animations in this post.