Walking to a friend’s house or browsing the aisles of a grocery store may seem like simple tasks, but they in fact require sophisticated capabilities. That’s because humans are able to effortlessly understand their surroundings, picking up complex information about patterns, objects, and their own location along the way.
What if robots could perceive their environment in a similar way? That question is on the minds of MIT Laboratory for Information and Decision Systems (LIDS) researchers Luca Carlone and Jonathan How. In 2020, a team led by Carlone released the first version of Kimera, an open-source library that enables a single robot to construct a three-dimensional map of its environment in real time, while labeling the different objects in view. Last year, Carlone’s and How’s research groups (the SPARK Laboratory and the Aerospace Controls Laboratory) introduced Kimera-Multi, an updated system in which multiple robots communicate with one another to create a unified map. A 2022 paper associated with the project recently received the IEEE Transactions on Robotics King-Sun Fu Memorial Best Paper Award, given to the best paper published in the journal in 2022.
Carlone, the Leonardo Career Development Associate Professor of Aeronautics and Astronautics, and How, the Richard Cockburn Maclaurin Professor in Aeronautics and Astronautics, spoke with LIDS about Kimera-Multi and the future of how robots might perceive and interact with their environments.
Q: Your labs have recently focused on increasing the number of robots that can work together to generate 3D maps of the environment. What are some of the potential advantages of scaling this system?
How: The key advantage hinges on consistency, in the sense that a robot can create an independent map, and that map is self-consistent but not globally consistent. We’re aiming for the team to have a consistent map of the world; that’s the key difference in trying to form a consensus between robots, as opposed to mapping independently.
Carlone: In many scenarios it’s also good to have a bit of redundancy. For example, if we deploy a single robot on a search-and-rescue mission and something happens to that robot, it will fail to find the survivors. If multiple robots are doing the exploring, there’s a much better chance of success. Scaling up the team of robots also means that any given task can be completed in a shorter amount of time.
Q: What are some of the lessons you’ve learned from recent experiments and challenges you’ve had to overcome when designing these systems?
Carlone: We recently ran a large mapping experiment on the MIT campus, in which eight robots traversed up to 8 kilometers in total. The robots had no prior knowledge of the campus, and no GPS. Their main tasks were to estimate their own trajectories and build a map of the environment around them. We want the robots to understand the environment the way humans do: humans not only understand the shapes of obstacles, so they can get around them without hitting them, but they also understand that an object is a chair, a desk, and so on. That’s the semantics part.
The interesting thing is that when the robots meet each other, they exchange information to improve their maps of the environment. For instance, when two robots connect, each can use the other’s information to correct its own trajectory. The challenge is that if you want to reach a consensus between robots, you don’t have the bandwidth to exchange too much data. One of the key contributions of our 2022 paper is to deploy a distributed protocol in which the robots exchange limited information but can still agree on how the map looks. They don’t send camera images back and forth; they only exchange specific 3D coordinates and cues extracted from the sensor data. By continuing to exchange such data, the robots can eventually settle on a consensus.
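The paper’s actual distributed optimization is considerably more involved, but a toy sketch of the core idea, exchanging a handful of 3D landmark coordinates instead of raw images, might look like the following. The function and variable names are invented for illustration, and the Kabsch/SVD alignment used here is a standard textbook method, not Kimera-Multi’s protocol.

```python
# Toy sketch (not Kimera-Multi's actual protocol): after a rendezvous,
# two robots exchange only the 3D coordinates of landmarks they both
# observed, rather than camera images, and recover the rigid transform
# between their map frames with the classic Kabsch/SVD method.
import numpy as np

def align_maps(pts_a: np.ndarray, pts_b: np.ndarray):
    """Estimate rotation R and translation t with pts_a ~= R @ pts_b + t,
    given matched 3D landmarks as N x 3 arrays (N >= 3, non-collinear)."""
    ca, cb = pts_a.mean(axis=0), pts_b.mean(axis=0)
    H = (pts_b - cb).T @ (pts_a - ca)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = ca - R @ cb
    return R, t

# Robot B's map is robot A's map, rotated 45 degrees and shifted.
pts_a = np.array([[0., 0., 0.], [1., 0., 0.], [0., 2., 0.], [0., 0., 3.]])
c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
R_ab = np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])
t_ab = np.array([1., -2., 0.5])
pts_b = (pts_a - t_ab) @ R_ab                # B's coordinates of the same landmarks

R, t = align_maps(pts_a, pts_b)
aligned = pts_b @ R.T + t                    # B's map re-expressed in A's frame
print(np.allclose(aligned, pts_a))           # True
```

Once the transform is known, the second robot can re-express its entire map in the first robot’s frame, so the team reasons over one globally consistent map while having exchanged only a few kilobytes of coordinates.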
Right now we’re building color-coded 3D meshes or maps, in which the color encodes some semantic information, such as green corresponding to grass and magenta to a building. But as humans, we have a much more sophisticated understanding of reality, and we have a lot of prior knowledge about the relationships between objects. For instance, if you were looking for a bed, you would go to the bedroom instead of exploring the entire house. If you start to understand the complex relationships between things, you can be much smarter about what the robot can do in the environment. We’re trying to move from capturing just one layer of semantics to a more hierarchical representation in which the robots understand rooms, buildings, and other concepts.
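As a loose illustration of the difference between those two representations, here is a minimal sketch; the SceneNode class, the labels, and the color table are invented for this example and are not Kimera’s actual data structures.

```python
# A minimal sketch of the two representations described above.
from dataclasses import dataclass, field

# (1) Flat semantic layer: each class is painted onto the mesh as a color.
SEMANTIC_COLORS = {"grass": (0, 255, 0), "building": (255, 0, 255)}  # RGB

def vertex_color(label: str) -> tuple:
    """Color for a mesh vertex of the given semantic class."""
    return SEMANTIC_COLORS.get(label, (128, 128, 128))  # gray = unknown

# (2) Hierarchical layer: objects nest in rooms, rooms in buildings, so a
# query like "find a bed" can be narrowed to bedrooms instead of the whole map.
@dataclass
class SceneNode:
    label: str                                  # e.g. "building", "bedroom", "bed"
    children: list["SceneNode"] = field(default_factory=list)

    def find(self, label: str) -> list["SceneNode"]:
        """Depth-first search for every node carrying a given semantic label."""
        hits = [self] if self.label == label else []
        for child in self.children:
            hits.extend(child.find(label))
        return hits

house = SceneNode("building", [
    SceneNode("bedroom", [SceneNode("bed"), SceneNode("desk")]),
    SceneNode("kitchen", [SceneNode("table")]),
])
# Exploit the hierarchy: search only inside bedrooms, not the entire house.
beds = [bed for room in house.find("bedroom") for bed in room.find("bed")]
print([n.label for n in beds])  # ['bed']
```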
Q: What types of applications could Kimera and similar technologies lead to in the future?
How: Autonomous-vehicle companies are doing a lot of mapping of the world and learning from the environments they drive in. The holy grail would be for these vehicles to communicate with each other and share information; then they could improve their models and maps much more quickly. The current solutions out there are individual: if a truck pulls up next to you, you can’t see in a certain direction. Could another vehicle provide a field of view that your own vehicle otherwise lacks? This is a futuristic idea, because it requires vehicles to communicate in new ways, and there are privacy issues to overcome. But if we could resolve those issues, you could imagine a significantly improved safety situation, where you have access to data from multiple perspectives, not just your own field of view.
Carlone: These technologies will have many applications. Earlier I mentioned search and rescue. Imagine that you want to explore a forest and look for survivors, or map buildings after an earthquake in a way that can help first responders reach people who are trapped. Another setting where these technologies could be applied is in factories. Currently, robots deployed in factories are very rigid: they follow patterns on the floor and are not really able to understand their surroundings. But if you think about much more flexible factories in the future, robots will have to cooperate with humans and exist in a much less structured environment.