If someone advises you to “know your limits,” they are probably suggesting that you do things like exercise in moderation. For a robot, though, the motto means learning the constraints, or limitations, of a specific task within the machine's environment, so that it can carry out the task safely and correctly.
For example, imagine asking a robot to clean your kitchen when it doesn't understand the physics of its environment. How can the machine generate a practical, multistep plan to make sure the room is spotless? Large language models (LLMs) can get robots closer, but if a model is trained only on text, it is likely to miss key details about the robot's physical limitations, such as how far it can reach or whether there are nearby obstacles to avoid. Stick to LLMs alone and you'll probably end up scrubbing pasta stains out of your floorboards.
To guide robots in executing these open-ended tasks, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what's near the machine and to model its limitations. The team's strategy has an LLM sketch out a plan that is then checked in a simulator to make sure it is safe and realistic. If that sequence of actions is infeasible, the language model generates a new plan, and the process repeats until it reaches one the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, and it enables a robot to perform tasks as varied as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate tasks in dynamic environments like homes, where they may be asked to perform a general chore composed of many steps (such as “make me breakfast”).
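In outline, the approach amounts to a generate-and-check loop: the language model drafts a candidate plan, a simulator searches for settings of the plan's continuous parameters that respect the robot's constraints, and the model is asked for a new plan whenever that search comes up empty. The Python sketch below is only an illustration of such a loop; the objects and method names (`propose_plan`, `sample_parameters`, `rollout`, `give_feedback`) are hypothetical stand-ins, not the actual PRoC3S implementation.

```python
# A minimal, hypothetical sketch of the generate-and-check loop described
# above; none of these names come from the PRoC3S code. It assumes an `llm`
# object that drafts a parameterized plan, a `plan` object that can sample
# its continuous parameters (grasp poses, placement coordinates, etc.), and
# a `simulator` that rolls the plan out and checks the robot's constraints.
import random


def plan_with_simulation_feedback(llm, simulator, task, environment,
                                  max_plans=10, samples_per_plan=100):
    """Alternate between asking the LLM for a plan and testing it in simulation."""
    for _ in range(max_plans):
        # 1. The LLM drafts a candidate multi-step plan for the task.
        plan = llm.propose_plan(task, environment)

        # 2. Search over the plan's continuous parameters for a setting
        #    that satisfies every constraint (reachability, collisions, ...).
        for _ in range(samples_per_plan):
            params = plan.sample_parameters(random)
            result = simulator.rollout(plan, params)
            if result.is_feasible:
                return plan, params  # a safe, executable plan was found

        # 3. Nothing feasible: report the failure and request a new plan.
        llm.give_feedback("Previous plan violated the robot's constraints.")

    raise RuntimeError("No feasible plan found within the search budget")
```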
“LLMs and classical robotics systems like task and motion planners can't execute these kinds of tasks on their own, but together, their synergy makes open-ended problem-solving possible,” says PhD student Nishanth Kumar SM ’24, co-lead author of a new paper about PRoC3S. “We're creating an on-the-fly simulation of what's around the robot and trying out many possible action plans. Vision models help us create a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”
The team's work was presented last month in a paper at the Conference on Robot Learning (CoRL) in Munich, Germany.
The researchers' method uses an LLM pretrained on text from across the internet. Before asking PRoC3S to perform a task, the team provides its language model with a sample task (such as drawing a square) that is related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot's environment.
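As a rough illustration of how such a prompt might be assembled, the sketch below bundles those three ingredients, a task description, a long-horizon plan, and environment details, into a single few-shot prompt for the new target task. The field names and example text are invented for this illustration and are not taken from the paper.

```python
# Hypothetical few-shot prompt assembly; the exact prompt format used by
# PRoC3S is not reproduced here.

def build_prompt(sample_task, target_task):
    """Combine one worked example with the new target task (assumed format)."""
    return (
        "You write long-horizon robot plans as sequences of parameterized steps.\n\n"
        f"Example task: {sample_task['description']}\n"
        f"Example environment: {sample_task['environment']}\n"
        f"Example plan:\n{sample_task['plan']}\n\n"
        f"New task: {target_task}\n"
        "New plan:\n"
    )


# Invented sample task: drawing a square, as mentioned in the text above.
sample_task = {
    "description": "Draw a square with the marker held in the gripper.",
    "environment": "0.6 m x 0.6 m tabletop workspace; marker already grasped.",
    "plan": "1. move_to(x0, y0)\n"
            "2. draw_line_to(x1, y0)\n"
            "3. draw_line_to(x1, y1)\n"
            "4. draw_line_to(x0, y1)\n"
            "5. draw_line_to(x0, y0)",
}

print(build_prompt(sample_task, "Draw a five-pointed star."))
```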
But how did these plans fare in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times each. It could also stack digital blocks into pyramids and lines, and place items with precision, like fruit on a plate. Across each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies.”
Next, the CSAIL engineers took their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to place blocks in a straight line. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and to move all the objects near the center of a table.
Kumar and co-lead author Aidan Curtis SM ’23, who is also a PhD student working in CSAIL, say these findings indicate how an LLM can develop safer plans that humans can trust to work in practice. The researchers envision a home robot that can be given a more general request (such as “bring me some chips”) and reliably figure out the specific steps needed to execute it. PRoC3S could help the robot test out plans in an identical digital environment to find a course of action that works and, more importantly, offer you a tasty snack.
In future work, the researchers aim to improve these results using a more advanced physics simulator, and to extend the approach to more elaborate, longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots, such as a quadruped, for tasks that include walking and scanning the surroundings.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says AI Institute researcher Eric Rosen, who was not involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than is currently possible.”
Kumar and Curtis' co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and the AI Institute.