Anyone who has ever tried to pack a family-sized amount of luggage into the trunk of a sedan knows that this is a difficult problem. Robots also struggle with dense packing tasks.
For the robot, solving the packing problem involves satisfying many constraints, such as stacking luggage so that suitcases do not fall out of the trunk, not placing heavy objects on top of lighter ones, and avoiding collisions between the robotic arm and the car's bumper.
Some traditional methods approach this problem sequentially, guessing a partial solution that satisfies one constraint at a time and then checking to see if any other constraint was violated. With a long sequence of actions to take and a lot of luggage to pack, this process can take a long time.
The MIT researchers used a form of generative AI, called a diffusion model, to solve this problem more efficiently. Their method uses a collection of machine-learning models, each of which is trained to represent one specific type of constraint. These models are combined to generate global solutions to the packing problem, taking all constraints into account at once.
Their method was able to generate effective solutions faster than other techniques and produced a greater number of successful solutions in the same amount of time. Importantly, their technique was also able to solve problems with novel combinations of constraints and larger numbers of objects, which the models did not see during training.
Because of this generalization, their technique can be used to teach robots how to understand and meet the general constraints of packing problems, such as the importance of collision avoidance or the desire for one object to be next to another. Robots trained in this way could be applied to a wide range of complex tasks in various environments, from fulfilling orders in a warehouse to organizing a shelf in someone’s home.
“My vision is to push robots to perform more complicated tasks that have many geometric constraints and more continuous decisions that need to be made. These are the types of problems that service robots face in our diverse and unstructured human environments. With the powerful tool of compositional diffusion models, we can now solve these more complex problems and obtain excellent generalization results,” says Zhutian Yang, a graduate student in electrical and computer engineering and lead author of a paper on this new machine-learning technique.
Her co-authors include MIT graduate students Jiayuan Mao and Yilun Du; Jiajun Wu, assistant professor of computer science at Stanford University; Joshua B. Tenenbaum, professor in the Department of Brain and Cognitive Sciences at MIT and member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Tomás Lozano-Pérez, professor of computer science and engineering at MIT and member of CSAIL; and senior author Leslie Kaelbling, Panasonic Professor of Computer Science and Engineering at MIT and CSAIL member. The research will be presented at the Conference on Robot Learning.
Constraint complications
Continuous constraint satisfaction problems are particularly challenging for robots. These problems appear in multi-step robot manipulation tasks, such as packing items in a box or setting the table for dinner. They often involve satisfying a number of constraints, including geometric constraints, such as avoiding collisions between the robot arm and the environment; physical constraints, such as stacking objects so that they are stable; and qualitative constraints, such as placing a spoon to the right of a knife.
There can be many constraints and they vary across problems and environments depending on the geometry of the objects and human-specified requirements.
To solve these problems efficiently, MIT researchers developed a machine learning technique called Diffusion-CCSP. Diffusion models learn to generate new data samples that resemble samples in a training data set by iteratively refining their output.
To do this, diffusion models learn a procedure to make small improvements to a potential solution. Then, to solve a problem, they start with a very bad random solution and then gradually improve it.
For example, imagine placing plates and utensils randomly on a simulated table, allowing them to physically overlap. Collision-free constraints between objects will push them apart, while qualitative constraints will drag the plate toward the center, align the salad fork and dinner fork, and so on.
Diffusion models are well suited for this type of continuous constraint satisfaction problem because the influences of multiple models on an object’s pose can combine to promote the satisfaction of all constraints, Yang explains. By starting from a random initial guess each time, models can obtain a diverse set of good solutions.
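To make the idea of combining influences concrete, here is a minimal sketch of a noise-annealed refinement loop in Python. It is not the researchers' implementation: the hand-written functions `collision_free`, `left_of`, and `near_center` are hypothetical stand-ins for learned per-constraint diffusion models, objects are simplified to 2D discs, and the step size and noise schedule are arbitrary assumptions.

```python
import math
import random

# Hypothetical, hand-written stand-ins for learned per-constraint models.
# Each returns {object_index: (dx, dy)}: a nudge that reduces its own violation.

def collision_free(poses, i, j, radius=1.0):
    """Push two overlapping discs apart along the line joining their centers."""
    dx, dy = poses[i][0] - poses[j][0], poses[i][1] - poses[j][1]
    dist = math.hypot(dx, dy) or 1e-9
    overlap = 2 * radius - dist
    if overlap <= 0:
        return {}
    px, py = dx / dist * overlap / 2, dy / dist * overlap / 2
    return {i: (px, py), j: (-px, -py)}

def left_of(poses, i, j, gap=2.0):
    """Qualitative constraint: object i sits at least `gap` to the left of j."""
    violation = poses[i][0] - (poses[j][0] - gap)
    if violation <= 0:
        return {}
    return {i: (-violation / 2, 0.0), j: (violation / 2, 0.0)}

def near_center(poses, i, center=(0.0, 0.0)):
    """Qualitative constraint: pull object i toward the table center."""
    return {i: (center[0] - poses[i][0], center[1] - poses[i][1])}

def combined_nudge(poses, constraints):
    """Sum the nudges from every constraint that touches each object."""
    total = [[0.0, 0.0] for _ in poses]
    for model, args in constraints:
        for k, (dx, dy) in model(poses, *args).items():
            total[k][0] += dx
            total[k][1] += dy
    return total

def violation(poses, constraints):
    """Total size of all outstanding nudges; zero means every constraint holds."""
    return sum(math.hypot(dx, dy)
               for model, args in constraints
               for dx, dy in model(poses, *args).values())

def refine(poses, constraints, steps=400, step_size=0.25, noise=0.3, seed=0):
    """Iteratively improve a guess: apply summed nudges plus fading noise."""
    rng = random.Random(seed)
    poses = [tuple(p) for p in poses]
    for t in range(steps):
        nudges = combined_nudge(poses, constraints)
        # Noise fades to zero by the halfway point, then pure refinement.
        sigma = noise * max(0.0, 1.0 - 2.0 * t / steps)
        poses = [(x + step_size * nx + sigma * rng.gauss(0, 1),
                  y + step_size * ny + sigma * rng.gauss(0, 1))
                 for (x, y), (nx, ny) in zip(poses, nudges)]
    return poses

if __name__ == "__main__":
    rng = random.Random(1)
    # Objects 0, 1, 2 play the roles of fork, plate, and knife.
    start = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(3)]
    table = [(collision_free, (0, 1)), (collision_free, (1, 2)),
             (collision_free, (0, 2)), (left_of, (0, 1)),
             (left_of, (1, 2)), (near_center, (1,))]
    print(violation(start, table), "->", violation(refine(start, table), table))
```

Each constraint contributes its own nudge, and the loop simply adds them up, so a new combination of constraints needs only a different `constraints` list. Changing `seed` changes the random perturbations and can lead to a different final arrangement.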
Working together
For Diffusion-CCSP, the researchers wanted to capture the interconnectedness of constraints. In packing, for example, one constraint might require that a certain object be next to another object, while a second constraint might specify where one of those objects should be located.
Diffusion-CCSP learns a family of diffusion models, one for each type of constraint. The models are trained together, so they share some knowledge, such as the geometry of the objects to be packed.
The models then work together to find solutions, in this case locations for the objects to be placed, that jointly satisfy the constraints.
“We don’t always reach a solution the first time. But when you keep refining the solution and some violation occurs, it should lead you to a better solution. You get guidance by doing something wrong,” she says.
Training individual models for each constraint type and then combining them to make predictions greatly reduces the amount of training data needed, compared to other approaches.
However, training these models still requires a large amount of data demonstrating solved problems. Humans would need to solve each problem with traditional slow methods, making the cost of generating such data prohibitive, Yang says.
Instead, the researchers reversed the process by coming up with solutions first. They used fast algorithms to generate segmented boxes and fit a diverse set of 3D objects into each segment, ensuring tight packing, stable poses, and collision-free solutions.
“With this process, data generation is almost instantaneous in the simulation. We can generate tens of thousands of environments in which we know that problems have solutions,” she says.
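The solution-first idea can be illustrated with a small Python sketch. This is an assumption-laden toy and not the researchers' generator: it works with 2D rectangles instead of 3D objects, ignores stability, and the names `split_box`, `fit_object`, and `generate_solved_problem` are hypothetical. It cuts a container into segments and places one object inside each, so every generated problem comes with a known collision-free answer.

```python
import random

def split_box(box, n_segments, rng):
    """Cut an axis-aligned box (x, y, w, h) into n_segments smaller boxes."""
    segments = [box]
    while len(segments) < n_segments:
        segments.sort(key=lambda b: b[2] * b[3])
        x, y, w, h = segments.pop()  # always cut the largest remaining segment
        frac = rng.uniform(0.3, 0.7)
        if w >= h:  # cut across the longer side
            segments += [(x, y, w * frac, h),
                         (x + w * frac, y, w * (1 - frac), h)]
        else:
            segments += [(x, y, w, h * frac),
                         (x, y + h * frac, w, h * (1 - frac))]
    return segments

def fit_object(segment, rng):
    """Place one rectangle strictly inside its segment, leaving a small margin."""
    x, y, w, h = segment
    margin = 0.05 * min(w, h)
    scale = rng.uniform(0.7, 1.0)
    ow, oh = (w - 2 * margin) * scale, (h - 2 * margin) * scale
    ox = x + margin + rng.uniform(0.0, w - 2 * margin - ow)
    oy = y + margin + rng.uniform(0.0, h - 2 * margin - oh)
    return (ox, oy, ow, oh)

def generate_solved_problem(n_objects, seed=0, container=(0.0, 0.0, 1.0, 1.0)):
    """Return object sizes (the problem) and positions (a known solution)."""
    rng = random.Random(seed)
    placements = [fit_object(s, rng) for s in split_box(container, n_objects, rng)]
    return {"container": container,
            "sizes": [(w, h) for _, _, w, h in placements],
            "solution": [(x, y) for x, y, _, _ in placements]}

def overlaps(a, b):
    """True if two rectangles (x, y, w, h) share interior area."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3])
```

Because the objects are fitted into disjoint segments, no search is needed to find a valid arrangement; calling `generate_solved_problem` with different seeds yields as many solved examples as desired.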
Trained with this data, the diffusion models work together to determine the locations where the robotic gripper should place the objects to perform the packing task and meet all constraints.
They conducted feasibility studies and then demonstrated Diffusion-CCSP with a real robot by solving a series of difficult problems, including fitting 2D triangles into a box, packing 2D shapes with spatial relationship constraints, stacking 3D objects with stability constraints, and packing 3D objects with a robotic arm.
Their method outperformed other techniques in many experiments, generating a greater number of effective solutions that were stable and collision-free.
In the future, Yang and her collaborators want to test Diffusion-CCSP in more complicated situations, such as with robots that can move around a room. They also want to allow Diffusion-CCSP to address problems in different domains without the need to retrain on new data.
“Diffusion-CCSP is a machine-learning solution that builds on existing powerful generative models,” says Danfei Xu, an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology and a research scientist at NVIDIA AI, who was not involved with this work. “It can quickly generate solutions that simultaneously satisfy multiple constraints by composing known individual constraint models. Although it is still in the early phases of development, continued advances in this approach promise to enable more efficient, safer, and more reliable autonomous systems in various applications.”
This research was funded, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the MIT-IBM Watson AI Lab, the MIT Quest for Intelligence, the Center for Brains, Minds, and Machines, the Boston Dynamics Artificial Intelligence Institute, the Stanford Institute for Human-Centered Artificial Intelligence, Analog Devices, JPMorgan Chase and Co., and Salesforce.