As autonomous systems and artificial intelligence become increasingly common in daily life, new methods are emerging to help humans verify that these systems behave as expected. One method, called formal specifications, uses mathematical formulas that can be translated into natural-language expressions. Some researchers claim that this method can be used to explain the decisions an AI will make in a way that is interpretable to humans.
Researchers at MIT Lincoln Laboratory wanted to test such interpretability claims. Their findings point to the opposite: formal specifications do not appear to be interpretable by humans. In the team’s study, participants were asked to check whether an AI agent’s plan would succeed in a virtual game. Presented with the formal specification of the plan, participants got it right less than half of the time.
“The results are bad news for researchers who have been claiming that formal methods give interpretability to systems. It might be true in some restricted and abstract sense, but nothing close to practical validation of a system,” says Hosea Siu, a researcher in the laboratory’s AI Technology Group. The group’s paper was accepted to the 2023 International Conference on Intelligent Robots and Systems, held earlier this month.
Interpretability is important because it allows humans to trust a machine when it is used in the real world. If a robot or AI can explain its actions, then humans can decide whether those actions need adjustment or whether the system can be trusted to make fair decisions. An interpretable system also allows users of the technology, not just developers, to understand and trust its capabilities. However, interpretability has long been a challenge in the field of AI and autonomy. The machine learning process happens in a “black box,” so model developers often cannot explain why or how a system arrived at a certain decision.
“When researchers say ‘our machine learning system is accurate,’ we ask ‘how accurate?’ and ‘using what data?’ And if that information is not provided, we reject the claim. We haven’t been doing that much when researchers say ‘our machine learning system is interpretable,’ and we need to start putting those claims under greater scrutiny,” Siu says.
Lost in translation
For their experiment, the researchers sought to determine whether formal specifications made a system’s behavior more interpretable. They focused on people’s ability to use such specifications to validate a system, that is, to understand whether the system always met the user’s objectives.
The application of formal specifications for this purpose is essentially a byproduct of their original use. Formal specifications are part of a larger set of formal methods that use logical expressions as a mathematical framework to describe the behavior of a model. Because the model is based on a logical flow, engineers can use “model checkers” to mathematically prove facts about the system, including when it is possible or not possible for the system to complete a task. Now, researchers are attempting to use this same framework as a translational tool for humans.
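To give a sense of what a model checker does, here is a minimal sketch in Python; it is not drawn from the study, and the toy state space, transition rule, and property are invented for illustration. It exhaustively checks whether a specification of the form “eventually reach the goal and never enter a forbidden cell” holds from every possible start state, the kind of fact formal methods can prove about a logically described system.

```python
# Minimal sketch (not from the study): brute-force checking of a toy specification.
# The states, transition rule, and property below are hypothetical examples of
# proving a fact about a system from its logical description.

STATES = range(5)     # positions 0..4 on a line
GOAL = 4
FORBIDDEN = {2}       # a hypothetical "unsafe" cell

def step(pos: int) -> int:
    """Deterministic transition: move one cell to the right, stopping at the goal."""
    return min(pos + 1, GOAL)

def satisfies_spec(start: int, horizon: int = 10) -> bool:
    """Check 'eventually reach GOAL and never visit a FORBIDDEN cell' from `start`."""
    pos = start
    visited_forbidden = False
    for _ in range(horizon):
        if pos in FORBIDDEN:
            visited_forbidden = True
        if pos == GOAL:
            return not visited_forbidden
        pos = step(pos)
    return False  # never reached the goal within the horizon

# Exhaustively prove or refute the property for every start state.
for s in STATES:
    print(f"start={s}: spec holds -> {satisfies_spec(s)}")
```

A real model checker works over temporal-logic formulas and far larger state spaces, but the underlying question, whether the property holds on every possible run, is the same one the study’s participants were asked to answer by hand.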
“Researchers confuse the fact that formal specifications have precise semantics with the fact that they are interpretable to humans. They are not the same thing,” says Siu. “We realized that almost no one was checking whether people really understood the results.”
In the team’s experiment, participants were asked to validate a fairly simple set of behaviors with a robot playing capture the flag, essentially answering the question “If the robot follows these rules exactly, does it always win?”
Participants included both experts and non-experts in formal methods. They received the formal specifications in three formats: a “raw” logical formula, the formula translated into words closer to natural language, and a decision-tree format. Decision trees, in particular, are often regarded in the AI world as a human-interpretable way to represent AI or robot decision-making.
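As a rough illustration of those three formats, the same toy capture-the-flag rule might appear to a participant as follows; the rule, the predicate names, and the translation are invented for this sketch and are not taken from the experiment materials.

```python
# Illustrative only: one invented capture-the-flag rule shown in the three
# presentation formats described above. Nothing here comes from the study itself.

# 1. "Raw" logical formula, written in a temporal-logic-like notation.
raw_formula = "G( enemy_visible -> (retreat U at_base) )"

# 2. The same formula translated into words closer to natural language.
natural_language = ("Whenever an enemy is visible, keep retreating "
                    "until the robot reaches its base.")

# 3. Decision-tree form: the same idea rendered as a nested choice over predicates.
def decide(enemy_visible: bool, at_base: bool) -> str:
    if enemy_visible:
        return "hold position" if at_base else "retreat"
    return "advance toward the flag"

print(decide(enemy_visible=True, at_base=False))  # -> "retreat"
```

Validating a specification like this means answering the experiment’s question: if the robot follows rules of this kind exactly, does it always win? Making that judgment by hand, across a whole rule set, is what participants struggled with.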
The results: “Validation performance overall was pretty terrible, with around 45 percent accuracy, regardless of the presentation format,” Siu says.
Confidently wrong
Participants with prior training in formal specifications performed only slightly better than novices. However, the experts reported far more confidence in their answers, regardless of whether those answers were correct. In general, people tended to over-trust the correctness of the specifications presented to them, which meant they ignored rule sets that allowed losses in the game. This confirmation bias is particularly concerning for system validation, the researchers say, because people are more likely to overlook failure modes.
“We don’t think this result means we should abandon formal specifications as a way to explain system behaviors to people. But we do think a lot more work needs to go into how they are presented to people and into the workflows in which people use them,” adds Siu.
When considering why the results were so poor, Siu acknowledges that even people working with formal methods are not trained enough to verify the specifications as the experiment asked them to. And it’s hard to think of all the possible outcomes of a set of rules. Still, the rule sets shown to participants were brief, equivalent to no more than a paragraph of text, “much shorter than anything you’d find in any real system,” Siu says.
The team does not attempt to tie its results directly to how well humans would perform when validating real-world robots. Instead, it aims to use the results as a starting point for considering what the formal logic community may be missing when it claims interpretability, and how such claims might play out in the real world.
This research was conducted as part of a larger project that Siu and his teammates are working on to improve the relationship between robots and human operators, especially those in the military. The robotics programming process can often leave operators out of the loop. With the similar goal of improving interpretability and trust, the project aims to let operators teach tasks to robots directly, much as they would train another person. Such a process could improve both the operator’s confidence in the robot and the robot’s adaptability.
Ultimately, they hope that the results of this study and their ongoing research can improve the application of autonomy, as it becomes more integrated into human life and decision-making.
“Our results drive the need for human evaluations of certain autonomy and AI systems and concepts before too many claims are made about their usefulness with humans,” adds Siu.