One reason deep learning exploded over the last decade was the availability of programming languages that could automate the math (college-level calculus) needed to train each new model. Neural networks are trained by tuning their parameters to try to maximize a score that can be computed quickly for the training data. The equations used to adjust the parameters at each tuning step used to be derived painstakingly by hand. Deep learning platforms use a method called automatic differentiation to calculate these adjustments automatically. This allowed researchers to rapidly explore a huge space of models and find the ones that actually worked, without needing to know the underlying mathematics.
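To make the idea of automatic differentiation concrete, here is a minimal sketch in plain Python. It is not how any particular deep learning platform is implemented; it uses forward-mode "dual numbers", one of the simplest forms of the technique. The names `Dual`, `score`, and `grad`, the toy data, and the step size are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one parameter."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat a plain number as a constant (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def score(w, data):
    """Toy score: negative squared error of the model y = w * x."""
    total = 0.0
    for x, y in data:
        err = w * x - y
        total = total - err * err
    return total


def grad(f, w, *args):
    """Derivative of f at w, computed by running f once on a dual number."""
    return f(Dual(w, 1.0), *args).der


# Hypothetical training data generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0
for _ in range(200):
    # Gradient ascent: nudge w in the direction that raises the score.
    w += 0.01 * grad(score, w, data)

print(w)
```

Nobody had to write out the derivative of `score` by hand: the same code that computes the score also carries its derivative along, and the tuning loop moves `w` toward the value that fits the data.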
But what about problems like climate modeling or financial planning, where the underlying scenarios are fundamentally uncertain? For these problems, calculus alone is not enough; you also need probability theory. The “score” is no longer just a deterministic function of the parameters. Instead, it is defined by a stochastic model that makes random choices to model unknowns. If you try to use deep learning platforms on these problems, they can easily give you the wrong answer. To fix this problem, MIT researchers developed ADEV, which extends automatic differentiation to handle models that make random choices. This brings the benefits of AI programming to a much broader class of problems, enabling rapid experimentation with models that can reason about uncertain situations.
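A small example shows how differentiating a random program naively can go wrong. The sketch below is not ADEV itself. It assumes a toy model that flips a coin with probability `theta` and returns a loss of 0 on heads and `-theta / 2` on tails, and it compares a naive per-sample derivative with the standard score-function (REINFORCE) estimator, whose terms are derived by hand here. All function names, the value of `theta`, and the sample count are assumptions for illustration.

```python
import random


def exact_gradient(theta):
    # E[loss] = (1 - theta) * (-theta / 2), so d/dtheta E[loss] = theta - 1/2.
    return theta - 0.5


def naive_sample(theta, rng):
    """Differentiate only along the sampled path, ignoring the fact that
    theta also changes the probability of taking each branch."""
    heads = rng.random() < theta
    return 0.0 if heads else -0.5  # d/dtheta of 0, or of -theta / 2


def score_function_sample(theta, rng):
    """Score-function estimator: dloss/dtheta + loss * dlog p(b)/dtheta."""
    heads = rng.random() < theta
    if heads:
        loss, dloss, dlogp = 0.0, 0.0, 1.0 / theta
    else:
        loss, dloss, dlogp = -theta / 2, -0.5, -1.0 / (1.0 - theta)
    return dloss + loss * dlogp


def average(sampler, theta, n=100_000, seed=0):
    """Monte Carlo average of a per-sample gradient estimate."""
    rng = random.Random(seed)
    return sum(sampler(theta, rng) for _ in range(n)) / n


theta = 0.3
naive = average(naive_sample, theta)
corrected = average(score_function_sample, theta)
exact = exact_gradient(theta)

print(f"naive: {naive:.3f}  score-function: {corrected:.3f}  exact: {exact:.3f}")
```

The naive average settles near `-(1 - theta) / 2`, which is not the derivative of the expected loss, while the score-function average settles near the exact value `theta - 1/2`. Deriving correct estimators like this one by hand for every new model is the kind of work the article describes ADEV as automating.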
Lead author and MIT electrical and computer engineering doctoral student Alex Lew says he hopes people will be less wary of using probabilistic models now that a tool exists to differentiate them automatically. “The need to derive unbiased, low-variance gradient estimators by hand can lead to the perception that probabilistic models are more complicated or finicky to work with than deterministic ones. But probability is an incredibly useful tool for modeling the world. My hope is that by providing a framework to build these estimators automatically, ADEV will make it more attractive to experiment with probabilistic models, possibly enabling new discoveries and breakthroughs in AI and beyond.”
Sasa Misailovic, an associate professor at the University of Illinois at Urbana-Champaign who was not involved in this research, adds: “As the probabilistic programming paradigm emerges to solve various problems in science and engineering, questions arise about how we can make efficient software implementations built on sound mathematical principles. ADEV presents such a foundation for modular and compositional probabilistic inference with derivatives. ADEV brings the benefits of probabilistic programming (automated math and more scalable inference algorithms) to a much wider range of problems where the goal is not just to infer what is probably true, but to decide what action to take next.”
In addition to climate modeling and financial modeling, ADEV could also be used for operations research (for example, simulating customer queues for call centers to minimize expected wait times, by simulating the waiting processes and evaluating the quality of the outcomes) or for tuning the algorithm that a robot uses to grasp physical objects. Co-author Mathieu Huot says that he is excited to see ADEV “being used as a design space for new low-variance estimators, a key challenge in probabilistic computations.”
The research, which received the SIGPLAN Distinguished Paper Award at POPL 2023, is co-authored by Vikash Mansinghka, who directs the MIT Probabilistic Computing Project in the Department of Brain and Cognitive Sciences and the Computer Science and Artificial Intelligence Laboratory and helps lead the MIT Quest for Intelligence, as well as by Mathieu Huot and Sam Staton, both of Oxford University. Huot adds: “ADEV provides a unified framework for reasoning about the ubiquitous problem of estimating gradients unbiasedly, in a clean, elegant, and compositional way.” The research was supported by the National Science Foundation, the DARPA Machine Common Sense program, and a philanthropic gift from the Siegel Family Foundation.
“Many of our most controversial decisions, from climate policy to the tax code, come down to decision-making under uncertainty. ADEV makes it easier to experiment with new ways to solve these problems, by automating some of the toughest math,” says Mansinghka. “For any problem that we can model using a probabilistic program, we have new, automated ways to tune the parameters to try to create the results we want and avoid the results we don’t want.”