Behrooz Tahmasebi, an MIT doctoral student in the Department of Electrical Engineering and Computer Science (EECS) and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL), was taking a mathematics course on differential equations at the end of 2021 when a flash of inspiration struck. In that class he first learned about Weyl's law, which had been formulated 110 years earlier by the German mathematician Hermann Weyl. Tahmasebi realized that it might have some relevance to the computer science problem he was struggling with at the time, even though the connection seemed, on the surface, to be thin at best. Weyl's law, he says, provides a formula that measures the complexity of the spectral information, or data, contained within the fundamental frequencies of a drum head or guitar string.
At the same time, Tahmasebi was thinking about measuring the complexity of the input data to a neural network, wondering if that complexity could be reduced by taking into account some of the symmetries inherent in the data set. This reduction, in turn, could facilitate (and speed up) machine learning processes.
Weyl's law, conceived about a century before the rise of machine learning, had traditionally been applied to very different physical situations, such as the vibrations of a string or the spectrum of electromagnetic (blackbody) radiation emitted by a heated object. However, Tahmasebi believed that a customized version of that law could help with the machine learning problem he was pursuing. And if the approach worked, the payoff could be considerable.
He spoke with his advisor, Stefanie Jegelka, an associate professor in EECS and an affiliate of CSAIL and the MIT Institute for Data, Systems, and Society, who believed the idea was definitely worth looking into. As Tahmasebi saw it, Weyl's law was all about measuring the complexity of data, which was also what this project was about. But Weyl's law, in its original form, said nothing about symmetry.
He and Jegelka have managed to modify Weyl's law so that symmetry can be taken into account when evaluating the complexity of a data set. “To my knowledge,” says Tahmasebi, “this is the first time that Weyl's law has been used to determine how machine learning can be improved through symmetry.”
The paper he and Jegelka wrote earned a “Spotlight” designation when it was presented at the December 2023 Conference on Neural Information Processing Systems (NeurIPS), widely considered the world's most important conference on machine learning.
This work, says Soledad Villar, an applied mathematician at Johns Hopkins University, “demonstrates that models that satisfy the symmetries of the problem are not only correct but can also produce predictions with smaller errors, using a small number of training points. [This] is especially important in scientific fields, such as computational chemistry, where training data can be scarce.”
In their paper, Tahmasebi and Jegelka explored the ways in which symmetries, or so-called “invariances,” can benefit machine learning. Suppose, for example, that the goal of a particular computer task is to pick out every image that contains the digit 3. That task becomes much easier and much faster if the algorithm can identify the 3 regardless of where it is placed in the frame, whether exactly in the center or off to the side, and whether it is pointing up, pointing down, or oriented at a random angle. An algorithm equipped with that capability can take advantage of translational and rotational symmetries, meaning that a 3, or any other object, is not itself changed when its position is shifted or it is rotated about an arbitrary axis. It is said to be invariant to those changes. The same logic applies to algorithms charged with identifying dogs or cats: a dog is a dog, you might say, regardless of how it is embedded in an image.
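To make the idea of invariance concrete, here is a minimal Python sketch (a toy illustration, not the classifiers or methods from the paper) in which a simple feature of an image, its sorted pixel values, stays the same when the image is shifted or rotated:

```python
import numpy as np

# A toy "image" of a digit, with 1s marking the ink.
digit = np.zeros((8, 8))
digit[2:7, 3] = 1          # vertical stroke
digit[2, 3:6] = 1          # top bar
digit[6, 3:6] = 1          # bottom bar

# A feature that depends only on the ink itself, not on where it sits or how
# it is oriented: the sorted list of pixel values (its intensity histogram).
def invariant_feature(img):
    return np.sort(img.ravel())

shifted = np.roll(digit, shift=(1, 2), axis=(0, 1))   # translated copy
rotated = np.rot90(digit)                             # copy rotated by 90 degrees

# The feature is identical for all three versions, i.e., it is invariant
# to those translations and rotations of the digit.
assert np.allclose(invariant_feature(digit), invariant_feature(shifted))
assert np.allclose(invariant_feature(digit), invariant_feature(rotated))
print("feature unchanged under translation and rotation")
```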
The goal of the entire exercise, the authors explain, is to exploit the intrinsic symmetries of a data set to reduce the complexity of machine learning tasks. This, in turn, can lead to a reduction in the amount of data needed for learning. Specifically, the new work answers the question: how much less data is needed to train a machine learning model if the data contains symmetries?
There are two ways a gain or benefit can be achieved by taking advantage of the symmetries present. The first has to do with the size of the sample to be analyzed. Imagine that you are tasked, for example, with analyzing an image that has mirror symmetry: the right side is an exact replica, or mirror image, of the left. In that case, you don't need to look at every pixel; you can get all the information you need from half the image, a factor-of-two improvement. If, instead, the image can be divided into 10 identical parts, you get an improvement factor of 10. This type of gain is linear.
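A quick way to see the factor-of-two bookkeeping in the mirror-symmetry case is this short Python sketch (a hypothetical illustration, not code from the paper):

```python
import numpy as np

# Build a toy image with left-right mirror symmetry:
# the right half is the mirror image of the left half.
left_half = np.random.rand(8, 4)                    # 8x4 block of "pixels"
image = np.hstack([left_half, left_half[:, ::-1]])  # full 8x8 mirror-symmetric image

# Sanity check: flipping the image horizontally leaves it unchanged,
# i.e., the image is invariant under the mirror symmetry.
assert np.allclose(image, image[:, ::-1])

# Every pixel is determined by one in the left half, so only half of the
# pixels carry independent information.
print(f"gain factor: {image.size / left_half.size:.0f}")  # prints 2
```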
To take another example, imagine that you are examining a data set, trying to find sequences of blocks that have seven different colors: black, blue, green, purple, red, white, and yellow. Your job becomes much easier if you don't care about the order in which the blocks are arranged. If order mattered, there would be 5,040 different combinations to search for. But if all you care about are sequences of blocks in which the seven colors appear, then you've reduced the number of things (or sequences) you're looking for from 5,040 to just one.
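The arithmetic behind the block example is just a permutation count, as this small Python check (an illustrative sketch, not from the paper) shows:

```python
import math
from itertools import permutations

colors = ["black", "blue", "green", "purple", "red", "white", "yellow"]

# If order matters, every arrangement of the seven colors is a distinct target.
ordered_targets = math.factorial(len(colors))
print(ordered_targets)  # 5040, i.e., 7!

# If order is ignored (invariance under permutation), all 5,040 arrangements
# collapse into a single equivalence class: one unordered set of seven colors.
unordered_targets = len({frozenset(p) for p in permutations(colors)})
print(unordered_targets)  # 1
```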
Tahmasebi and Jegelka found that a different type of gain, an exponential one, can be obtained from symmetries that operate over many dimensions. This advantage is related to the notion that the complexity of a learning task grows exponentially with the dimensionality of the data space. Making use of a multidimensional symmetry can therefore yield a disproportionately large payoff. “This is a new contribution that basically tells us that higher-dimensional symmetries are more important because they can give us exponential gain,” says Tahmasebi.
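As a rough back-of-the-envelope illustration of why the dimension of a symmetry matters so much (a simplified sketch using textbook curse-of-dimensionality scaling, not the exact rates proven in the paper), suppose that reaching a target error eps on d-dimensional data takes on the order of (1/eps)^d samples; if a symmetry's orbits absorb k of those dimensions, the requirement drops to (1/eps)^(d-k), a saving that grows exponentially with k:

```python
# Simplified curse-of-dimensionality scaling (illustrative assumption only;
# not the exact sample-complexity rates from the Tahmasebi-Jegelka paper).
eps = 0.1   # target error
d = 10      # dimension of the data space

for k in [0, 1, 3, 6]:   # number of dimensions "absorbed" by the symmetry
    samples_needed = (1 / eps) ** (d - k)
    print(f"orbit dimension {k}: ~{samples_needed:.0e} samples")
```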
The NeurIPS 2023 paper he wrote with Jegelka contains two theorems that were proven mathematically. “The first theorem shows that an improvement in sample complexity can be achieved with the general algorithm we provide,” says Tahmasebi. The second theorem complements the first, he added, “proving that this is the best possible gain that can be obtained; nothing else can be achieved.”
He and Jegelka have also provided a formula that predicts the gain that can be obtained from a particular symmetry in a given application. A virtue of this formula is its generality, Tahmasebi notes: “It works for any symmetry and any input space.” It works not only for symmetries that are known today, but could also be applied in the future to symmetries yet to be discovered. That prospect is not too far-fetched, given that the search for new symmetries has long been a major pursuit in physics. It suggests that, as more symmetries are found, the methodology introduced by Tahmasebi and Jegelka should only get better over time.
According to Haggai Maron, a computer scientist at the Technion (the Israel Institute of Technology) and NVIDIA who was not involved in the work, the approach presented in the paper “differs substantially from previous related work, adopting a geometric perspective and employing tools from differential geometry.” This theoretical contribution provides mathematical support to the emerging subfield of “geometric deep learning,” which has applications in graph learning, 3D data, and more. “The paper helps establish a theoretical foundation to guide future developments in this rapidly expanding area of research.”