Generative AI, currently riding a crest of popular discourse, promises a world where the simple transforms into the complex: where a simple distribution evolves into intricate patterns of images, sounds, or text, rendering the artificial startlingly real.
The realms of the imagination are no longer mere abstractions, as researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have brought an innovative AI model to life. Their new technology integrates two seemingly unrelated physical laws that underpin the best-performing generative models to date: diffusion, which typically describes the random motion of elements, such as heat permeating a room or a gas expanding in space, and Poisson flow, which draws on the principles governing the behavior of electric charges.
This harmonious combination has resulted in superior performance in generating new images, outperforming existing state-of-the-art models. Since its inception, the "Poisson Flow Generative Model++" (PFGM++) has found potential applications in various fields, from generating RNA sequences and antibodies to audio production and graphics generation.
The model can generate complex patterns, such as realistic images or imitations of real-world processes. PFGM++ builds on PFGM, the team's work from the previous year. PFGM draws its inspiration from the electric field described by the mathematical "Poisson" equation, and applies it to the data the model is trying to learn from. To do this, the team used a clever trick: they added an extra dimension to their model's "space," somewhat like going from a 2D sketch to a 3D model. This extra dimension gives the model more room to maneuver, places the data in a broader context, and helps it approach the data from all directions when generating new samples.
"PFGM++ is an example of the kinds of advances in AI that can be driven by interdisciplinary collaborations between physicists and computer scientists," says Jesse Thaler, a theoretical particle physicist at the Center for Theoretical Physics in MIT's Laboratory for Nuclear Science and director of the National Science Foundation AI Institute for Artificial Intelligence and Fundamental Interactions (NSF AI IAIFI), who was not involved in the work. "In recent years, AI-based generative models have delivered numerous surprising results, from photorealistic images to lucid streams of text. Remarkably, some of the most powerful generative models are grounded in time-tested physics concepts such as symmetries and thermodynamics. PFGM++ takes a centuries-old idea from fundamental physics (that there could be extra dimensions of space-time) and turns it into a powerful and robust tool for generating synthetic but realistic datasets. I am delighted to see the myriad ways in which 'physical intelligence' is transforming the field of artificial intelligence."
The underlying mechanism of PFGM is not as complex as it might seem. The researchers compared their data points to tiny electric charges placed on a flat plane in a dimensionally expanded world. These charges produce an "electric field," whose field lines the charges follow upward into the extra dimension, eventually forming a uniform distribution over a vast imaginary hemisphere. Generation then works like rewinding a videotape: starting with a set of charges uniformly distributed over the hemisphere and tracing their journey back to the plane along the field lines, they realign to match the original data distribution. This intriguing process lets a neural model learn the electric field and generate new data that mirrors the original.
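The geometry described above can be made concrete with a toy numerical sketch. The example below is purely illustrative (it is not the team's implementation, and every name and parameter in it is invented for illustration): it treats 2D data points as unit charges lifted into a 3D augmented space, brute-forces the resulting "electric" field, and traces a far-away point backward along the field lines until it reaches the data plane.

```python
import numpy as np

def poisson_field(x, charges):
    """Empirical electric field at x, sourced by data points treated as
    unit charges. In 3D the Coulomb field falls off as 1/r^2 along the
    unit direction away from each charge."""
    diffs = x - charges                                   # (N, 3)
    dists = np.linalg.norm(diffs, axis=1, keepdims=True)
    return (diffs / np.maximum(dists, 1e-6) ** 3).mean(axis=0)

rng = np.random.default_rng(0)

# Toy "dataset": 2D points on the unit circle, lifted into the z = 0 plane.
angles = rng.uniform(0.0, 2.0 * np.pi, 500)
charges = np.stack([np.cos(angles), np.sin(angles), np.zeros(500)], axis=1)

# Generation runs the videotape in reverse: start far out (on the big
# hemisphere) and step *against* the field, back toward the data plane.
x = np.array([3.0, -2.0, 40.0])
for _ in range(2000):
    e = poisson_field(x, charges)
    x = x - 0.05 * e / np.linalg.norm(e)   # normalized Euler step
    if x[2] <= 1e-3:                       # reached the data plane
        break

# The trajectory lands near the data support (radius ~1 in this toy case).
print(round(float(np.linalg.norm(x[:2])), 2), round(float(x[2]), 2))
```

In the real model, a neural network learns this field from data so new samples can be drawn without summing over the entire training set; the sketch sums explicitly only to make the geometry visible.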
The PFGM++ model extends the electric field in PFGM to an intricate, higher-dimensional framework. As you keep expanding the number of extra dimensions (a parameter the authors call D), something unexpected happens: the model begins to resemble another important class of models, the diffusion models. This work is all about finding the right balance: PFGM and diffusion models sit at opposite ends of a spectrum, one robust but difficult to handle, the other simpler but less robust. The PFGM++ model offers a sweet spot, striking a balance between robustness and ease of use. This innovation paves the way for more efficient image and pattern generation, marking an important step forward in the technology. In addition to the adjustable dimensions, the researchers also proposed a new training method that enables more efficient learning of the electric field.
To make this theory a reality, the team solved a pair of differential equations describing the motion of these charges within the electric field. They evaluated performance using the Fréchet Inception Distance (FID) score, a widely accepted metric that assesses the quality of images generated by the model against real ones. PFGM++ also shows greater tolerance of estimation errors and robustness to the step size used in the differential equations.
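For readers unfamiliar with the metric: FID is the Fréchet distance between two Gaussians fit to Inception-network features of real and generated images. A minimal numpy implementation of that distance follows; it is a sketch of the formula itself, not of any particular FID library, and in practice the means and covariances would come from Inception features rather than raw data.

```python
import numpy as np

def frechet_distance(mu1, cov1, mu2, cov2):
    """Frechet distance between N(mu1, cov1) and N(mu2, cov2):
    ||mu1 - mu2||^2 + Tr(cov1 + cov2 - 2 (cov1 cov2)^{1/2})."""
    def sqrtm_psd(m):
        # Matrix square root of a symmetric PSD matrix via eigendecomposition.
        vals, vecs = np.linalg.eigh(m)
        return vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    s = sqrtm_psd(cov2)
    # Tr((cov1 cov2)^{1/2}) equals Tr((s cov1 s)^{1/2}) with s = cov2^{1/2},
    # and the symmetrized product is safe to pass to eigh.
    covmean = sqrtm_psd(s @ cov1 @ s)
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))

# Identical distributions score ~0; any mismatch in mean or covariance adds up.
print(frechet_distance(np.zeros(2), np.eye(2), np.zeros(2), np.eye(2)))  # ~0.0
print(frechet_distance(np.zeros(2), np.eye(2), np.ones(2), np.eye(2)))   # ~2.0
```

Lower scores mean the generated-image statistics sit closer to the real-image statistics, which is why a drop in FID signals better sample quality.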
Going forward, the team aims to refine certain aspects of the model, in particular to identify, in a systematic way, the "sweet spot" value of D tailored to specific data, architectures, and tasks by analyzing the behavior of the neural networks' estimation errors. They also plan to apply PFGM++ to modern, large-scale text-to-image and text-to-video generation.
“Diffusion models have become a fundamental driving force behind the generative ai revolution,” says Yang Song, research scientist at OpenAI. “PFGM++ presents a powerful generalization of diffusion models, allowing users to generate higher quality images by improving the robustness of image generation against perturbations and learning errors. Furthermore, PFGM++ uncovers a surprising connection between electrostatics and diffusion models, providing new theoretical insights into diffusion modeling research.”
"Poisson flow generative models are not only rooted in an elegant, electrostatics-inspired physical formulation, but also offer state-of-the-art generative modeling performance in practice," says Karsten Kreis, a senior research scientist at NVIDIA, who was not involved in the work. "They even outperform the popular diffusion models that currently dominate the literature. This makes them a very powerful generative modeling tool, and I envision their application in a variety of areas, from digital content creation to generative drug discovery. More generally, I believe that exploring new physics-inspired generative modeling frameworks holds great promise for the future, and that Poisson flow generative models are just the beginning."
The authors of a paper on this work include three MIT graduate students: Yilun Xu of the Department of Electrical Engineering and Computer Science (EECS) and CSAIL, Ziming Liu of the Department of Physics and NSF AI IAIFI, and Shangyuan Tong of EECS and CSAIL, as well as Google Senior Research Scientist Yonglong Tian PhD '23. MIT professors Max Tegmark and Tommi Jaakkola advised the research.
The team was supported by the MIT-DSTA Singapore collaboration, the MIT-IBM Watson AI Lab, National Science Foundation grants, The Casey and Family Foundation, the Foundational Questions Institute, the Rothberg Family Fund for Cognitive Science, and the ML for Pharmaceutical Discovery and Synthesis Consortium. Their work was presented at the International Conference on Machine Learning this summer.