Computational biology, chemistry, and materials engineering all depend on the ability to predict how matter evolves over time at the atomic scale. Quantum mechanics governs the vibration, migration, and bond dissociation of atoms and electrons at the smallest scales, but the phenomena that determine observable physical and chemical processes often unfold over much larger length scales and much longer time scales. Bridging these scales requires innovation both in highly parallelizable architectures with access to exascale processors and in fast, highly accurate computational methods that capture the quantum interactions. Current computational approaches cannot capture the structural complexity of realistic physical and chemical systems, and the time scales of their observable evolution are too long for atomistic simulation.
Machine learning interatomic potentials (MLIPs) have been studied extensively over the last two decades. MLIPs are trained on energies and forces from high-accuracy reference data, and their cost scales linearly with the number of atoms. Early attempts combined a Gaussian process or a simple neural network with hand-crafted descriptors. These early MLIPs had poor predictive accuracy because they could not generalize to structures absent from the training data, which led to brittle simulations that did not transfer to other systems.
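To make the role of energies and forces concrete, here is a minimal sketch of what any interatomic potential provides: a total energy built from local terms, and forces obtained as the negative gradient of that energy. The `pair_energy` function is a hypothetical toy stand-in for a learned term, not Allegro or any published model, and `CUTOFF` is an arbitrary illustrative value. For brevity the sketch loops over all pairs; the linear scaling mentioned above relies on a finite cutoff combined with neighbor lists, which are omitted here.

```python
import math

CUTOFF = 3.0  # hypothetical cutoff radius in arbitrary length units


def pair_energy(r, cutoff=CUTOFF):
    """Toy stand-in for a learned pair term: smooth, and zero beyond the cutoff."""
    if r >= cutoff:
        return 0.0
    return (1.0 - r / cutoff) ** 2 * math.exp(-r)


def total_energy(positions, cutoff=CUTOFF):
    """Total energy as a sum of pair terms over all atom pairs."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            energy += pair_energy(math.dist(positions[i], positions[j]), cutoff)
    return energy


def forces(positions, cutoff=CUTOFF, h=1e-5):
    """Forces as the negative energy gradient, estimated by central differences."""
    result = []
    for i in range(len(positions)):
        force = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            gradient = (total_energy(plus, cutoff) - total_energy(minus, cutoff)) / (2 * h)
            force.append(-gradient)
        result.append(force)
    return result


# Three nearby atoms plus one atom far outside the cutoff of all others.
positions = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.5, 0.3), (8.0, 8.0, 8.0)]
energy = total_energy(positions)
atom_forces = forces(positions)
```

The distant atom feels no force because every pair term involving it lies beyond the cutoff. That locality is what lets the cost grow with the number of atoms instead of the number of pairs once neighbor lists are used.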
New research from a Harvard lab demonstrates that biomolecular systems with up to 44 million atoms can be modeled with state-of-the-art accuracy using Allegro. The team applied a large, pretrained Allegro model to systems ranging from 23,000 atoms for DHFR to 91,000 for Factor IX, 400,000 for cellulose, 44,000,000 for the HIV capsid, and more than 100,000 for other systems. The pretrained model has 8 million weights and reaches a force error of only 26 meV/Å after training on 1 million structures at hybrid-functional accuracy from the SPICE dataset. The ability to learn entire families of inorganic materials and organic molecules at this data scale makes rapid exascale simulation possible for a previously unimaginable range of material systems.
To enable active learning for the automatic construction of training sets, the researchers showed that the uncertainty of a deep equivariant model's force and energy predictions can be quantified efficiently. Because equivariant models are accurate, the accuracy bottleneck now lies in the quantum electronic structure calculations needed to train MLIPs. Because Gaussian mixture models are easy to fit in Allegro, large-scale uncertainty-aware simulations can be run with a single model rather than an ensemble.
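The idea of using a Gaussian mixture model as a single-model uncertainty signal can be sketched as follows. This is an illustration under stated assumptions, not the researchers' implementation: it assumes each structure is summarized by one hypothetical scalar feature, fits a two-component 1-D mixture with expectation-maximization, and treats a high negative log-likelihood as a sign that a structure is unlike the training data. The feature values and the thresholding rule are invented for the example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with expectation-maximization."""
    spread = (max(data) - min(data)) or 1.0
    means = [min(data), max(data)]        # deterministic initialization
    variances = [spread ** 2 / 4.0] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, min_var)
    return weights, means, variances


def negative_log_likelihood(x, model):
    """Uncertainty proxy: higher values mean x is less like the training data."""
    weights, means, variances = model
    density = sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return -math.log(max(density, 1e-300))


# Hypothetical per-structure features seen during training (two clusters).
train_features = [0.9, 1.0, 1.1, 1.05, 0.95, 3.0, 3.1, 2.9, 3.05, 2.95]
model = fit_gmm_1d(train_features)

# Hypothetical rule: flag anything less likely than the least likely training point.
threshold = max(negative_log_likelihood(x, model) for x in train_features)
flagged = [x for x in (1.02, 2.0, 10.0) if negative_log_likelihood(x, model) > threshold]
```

In an active-learning loop, flagged structures would be the ones sent for new reference calculations and added to the training set. Only one trained model is needed, which is the contrast with ensemble-based uncertainty drawn above.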
According to the researchers, Allegro is the only scalable approach that outperforms traditional message-passing and transformer-based designs. On several large systems they report top speeds of more than 100 steps per second, and the results scale to more than 100 million atoms. Even at the 44-million-atom scale of the HIV capsid, where flaws are generally far more apparent, the simulations remain stable for nanoseconds out of the box. The team reports encountering almost no problems during production.
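To put a throughput figure such as 100 steps per second in context, the short calculation below converts it into simulated time per day and into the wall-clock time needed to reach one nanosecond. The integration timestep is not given in the article, so the 1 fs value used here is purely an assumption for illustration.

```python
FS_PER_NS = 1_000_000
SECONDS_PER_DAY = 86_400


def ns_per_day(steps_per_second, timestep_fs):
    """Simulated nanoseconds per wall-clock day at a given throughput."""
    return steps_per_second * SECONDS_PER_DAY * timestep_fs / FS_PER_NS


def wall_hours_for(target_ns, steps_per_second, timestep_fs):
    """Wall-clock hours needed to simulate target_ns nanoseconds."""
    steps = target_ns * FS_PER_NS / timestep_fs
    return steps / steps_per_second / 3600


ASSUMED_TIMESTEP_FS = 1.0  # assumption: the article does not state the timestep

rate = ns_per_day(100, ASSUMED_TIMESTEP_FS)          # about 8.64 ns per day
hours = wall_hours_for(1.0, 100, ASSUMED_TIMESTEP_FS)  # about 2.8 hours per ns
```

Under that assumed timestep, nanosecond-scale trajectories take hours, not weeks, of wall-clock time at the reported speed.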
The team hopes their work will open new avenues in biochemistry and drug discovery by improving our understanding of the dynamics of very large biomolecular systems and of the atomic-level interactions between proteins and drugs.
Tanushree Shenwai is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Bhubaneswar. She is a data science enthusiast with a strong interest in the applications of artificial intelligence across various fields. She is passionate about exploring new advances in technology and their real-life applications.