Potential energy surfaces (PESs) represent the relationship between the positions of atoms or molecules and their associated potential energy. PESs are essential for understanding molecular behavior, chemical reactions, and material properties. They describe how the potential energy of a system changes as the positions of its constituent atoms or molecules vary. These surfaces are usually complex and high-dimensional, making their precise calculation difficult, especially for large molecules or systems.
The reliability of a machine learning (ML) model of the PES still depends largely on the diversity of its training data, especially for chemically reactive systems that must visit high-energy states while undergoing chemical transformations. ML models, by their nature, interpolate between known training data. Their capacity to extrapolate is limited, so predictions can be unreliable when the molecules or their configurations differ from those in the training set.
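The gap between interpolation and extrapolation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: a one-dimensional Morse potential with made-up parameters stands in for the reference energy, and a piecewise-linear interpolator stands in for the ML model. It is not the model used in the research.

```python
import math

def morse(r, d_e=1.0, a=1.5, r_e=1.0):
    """Toy reference energy: Morse potential with hypothetical parameters."""
    return d_e * (1.0 - math.exp(-a * (r - r_e))) ** 2

def fit_interpolator(xs, ys):
    """Toy surrogate: piecewise-linear inside the data, constant edge value outside."""
    def predict(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return ys[i] + t * (ys[i + 1] - ys[i])
    return predict

# Training data covers bond lengths from 0.9 to 2.0 (arbitrary units).
train_r = [0.9 + 0.05 * i for i in range(23)]
model = fit_interpolator(train_r, [morse(r) for r in train_r])

def max_error(points):
    """Largest absolute deviation of the surrogate from the reference."""
    return max(abs(model(r) - morse(r)) for r in points)

inside = [0.925 + 0.05 * i for i in range(22)]  # between training points
outside = [0.6, 0.7, 0.8]                       # compressed, high-energy geometries
err_in = max_error(inside)
err_out = max_error(outside)
print(f"max error inside training range:  {err_in:.4f}")
print(f"max error outside training range: {err_out:.4f}")
```

In this toy setting the surrogate is accurate between training points and fails on the compressed geometries it never saw, which is the failure mode the paragraph above describes for reactive systems.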
Formulating a balanced and diverse data set for a given reactive system is challenging. ML models commonly suffer from overfitting: they achieve good accuracy on their original test set but are prone to errors when applied in molecular dynamics (MD) simulations, especially for gas-phase chemical reactivity, where the energetic configurations are highly diverse.
Researchers at the University of California, Lawrence Berkeley National Laboratory, and Penn State University have created an active learning (AL) workflow that extends the originally formulated hydrogen combustion data set by preparing collective variables (CVs) for an initial round of systematic sampling. Their work suggests that a negative design data acquisition strategy is necessary to create a more complete ML model of the PES.
By following this active learning strategy, they obtained a final hydrogen combustion ML model trained on a more diverse and balanced data set. The model recovers accurate forces, so trajectories can continue without further retraining. It could also predict the change in transition state and reaction mechanism at finite temperature and pressure for hydrogen combustion.
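The general shape of such an active learning loop can be sketched as follows. This is a toy under stated assumptions, not the researchers' workflow: `reference_energy` is a hypothetical stand-in for an expensive ab initio calculation, `train` fits a simple piecewise-linear surrogate instead of a neural network, and the candidate geometries are a fixed grid rather than structures sampled from simulations.

```python
import math

def reference_energy(r):
    """Hypothetical stand-in for an expensive ab initio label (toy Morse potential)."""
    return (1.0 - math.exp(-1.5 * (r - 1.0))) ** 2

def train(points):
    """'Train' a toy surrogate: piecewise-linear fit that holds its edge values."""
    xs = sorted(points)
    ys = [reference_energy(x) for x in xs]
    def predict(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(len(xs) - 1):
            if xs[i] <= x <= xs[i + 1]:
                t = (x - xs[i]) / (xs[i + 1] - xs[i])
                return ys[i] + t * (ys[i + 1] - ys[i])
    return predict

def active_learning(train_set, candidates, tol=0.01, per_round=5, max_rounds=10):
    """Each round: retrain, find where the model fails, label and add the worst cases."""
    history = []
    for _ in range(max_rounds):
        model = train(train_set)
        errors = [(abs(model(r) - reference_energy(r)), r) for r in candidates]
        history.append(max(e for e, _ in errors))
        failures = sorted((e, r) for e, r in errors if e > tol)
        if not failures:
            break  # the model is accurate on every candidate
        # Add the per_round candidates with the largest errors to the training set.
        train_set = train_set + [r for _, r in failures[-per_round:]]
    return train_set, history

initial = [0.95, 1.0, 1.05]                      # near-equilibrium data only
candidates = [0.6 + 0.1 * i for i in range(25)]  # includes high-energy geometries
final_set, history = active_learning(initial, candidates)
print("training set size:", len(initial), "->", len(final_set))
print("max error per round:", [round(e, 4) for e in history])
```

The design choice worth noting is that new data are selected where the current model is wrong, not where it is already accurate, so each round spends labeling effort on the gaps in the surface.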
The team illustrated the active learning approach using reaction Rxn18 as an example, with the potential energy surface projected onto two reaction coordinates, CN(O2-O5) and CN(O5-H4). The performance of the ML model was tracked by analyzing the original data points derived from the AIMD and normal mode calculations. As the active learning rounds progressed and errors decreased, they used longer metadynamics simulations for sampling.
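A coordination number (CN) between two atoms is typically a smooth function of their distance that goes from 1 (bonded) to 0 (separated). The article does not give the functional form used, so the sketch below assumes the rational switching function that is common in enhanced sampling codes; the cutoff `r0`, the exponents `n` and `m`, and the atomic coordinates are all hypothetical.

```python
import math

def coordination_number(pos_a, pos_b, r0=1.6, n=6, m=12):
    """Pairwise CN from a rational switching function (assumed form):
    s(r) = (1 - (r/r0)^n) / (1 - (r/r0)^m).
    """
    r = math.dist(pos_a, pos_b)
    x = r / r0
    if abs(x - 1.0) < 1e-9:
        return n / m  # analytic limit at r = r0, avoids 0/0
    return (1.0 - x ** n) / (1.0 - x ** m)

# Hypothetical coordinates (angstrom) for the three atoms named in the article.
o2 = (0.0, 0.0, 0.0)
o5 = (1.4, 0.0, 0.0)
h4 = (2.3, 0.3, 0.0)

cn_o2_o5 = coordination_number(o2, o5)
cn_o5_h4 = coordination_number(o5, h4)
print(f"CN(O2-O5) = {cn_o2_o5:.3f}, CN(O5-H4) = {cn_o5_h4:.3f}")
```

Because the function is smooth and differentiable, it can serve as a collective variable that a biasing method pushes along, which a hard bonded/unbonded cutoff could not.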
They found that metadynamics is an efficient sampling tool for unstable structures, helping the AL workflow identify holes in the PES landscape so that the ML model can be retrained on such data. Because metadynamics is used only as a sampling tool, the complicated CV selection step can be avoided by starting with reasonable or intuitive CVs. Their future work also includes looking at alternative approaches such as delta learning and working on more physical models such as C-GeM.
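To show why metadynamics reaches unstable structures, here is a minimal one-dimensional sketch. It assumes a toy double-well potential, overdamped Langevin dynamics with unit mobility, and fixed-height Gaussian bias deposits; all parameter values are hypothetical and unrelated to the simulations in the research.

```python
import math
import random

def double_well_force(x):
    """Force -dV/dx for the toy potential V(x) = (x^2 - 1)^2."""
    return -4.0 * x * (x * x - 1.0)

def bias_energy(x, centers, height, width):
    """Sum of the Gaussian hills deposited so far."""
    return sum(height * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
               for c in centers)

def bias_force(x, centers, height, width):
    """Force from the bias, -dV_bias/dx; pushes the walker away from visited points."""
    return sum(height * (x - c) / width ** 2
               * math.exp(-((x - c) ** 2) / (2.0 * width ** 2))
               for c in centers)

def run_metadynamics(steps=5000, stride=50, dt=1e-3, kT=0.1,
                     height=0.05, width=0.2, seed=0):
    """Overdamped Langevin walker that deposits a Gaussian every `stride` steps."""
    rng = random.Random(seed)
    x = -1.0  # start in the left well
    centers, trajectory = [], []
    noise = math.sqrt(2.0 * kT * dt)
    for step in range(steps):
        force = double_well_force(x) + bias_force(x, centers, height, width)
        x += dt * force + noise * rng.gauss(0.0, 1.0)
        trajectory.append(x)
        if (step + 1) % stride == 0:
            centers.append(x)  # deposit a hill at the current CV value
    return trajectory, centers

trajectory, centers = run_metadynamics()
print("hills deposited:", len(centers))
print("range of x visited:", round(min(trajectory), 2), "to", round(max(trajectory), 2))
```

The accumulated hills gradually fill the well the walker starts in, so it is pushed toward higher-energy regions it would rarely visit in an unbiased run. Structures collected from such regions are the kind of candidates an active learning loop can label and add to the training set.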
Review the Paper. All credit for this research goes to the researchers of this project.
Arshad is an intern at MarktechPost. He is currently pursuing a Master’s degree in Physics at the Indian Institute of Technology Kharagpur. He believes that understanding things at a fundamental level leads to new discoveries, which in turn advance technology. He is passionate about understanding nature fundamentally with the help of tools such as mathematical models, machine learning models, and artificial intelligence.