With the release of platforms such as DALL-E 2 and Midjourney, generative diffusion models have become extremely popular thanks to their ability to generate absurd, impressive, and often meme-worthy images from text prompts such as “teddy bears working on new artificial intelligence research on the moon in the 1980s.” But a team of researchers at MIT’s Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) thinks there might be more to generative diffusion models than creating surreal images: They could speed up the development of new drugs and reduce the likelihood of adverse side effects.
A paper introducing the team’s new molecular docking model, called DiffDock, will be presented at the 11th International Conference on Learning Representations. The model’s approach to computational drug design is a paradigm shift from the current state-of-the-art tools used by most pharmaceutical companies, presenting a major opportunity to overhaul the traditional drug development pipeline.
Drugs typically work by interacting with the proteins that make up our bodies, or with proteins from bacteria and viruses. Molecular docking was developed to gain insight into these interactions by predicting the atomic 3D coordinates with which a ligand (i.e., a drug molecule) and a protein might bind.
Molecular docking has led to the successful identification of drugs that now treat HIV and cancer. But with each drug averaging a decade of development time, and 90 percent of drug candidates failing in expensive clinical trials (most studies estimate average drug development costs at about $1 billion to over $2 billion per drug), it is not surprising that researchers are looking for faster, more efficient ways to screen potential drug molecules.
Currently, most molecular docking tools used for in silico drug design take a “sample and score” approach, searching for the ligand “pose” that best fits the protein pocket. This time-consuming process evaluates a large number of different poses and scores each one based on how well the ligand binds to the protein.
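The “sample and score” loop described above can be illustrated with a minimal sketch. It is not any real docking tool: the pose is reduced to a 3D position, and `toy_score`, the pocket coordinates, and the search box size are all hypothetical stand-ins for the physics-based scoring functions and pose parameterizations that real tools use.

```python
import math
import random

def toy_score(pose, pocket_center):
    """Hypothetical scoring function: higher is better, best at the pocket center."""
    return -math.dist(pose, pocket_center)

def sample_and_score(pocket_center, n_samples=2000, box=10.0, seed=0):
    """Draw random candidate poses in a cube and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pose, best_score = None, -math.inf
    for _ in range(n_samples):
        # A "pose" here is only a 3D position; real poses also include
        # ligand rotation and torsion angles.
        pose = tuple(rng.uniform(-box, box) for _ in range(3))
        score = toy_score(pose, pocket_center)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score

pocket = (1.0, -2.0, 3.0)  # made-up pocket location
best_pose, best_score = sample_and_score(pocket)
coarse_pose, coarse_score = sample_and_score(pocket, n_samples=200)
print(best_pose, best_score)
```

Even in this toy setting, the quality of the result depends directly on how many poses are sampled and scored, which is why the approach becomes expensive as the search space grows.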
In previous deep learning solutions, molecular docking is treated as a regression problem. In other words, “it assumes that you have only one goal that you’re trying to optimize for and that there is only one correct answer,” says Gabriele Corso, a co-author and a second-year MIT doctoral student in electrical and computer engineering who is an affiliate of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). “With generative modeling, you assume that there is a distribution of possible answers; this is essential in the presence of uncertainty.”
“Instead of a single prediction like before, it now allows you to predict multiple poses, each with a different probability,” adds Hannes Stärk, a co-author and first-year MIT doctoral student in electrical and computer engineering who is also a CSAIL affiliate. As a result, the model does not need to compromise in an attempt to arrive at a single conclusion, which can be a recipe for failure.
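A small numerical sketch may help show why a single regression-style answer can fail when several answers are valid. The setup is entirely hypothetical: a one-dimensional “pose” coordinate with two equally good binding modes at -1 and +1. The prediction that minimizes mean squared error is the average, which lands between the two modes and matches neither, whereas samples drawn from the distribution land on one mode or the other.

```python
import random
import statistics

rng = random.Random(0)

# Two equally valid binding modes for a toy 1D pose coordinate (made-up values).
modes = (-1.0, 1.0)
observed = [rng.choice(modes) + rng.gauss(0.0, 0.05) for _ in range(1000)]

def distance_to_nearest_mode(x):
    """How far a predicted coordinate is from the closest valid mode."""
    return min(abs(x - m) for m in modes)

# Regression view: the single value minimizing mean squared error is the mean.
regression_prediction = statistics.fmean(observed)

# Generative view (stand-in): draw several answers from the data distribution.
generated = [rng.choice(observed) for _ in range(5)]

print("regression error:", distance_to_nearest_mode(regression_prediction))
print("sample errors:", [distance_to_nearest_mode(g) for g in generated])
```

Resampling the observed data is only a stand-in for a trained generative model; the point is the contrast between averaging over modes and sampling from them.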
To understand how generative diffusion models work, it is helpful to explain them in terms of image-generating diffusion models. Here, diffusion models gradually add random noise to a 2D image through a series of steps, destroying the data in the image until it becomes nothing more than grainy static. A neural network is then trained to recover the original image by reversing this noising process. The model can then generate new data by starting from a random configuration and iteratively removing the noise.
In the case of DiffDock, after being trained on a variety of ligand and protein poses, the model can successfully identify multiple binding sites on proteins that it has never encountered before. Instead of generating new image data, it generates new 3D coordinates that help the ligand find potential angles that would allow it to fit into the protein pocket.
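The noising and denoising steps can be sketched in one dimension. This is a toy under stated assumptions, not DiffDock or an image model: the “data” are numbers drawn from a Gaussian with made-up mean `MU` and spread `SIGMA`, the noise schedule is an arbitrary linear one, and `ideal_noise_prediction` uses the closed-form answer available for Gaussian data in place of a trained neural network. The reverse loop follows the standard ancestral sampling update used in denoising diffusion models.

```python
import math
import random
import statistics

T = 200
# Arbitrary linear noise schedule (an assumption for this sketch).
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars, running = [], 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)  # fraction of original signal variance left at step t

MU, SIGMA = 3.0, 0.5  # toy data distribution: x0 ~ N(MU, SIGMA^2)

def forward_noise(x0, t, rng):
    """Forward process: blend the data point with Gaussian noise at step t."""
    ab = alpha_bars[t]
    return math.sqrt(ab) * x0 + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)

def ideal_noise_prediction(xt, t):
    """Stand-in for the trained network: E[noise | xt], exact for Gaussian data."""
    ab = alpha_bars[t]
    var_t = ab * SIGMA**2 + (1.0 - ab)
    return math.sqrt(1.0 - ab) * (xt - math.sqrt(ab) * MU) / var_t

def generate(rng):
    """Reverse process: start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)
    for t in reversed(range(T)):
        eps_hat = ideal_noise_prediction(x, t)
        mean = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps_hat) / math.sqrt(alphas[t])
        noise = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no fresh noise on the last step
        x = mean + math.sqrt(betas[t]) * noise
    return x

rng = random.Random(0)
fully_noised = forward_noise(MU, T - 1, rng)  # almost no trace of the data remains
samples = [generate(rng) for _ in range(1000)]
print(statistics.fmean(samples), statistics.stdev(samples))
```

In a real model the noise predictor is learned from data, and the quantity being denoised is an image or, for docking, the ligand’s pose rather than a single number.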
This “blind docking” approach creates new opportunities to take advantage of AlphaFold 2 (2020), DeepMind’s famous protein folding AI model. Since the initial release of AlphaFold 1 in 2018, there has been a lot of excitement in the research community about the potential of AlphaFold’s computationally folded protein structures to help identify new mechanisms of drug action. But state-of-the-art molecular docking tools have yet to show that their performance in binding ligands to computationally predicted structures is better than chance.
DiffDock is not only significantly more accurate than previous approaches on traditional docking benchmarks; thanks to its ability to reason at a larger scale and implicitly model some of the protein’s flexibility, it also maintains high performance even as other docking models begin to fail. In the most realistic scenario, involving the use of computationally generated unbound protein structures, DiffDock places 22 percent of its predictions within 2 angstroms (widely considered the threshold for an accurate pose; 1 Å corresponds to one ten-billionth of a meter), more than double the rate of other docking models, some of which barely exceed 10 percent and others of which fall as low as 1.7 percent.
These improvements create a new landscape of opportunities for biological research and drug discovery. For example, many drugs are found through a process known as phenotypic screening, in which researchers observe the effects of a given drug on a disease without knowing which proteins the drug acts on. Discovering the drug’s mechanism of action is then critical to understanding how the drug can be improved and what its possible side effects are. This process, known as “reverse screening,” can be extremely challenging and expensive, but a combination of protein folding techniques and DiffDock may allow much of it to be performed in silico, so that potential “off-target” side effects can be identified before clinical trials are conducted.
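The “within 2 angstroms” criterion is usually expressed as a root-mean-square deviation (RMSD) between predicted and reference ligand atom positions. The sketch below shows the arithmetic on made-up coordinates; it assumes the atoms are already matched one-to-one and omits details real benchmarks handle, such as restricting to heavy atoms and correcting for molecular symmetry.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation (in angstroms) between matched atom positions."""
    if len(predicted) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(reference))

def success_rate(rmsds, threshold=2.0):
    """Fraction of predictions whose RMSD falls below the threshold."""
    return sum(r < threshold for r in rmsds) / len(rmsds)

# Made-up reference pose and two hypothetical predictions (coordinates in angstroms).
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
close_pose = [(x + 1.0, y, z) for x, y, z in reference]  # shifted 1 angstrom along x
far_pose = [(x, y + 3.0, z) for x, y, z in reference]    # shifted 3 angstroms along y

rmsds = [rmsd(close_pose, reference), rmsd(far_pose, reference)]
rate = success_rate(rmsds)
print(rmsds, rate)
```

A reported figure such as 22 percent corresponds to this kind of success rate computed over every complex in a benchmark set.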
“DiffDock makes drug target identification much more possible. Previously, laborious and expensive experiments (months to years) had to be done with each protein to define the drug docking. But now, one can screen many proteins and do the triaging virtually in a day,” says Tim Peterson, an assistant professor at Washington University in St. Louis School of Medicine. Peterson used DiffDock to characterize the mechanism of action of a new drug candidate for treating aging-related diseases in a recent paper. “There is a very ‘fate loves irony’ aspect that Eroom’s law (that drug discovery takes more time and costs more money each year) is being solved by its namesake Moore’s law (that computers get faster and cheaper every year) using tools like DiffDock.”
This work was conducted by MIT doctoral students Gabriele Corso, Hannes Stärk, and Bowen Jing, and their advisors, Professor Regina Barzilay and Professor Tommi Jaakkola. It was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium, the Jameel Clinic, the DTRA Discovery of Medical Countermeasures Against New and Emerging Threats program, the DARPA Accelerated Molecular Discovery program, the Sanofi Computational Antibody Design grant, and a Department of Energy Computational Science Graduate Fellowship.