MIT researchers have developed machine learning algorithms that design new proteins beyond those found in nature. They used generative models to predict the amino acid sequences of proteins that meet particular structural requirements. These models learn the biochemical relationships that govern how proteins form. The models can produce millions of proteins in just a few days, giving researchers a large pool of new candidates to explore. This tool could be used to create protein-based food coatings that keep produce fresh for longer while remaining safe for human consumption, or to design materials with specific mechanical properties that could eventually replace ceramic- or petroleum-based materials at a significantly lower carbon footprint.
Amino acid chains fold together in 3D patterns to form proteins, and the order of amino acids in the chain influences the protein's mechanical properties. Although hundreds of thousands of proteins produced by evolution have been identified, experts believe that the vast majority of possible amino acid sequences remain undiscovered. To speed up protein discovery, researchers have recently created deep learning algorithms that can predict the 3D structure of a protein for a given amino acid sequence. However, the inverse problem, predicting amino acid sequences that satisfy given design objectives, has proven more difficult. The researchers addressed it with attention-based diffusion models. Attention matters because a protein design model must learn very long-range relationships: a single mutation in a long amino acid sequence can make or break the entire structure. A diffusion model learns to generate new data through a two-part process: noise is first added to the training data, and the model then learns to recover the original data by removing that noise.
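The noising half of that process can be illustrated with a short sketch. It follows the standard Gaussian diffusion formulation, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, applied to a one-hot encoded amino acid sequence. The encoding, the linear noise schedule, and every function name here are assumptions made for illustration; the article does not describe the researchers' actual implementation.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(sequence):
    """Encode a sequence as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == letter else 0.0 for letter in AMINO_ACIDS]
            for aa in sequence]


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances; a linear schedule is an illustrative choice."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for all s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    signal = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in x0]
    xt = [[signal * v + noise_scale * e for v, e in zip(row, eps_row)]
          for row, eps_row in zip(x0, noise)]
    return xt, noise


# Example: noise a short (made-up) sequence at the final timestep.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = one_hot("MKTAYIAK")
xt, eps = add_noise(x0, alpha_bars[-1], rng)
```

In training, a denoising network (not shown) would typically be given `xt` and the timestep and asked to predict `eps`. Generation then starts from pure noise and applies the learned denoiser step by step.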
Using this architecture, the researchers built two machine learning models that can predict a wide range of novel amino acid sequences whose proteins match predetermined structural design targets. The first model works with overall structural properties: users enter the desired percentages of different structures, and the model generates sequences that meet those targets. For the second model, the scientist also specifies the order of the amino acid structures, which gives much finer control. The models are connected to a protein folding prediction algorithm, which the researchers use to determine each protein's three-dimensional (3D) structure. They then calculate the resulting properties and compare them with the design requirements.
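The final comparison step can be sketched for the first model's kind of target, a set of desired structure percentages. The sketch assumes the folding step has already produced a per-residue secondary structure string in a simplified three-letter alphabet (H for helix, E for strand, C for coil). That alphabet, the error metric, and the function names are illustrative assumptions, not details taken from the researchers' pipeline.

```python
from collections import Counter


def structure_fractions(ss_string, classes="HEC"):
    """Fraction of residues assigned to each secondary structure class."""
    if not ss_string:
        raise ValueError("secondary structure string must not be empty")
    counts = Counter(ss_string)
    total = len(ss_string)
    return {c: counts.get(c, 0) / total for c in classes}


def design_error(target, predicted):
    """Mean absolute difference between target and predicted fractions."""
    return sum(abs(target[c] - predicted.get(c, 0.0)) for c in target) / len(target)


# Hypothetical design target: 50% helix, 20% strand, 30% coil.
target = {"H": 0.5, "E": 0.2, "C": 0.3}

# Hypothetical prediction for one generated sequence.
predicted = structure_fractions("HHHHEECCCC")
error = design_error(target, predicted)
```

A lower `error` means the generated protein is closer to the requested mix of structures. In a screening loop, candidates could be ranked by this value.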
To test their models, the researchers compared the new proteins with known proteins that have similar structural features. Most of the new proteins shared about 50 to 60 percent of their amino acid sequences with existing ones, and some sequences were entirely new. The degree of similarity suggests that many of the generated proteins are synthesisable. To check that the predicted proteins made sense, the researchers also tried to trick the models by giving them physically impossible design targets. They were surprised to find that, rather than producing improbable proteins, the models returned the closest synthesisable solution.
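As a rough illustration of what a shared-sequence percentage means, the sketch below computes position-by-position identity between two sequences of equal length. The article does not say how the researchers measured similarity. Real comparisons usually align the sequences first, so this simplified measure and the example sequences are assumptions.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up sequences that agree at 5 of 10 positions.
identity = percent_identity("ACDEFGHIKL", "ACDEFAAAAA")
```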
Check out the Paper and MIT Blog.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a strong interest in machine learning, data science, and artificial intelligence and is an avid reader of the latest developments in these fields.