There is an urgent need to create therapies to meet the healthcare needs of billions of people around the world. However, only a small fraction of clinically recognized diseases currently have licensed treatments. Alterations in the function of genes and the molecules they produce are common causes of disease. Drugs that can restore normal molecular activities are a potential defense against these diseases. Unfortunately, therapeutic approaches to restore the biological activities of damaged genes remain elusive for many disorders. Furthermore, most diseases are caused by changes in many genes, and individuals can have widely varying mutation patterns, even within a single gene. Interactomes, or networks of genes that participate in processes and activities associated with diseases, are a great tool to explain these genetic events. To decipher the genetic architecture disrupted by disease and help create drugs to attack it, machine learning has been used to analyze high-throughput molecular interactomes and data from electronic medical records.
Developing new drugs is challenging, particularly for diseases with few treatment options, but it can replace ineffective drugs with safer and more effective ones. The FDA has approved treatments for only 500 of the thousands of known human diseases. Of the 17,080 clinically recognized disorders included in the analysis, only 1,363 had drugs specifically indicated for them; of these, 435 had a single indicated drug, 182 had two, and 128 had three. Finding new drugs is therapeutically significant even for diseases that already have therapies: it provides more treatment alternatives with fewer adverse effects and can replace medications that have been unsuccessful in certain patient populations.
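To put those counts in proportion, the short calculation below uses only the figures quoted above. It shows that roughly 8% of the recognized disorders have any indicated drug, and that a little over half of those have three or fewer options. The variable names are illustrative.

```python
# Counts quoted in the article.
total_diseases = 17_080
with_indicated_drugs = 1_363
one_drug, two_drugs, three_drugs = 435, 182, 128

# Share of recognized diseases that have at least one indicated drug.
coverage = with_indicated_drugs / total_diseases

# Diseases with one to three indicated drugs, and their share of treated diseases.
few_options = one_drug + two_drugs + three_drugs
few_options_share = few_options / with_indicated_drugs

print(f"Diseases with an indicated drug: {coverage:.1%}")
print(f"Of those, with three or fewer drugs: {few_options} ({few_options_share:.1%})")
```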
TxGNN, a geometric deep learning approach for predicting therapeutic use, is presented by researchers interested in diseases whose molecular causes and potential treatments are still poorly understood. TxGNN is trained on a therapeutics-focused knowledge graph that is overlaid with the networks disrupted by diseases that currently have treatments. This knowledge graph integrates and compiles decades of biological research on 17,080 common and rare diseases. A graph neural network, optimized to reflect the geometry of this therapeutics-focused graph, embeds diseases and therapeutic candidates into a latent representation space. Supervised deep learning is normally restricted to diseases with labeled examples. To bypass this restriction, TxGNN employs a metric learning module that operates in the latent representation space and transfers what the model has learned from diseases observed during training to neglected diseases, so that it can predict therapeutic use for them as well.
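The idea of transferring knowledge in latent space can be illustrated with a toy sketch. This is not TxGNN's actual implementation. It assumes made-up three-dimensional embeddings, cosine similarity as the metric, and a fixed mixing weight, and the names `cosine` and `augment_embedding` are hypothetical. The sketch enriches the embedding of a poorly characterized disease by borrowing from its most similar well-studied neighbors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def augment_embedding(target, known, top_k=2, mix=0.5):
    """Blend a disease embedding with its top_k most similar known diseases.

    `known` maps disease name -> embedding. `mix` is an assumed fixed weight;
    a trained model would learn how much to borrow.
    """
    ranked = sorted(known.items(), key=lambda kv: cosine(target, kv[1]), reverse=True)
    neighbors = ranked[:top_k]
    # Negative similarities are clipped so dissimilar diseases contribute nothing.
    weights = [max(cosine(target, emb), 0.0) for _, emb in neighbors]
    total = sum(weights)
    if total == 0:
        return list(target)
    borrowed = [
        sum(w * emb[i] for w, (_, emb) in zip(weights, neighbors)) / total
        for i in range(len(target))
    ]
    return [(1 - mix) * t + mix * b for t, b in zip(target, borrowed)]

# Toy, made-up embeddings for illustration only.
known_diseases = {
    "disease_A": [1.0, 0.0, 0.2],
    "disease_B": [0.9, 0.1, 0.0],
    "disease_C": [0.0, 1.0, 0.0],
}
unseen = [0.8, 0.1, 0.1]
augmented = augment_embedding(unseen, known_diseases)
print(augmented)
```

In this toy setup, the unseen disease borrows from `disease_A` and `disease_B`, its two closest neighbors, while the dissimilar `disease_C` is ignored.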
TxGNN is a graph neural network pretrained on a knowledge graph that includes 17,080 clinically recognized disorders and 7,957 treatment candidates. It can perform different therapeutic tasks in a unified formulation. Zero-shot inference on diseases not seen during training is possible with TxGNN, as it needs neither ground-truth labels for the new disease nor additional parameters or fine-tuning after training. Compared to state-of-the-art approaches, TxGNN performs significantly better, with an increase in accuracy of up to 49.2% for indication tasks and 35.1% for contraindication tasks.
Methodology and Experimental Design: Partitioning of Data Sets for a Comprehensive Performance Assessment
Many diseases that might be treatable have no effective therapies and little or no biological understanding. To test whether TxGNN can predict drug-disease relationships in such cases, the study team developed data splits that simulate well-studied diseases as if they were not molecularly characterized.
First, a group of diseases and their associated drug-disease edges are moved into the test set. This means that during training, TxGNN does not see the edges that represent the current indications and contraindications for the selected disease group. This mimics the difficulty of treating disorders with unknown underlying biological mechanisms.
- Systematic dataset splits:
The machine learning model should be able to make predictions for diseases that have no treatments. It is much simpler to propose potential therapies for diseases that already have treatments than for those that do not. The researchers devised this split to rigorously test the model's ability to make predictions for previously unseen diseases. They began by dividing all diseases at random. To ensure that no therapies for the test diseases are seen during training, the investigators move all drug-disease relationships associated with the test diseases into the test set. Each iteration of the test set includes more than a hundred unique diseases.
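A minimal sketch of such a disease-held-out split follows. It assumes edges are stored as plain `(drug, disease)` tuples; the function name `disease_holdout_split`, the `test_fraction` parameter, and the toy edges are hypothetical. The key property is that every edge touching a test disease ends up in the test set, so the model sees no known therapy for those diseases during training.

```python
import random

def disease_holdout_split(edges, test_fraction=0.2, seed=0):
    """Split (drug, disease) edges so that test diseases have no training edges."""
    diseases = sorted({disease for _, disease in edges})
    rng = random.Random(seed)
    rng.shuffle(diseases)  # divide diseases at random
    n_test = max(1, int(len(diseases) * test_fraction))
    test_diseases = set(diseases[:n_test])
    # Every edge of a test disease goes to the test set, none to training.
    train = [e for e in edges if e[1] not in test_diseases]
    test = [e for e in edges if e[1] in test_diseases]
    return train, test, test_diseases

# Toy, made-up drug-disease edges for illustration only.
edges = [
    ("drug_1", "disease_A"), ("drug_2", "disease_A"),
    ("drug_2", "disease_B"), ("drug_3", "disease_C"),
    ("drug_4", "disease_D"), ("drug_1", "disease_E"),
]
train, test, test_diseases = disease_holdout_split(edges, test_fraction=0.4)
print("held-out diseases:", sorted(test_diseases))
print("train edges:", train)
print("test edges:", test)
```

Splitting by disease rather than by edge is what makes the evaluation zero-shot: a random edge split would usually leave some known therapies for each test disease in the training data.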
- Disease-focused dataset splits:
The researchers use a disease-focused evaluation to model how drug candidates might be used in the clinic. First, they pair every drug in the knowledge graph (KG) with every disease in the test set, excluding drug-disease associations that appear in the training set. Next, they rank all candidate pairs by how likely the model considers each relationship to be. They then compute recall over the top K retrieved drugs (i.e., how many of the test-set drug-disease pairs appear in the top K). Finally, they establish a random baseline, in which K drugs are randomly sampled from the drug pool and recall is calculated in the same way.
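The evaluation described above can be sketched for a single test disease as follows. This is a simplified illustration under stated assumptions, not the study's code: the drug pool, the `scores` dictionary, and the function names `recall_at_k` and `random_baseline_recall` are all made up, and the step that excludes training-set associations is omitted for brevity.

```python
import random

def recall_at_k(ranked_drugs, true_drugs, k):
    """Fraction of a disease's true drugs that appear among the top k ranked drugs."""
    if not true_drugs:
        return 0.0
    hits = len(set(ranked_drugs[:k]) & set(true_drugs))
    return hits / len(true_drugs)

def random_baseline_recall(drug_pool, true_drugs, k, trials=1000, seed=0):
    """Average recall when k drugs are sampled uniformly at random from the pool."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sampled = rng.sample(drug_pool, k)
        total += recall_at_k(sampled, true_drugs, k)
    return total / trials

# Toy setup: 100 candidate drugs, two of which are true therapies for the test disease.
drug_pool = [f"drug_{i}" for i in range(100)]
true_drugs = ["drug_3", "drug_42"]

# Hypothetical model scores; drugs without an entry get a score of 0.
scores = {"drug_42": 0.9, "drug_7": 0.8, "drug_3": 0.4}
ranked = sorted(drug_pool, key=lambda d: scores.get(d, 0.0), reverse=True)

k = 5
model_recall = recall_at_k(ranked, true_drugs, k)
baseline_recall = random_baseline_recall(drug_pool, true_drugs, k)
print(f"model recall@{k}: {model_recall:.2f}")
print(f"random recall@{k}: {baseline_recall:.2f}")
```

With uniform sampling, the expected baseline recall is simply K divided by the size of the drug pool, which gives a floor against which the model's ranking can be judged.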
Results
- Prediction of therapeutic use with geometric biological priors in TxGNN. TxGNN is based on the hypothesis that drugs targeting disease-disrupted networks in the protein interactome have the highest chance of success. TxGNN is a knowledge-based GNN, optimized to capture the geometry of its knowledge graph, that maps treatment candidates and disorders (disease concepts) into a latent representation space.
- Benchmarking TxGNN on zero-shot prediction of therapeutic use. The researchers test the ability of TxGNN to predict indications and contraindications. Because TxGNN is intended for diseases such as Stargardt disease and hyperoxaluria, for which no treatments are currently available, it is evaluated on zero-shot performance: the model is asked to predict therapeutic use for diseases in a separate hold-out set that was not seen during model training.
- Zero-shot prediction of therapeutic use across five disease areas. Similar therapies can be used for disorders that have similar biological bases.
- Failing to predict therapeutic use in patients who routinely refuse treatment.
- Prediction of therapeutic use across 1,363 disorders that have indications and 1,195 conditions that have contraindications.
- Careful consideration of which treatments are recommended and which are contraindicated.
- Comparison of TxGNN predictions with current treatment options. To demonstrate that TxGNN is not driven by confirmation bias, the researchers considered 10 newly launched drugs that were licensed after the dataset and the TxGNN model had been finalized. In the TxGNN dataset, none of these drugs is directly connected to the disease it treats. TxGNN was then asked to make predictions for these drugs.
Characteristics
- For disorders for which there are no drugs and our molecular knowledge is poor, TXGNN has a “zero shot” predictive ability for therapeutic use.
- Despite the practical limitation of not knowing drugs for a specific condition and needing to extrapolate to a new disease area that was not observed during training, TXGNN can greatly improve the prediction of therapeutic use in various disorders.
- In addition, therapies predicted by TxGNN show a high degree of correlation with real-world electronic health record data. Using patient populations followed for several years, a large number of therapeutic hypotheses can be tested simultaneously by identifying cohorts of patients with a given disease who have or have not been prescribed a particular drug.
- TxGNN's predictions were presented to a group of physicians, who were able to learn more about the self-explaining model that TxGNN uses to justify its predictions. A usability study showed that researchers using the interactive TxGNN Explorer can reproduce machine learning models and more easily identify and debug model failure points, which highlights the importance of clinician-focused design in taking machine learning from development to biomedical implementation.
All credit for this research goes to the researchers of this project.
Dhanshree Shenwai is a Computer Engineer and has good experience in FinTech companies covering Finance, Cards & Payments and Banking domain with strong interest in AI applications. She is enthusiastic about exploring new technologies and advancements in today’s changing world, making everyone’s life easier.