A rare disease affects a small proportion of the population. Most rare diseases are genetic and therefore persist throughout a person's life, even if symptoms do not appear immediately. Many rare disorders manifest early in life; approximately 30% of children with rare diseases die before the age of five.
In recent years, life sciences companies have made commendable progress on rare diseases, but significant challenges remain. The rise of artificial intelligence and machine learning (AI/ML) has opened several opportunities for intelligent intervention that, if harnessed correctly, can significantly improve the rare disease treatment process. In particular, AI/ML can help speed up accurate patient identification and diagnosis.
Training machine learning models typically requires large data sets. Biobanks are large databases that contain genetic and health information on many patients, and the quantity and quality of their data determine how useful they are. Incomplete data is a common problem in patient data sets. To overcome it, Stanford researchers developed a model capable of predicting a complete set of diagnostic codes (also known as phenotype codes) for all patients in the UK Biobank. The UK Biobank is an extensive biomedical database and research resource that includes detailed health and genetic data on half a million UK participants. It has contributed significantly to modern medicine and has enabled several scientific discoveries that have improved human health.
The research team developed POPDx, a machine learning framework for disease recognition that outputs the probability that a person has each of a set of diseases, or phenotype codes. POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework that estimates the probabilities of 1,538 phenotype codes simultaneously. To develop and evaluate POPDx, the team extracted phenotypic and health-related data from 392,246 people in the UK Biobank and compared the method with other automated approaches to multi-phenotype recognition. POPDx was reported to outperform existing models in predicting rare diseases. Unlike many other models, it does not need large amounts of training data for each disease: it draws on prior knowledge and can predict diseases that are absent from the training data altogether. This is particularly useful because, unlike in many other fields, data on rare diseases are scarce.
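To make the idea of a bilinear framework more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the tiny dimensions, and all numeric values are hypothetical. It only assumes the general bilinear form, in which patient features are projected into the same space as fixed label embeddings and each phenotype code is scored by a dot product followed by a sigmoid. Because the label embeddings would come from prior knowledge rather than from training labels, a code with no training examples can still receive a score.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def phenotype_probabilities(patient_features, projection, label_embeddings):
    """Bilinear score per phenotype code: sigmoid(e_label . (W x))."""
    # W x: map patient features into the label-embedding space.
    hidden = matvec(projection, patient_features)
    return {
        code: sigmoid(sum(e * h for e, h in zip(embedding, hidden)))
        for code, embedding in label_embeddings.items()
    }

# Toy values, all hypothetical: 3 patient features, 2-dimensional embeddings.
projection = [[0.5, -0.2, 0.1],
              [0.0, 0.3, 0.4]]
label_embeddings = {
    "common_code": [1.0, 0.0],
    "rare_code": [0.9, 0.2],        # lies close to common_code in embedding space
    "unrelated_code": [-1.0, 0.5],
}
patient = [1.0, 0.0, 2.0]

probs = phenotype_probabilities(patient, projection, label_embeddings)
for code, p in probs.items():
    print(f"{code}: {p:.3f}")
```

In this toy setup only `projection` would be learned. The hypothetical `rare_code` gets a score similar to `common_code` simply because their embeddings are close, which is the intuition behind predicting codes that never appear in the training data.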
POPDx looks for relationships between patient data and disease information, making probabilistic predictions with the help of natural language processing and a human disease ontology. The team framed the task as multi-label classification, since a patient may have more than one disease. Because most ML models depend on large data sets, POPDx's strong performance on phenotypes with few or no training examples is compelling and reduces the need for large data sets. Its ability to recognize rare diseases gives clinicians and researchers a better starting point for studying them. One problem the team faced was missing data about individual patients. To address it, the team used information from each patient's history and records to predict the diseases that patient might have.
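The multi-label setup can be illustrated with a short sketch. Each phenotype code gets its own independent probability, so several codes can be predicted for the same patient. The binary cross-entropy loss and the 0.5 threshold below are common choices for multi-label problems and are assumptions here, not details confirmed for POPDx; the code names and numbers are hypothetical.

```python
import math

def binary_cross_entropy(probabilities, targets, eps=1e-12):
    """Mean binary cross-entropy over all labels for one patient."""
    total = 0.0
    for p, y in zip(probabilities, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(targets)

def predicted_codes(probabilities, codes, threshold=0.5):
    """Return every code whose probability reaches the threshold."""
    return [c for c, p in zip(codes, probabilities) if p >= threshold]

codes = ["code_a", "code_b", "code_c"]  # hypothetical phenotype codes
probabilities = [0.9, 0.7, 0.1]         # independent per-label outputs
targets = [1, 1, 0]                     # this patient has two diagnoses

loss = binary_cross_entropy(probabilities, targets)
selected = predicted_codes(probabilities, codes)
print(selected)
```

Unlike a softmax classifier, which forces the probabilities to sum to one and so favors a single winner, the per-label probabilities here are independent, which is what lets one patient be assigned `code_a` and `code_b` at the same time.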
POPDx could improve disease prediction even where data are scarce, which makes it a significant achievement in this field.
See the paper and reference article for details. All credit for this research goes to the researchers of this project.
Niharika is a technical consulting intern at Marktechpost. She is a third-year student pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She has a strong interest in machine learning, data science, and artificial intelligence and is an avid reader of the latest developments in these fields.