The first documented case of pancreatic cancer dates back to the 18th century. Since then, researchers have undertaken a long and challenging odyssey to understand this elusive and deadly disease. To date, there is no better cancer treatment than early intervention. Unfortunately, the pancreas, located deep in the abdomen, is particularly difficult to reach for early detection.
Scientists at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), along with Limor Appelbaum, a scientist in the Department of Radiation Oncology at Beth Israel Deaconess Medical Center (BIDMC), were eager to better identify potential high-risk patients. They set out to develop two machine learning models for the early detection of pancreatic ductal adenocarcinoma (PDAC), the most common form of the cancer. To access a large and diverse database, the team teamed up with a federated network company, using electronic medical record data from multiple institutions in the United States. This vast data set helped ensure the reliability and generalizability of the models, making them applicable across a wide range of populations, geographic locations, and demographic groups.
The two models, the “PRISM” neural network and a logistic regression model (a statistical technique for estimating probability), outperformed current methods. The team's comparison showed that while standard screening criteria identify about 10 percent of PDAC cases using a five-fold relative risk threshold, PRISM can detect 35 percent of PDAC cases at the same threshold.
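To make the comparison concrete, the sketch below computes the two quantities involved for any risk cutoff: the share of cases that get flagged (sensitivity) and how much more common the disease is among flagged patients than in the whole cohort (relative risk). The function name, the cutoff, and the toy data are illustrative assumptions, not taken from the study, whose exact evaluation protocol may differ.

```python
def evaluate_cutoff(scores, labels, cutoff):
    """Return (sensitivity, relative_risk) for patients with score >= cutoff.

    scores: predicted risk per patient; labels: 1 if the patient developed
    the disease, else 0. All inputs used below are hypothetical toy data.
    """
    flagged = [label for score, label in zip(scores, labels) if score >= cutoff]
    total_cases = sum(labels)
    if not flagged or total_cases == 0:
        raise ValueError("need at least one flagged patient and one case")
    flagged_cases = sum(flagged)
    # Sensitivity: fraction of all cases that the cutoff catches.
    sensitivity = flagged_cases / total_cases
    # Relative risk: incidence among flagged patients vs. the whole cohort.
    flagged_incidence = flagged_cases / len(flagged)
    overall_incidence = total_cases / len(labels)
    return sensitivity, flagged_incidence / overall_incidence


toy_scores = [0.9, 0.8, 0.3, 0.2, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
sensitivity, relative_risk = evaluate_cutoff(toy_scores, toy_labels, cutoff=0.5)
```

In this toy cohort the cutoff catches half of the cases at a relative risk of about 2.5. Under this reading, a five-fold threshold means raising the cutoff until that ratio reaches 5, then asking what share of cases is still caught.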
Using AI to detect cancer risk is not a new phenomenon: algorithms analyze mammograms and CT scans for lung cancer, and assist in the analysis of Pap smears and HPV tests, to name a few applications. “The PRISM models stand out for their development and validation in an extensive database of more than 5 million patients, exceeding the scale of most previous research in this field,” says Kai Jia, a doctoral student in electrical engineering and computer science (EECS) at MIT, MIT CSAIL affiliate, and first author of an open-access paper in eBioMedicine outlining the new work. “The model uses routine clinical and laboratory data to make its predictions, and the diversity of the US population is a significant advance over other PDAC models, which are typically limited to specific geographic regions, such as a few health care centers in the US. In addition, the use of a unique regularization technique in the training process improved the generalizability and interpretability of the models.”
“This report describes a powerful approach to using big data and artificial intelligence algorithms to refine our approach to identifying cancer risk profiles,” says David Avigan, professor at Harvard Medical School and director of the cancer center and chief of hematology and hematological malignancies at BIDMC, who did not participate in the study. “This approach may lead to novel strategies to identify patients at high risk of malignancy who may benefit from focused screening with potential for early intervention.”
Prismatic perspectives
The journey toward developing PRISM began more than six years ago, driven by first-hand experiences with the limitations of current diagnostic practices. “Approximately 80 to 85 percent of pancreatic cancer patients are diagnosed in advanced stages, where a cure is no longer an option,” says lead author Appelbaum, who is also an instructor at Harvard Medical School and a radiation oncologist. “This clinical frustration sparked the idea to delve deeper into the vast amount of data available in electronic health records (EHRs).”
The CSAIL group's close collaboration with Appelbaum made it possible to better understand the combined medical and machine learning aspects of the problem, ultimately leading to a much more accurate and transparent model. “The hypothesis was that these records contained hidden clues: subtle signs and symptoms that could act as early warning signs of pancreatic cancer,” Appelbaum adds. “This guided our use of federated EHR networks in the development of these models, for a scalable approach to deploying risk prediction tools in healthcare.”
Both the PrismNN and PrismLR models analyze EHR data, including patient demographics, diagnoses, medications, and laboratory results, to assess PDAC risk. PrismNN uses artificial neural networks to detect complex patterns in data features such as age, medical history, and laboratory results, generating a risk score for the probability of PDAC. PrismLR uses logistic regression for a simpler analysis, generating a PDAC probability score based on these characteristics. Together, the models provide a comprehensive evaluation of different approaches to predict PDAC risk from the same EHR data.
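As a rough illustration of the logistic regression idea, the sketch below turns a handful of patient features into a probability-like risk score. The feature names, weights, and bias are hypothetical values chosen for the example; they are not the PrismLR coefficients, and the real model draws on far more inputs.

```python
import math

# Hypothetical weights for illustration only; NOT the PrismLR coefficients.
WEIGHTS = {"age_decades": 0.4, "has_diabetes": 0.8, "visits_last_year": 0.1}
BIAS = -8.0


def risk_score(features):
    """Map a patient's feature values to a score strictly between 0 and 1."""
    # Weighted sum of the features, with missing features treated as 0.
    z = BIAS + sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items())
    # The logistic (sigmoid) function squashes the sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


low = risk_score({"age_decades": 4, "has_diabetes": 0, "visits_last_year": 1})
high = risk_score({"age_decades": 7, "has_diabetes": 1, "visits_last_year": 12})
```

With these made-up weights, the older patient with diabetes and frequent visits receives the higher score. A neural network such as PrismNN replaces the single weighted sum with several layers of them, which lets it capture interactions between features at some cost to readability.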
A key to gaining doctors' trust, the team notes, is a better understanding of how the models work, a property known in the field as interpretability. The scientists noted that while logistic regression models are inherently easier to interpret, recent advances have made deep neural networks somewhat more transparent. This helped the team refine the thousands of potentially predictive features derived from a single patient's EHR to approximately 85 critical indicators. These indicators, which include patient age, diabetes diagnosis, and increased frequency of doctor visits, are automatically discovered by the model, but match doctors' understanding of the risk factors associated with pancreatic cancer.
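The article does not describe how the features were narrowed down, so the following is only a generic stand-in: for a linear model with comparably scaled inputs, one common heuristic is to keep the features whose learned weights have the largest magnitude. The function and the example weights are hypothetical and do not represent the study's regularization technique.

```python
def top_features(weights, k):
    """Return the k feature names with the largest absolute weight."""
    ranked = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)
    return ranked[:k]


# Hypothetical feature weights; names and values are made up for illustration.
example_weights = {
    "age": 1.2,
    "diabetes_diagnosis": 0.9,
    "visit_frequency": 0.7,
    "hypothetical_protective_factor": -0.8,
    "uninformative_feature": 0.01,
}
selected = top_features(example_weights, 3)
```

Ranking by absolute value keeps strongly negative weights as well as strongly positive ones, since both carry information about risk.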
The way forward
Despite the promise of the PRISM models, as with all research, some parts are still a work in progress. US data alone are the models' current diet, so testing and adaptation are needed before global use. The way forward, the team notes, includes expanding the models' applicability to international data sets and integrating additional biomarkers for more refined risk assessment.
“A subsequent goal for us is to facilitate the implementation of the models in routine healthcare settings. The vision is for these models to run seamlessly in the background of healthcare systems, automatically analyzing patient data and alerting doctors to high-risk cases without increasing their workload,” says Jia. “A machine learning model integrated with the EHR system could provide physicians with early alerts for high-risk patients, potentially allowing for interventions long before symptoms manifest. We look forward to implementing our techniques in the real world to help all people enjoy longer, healthier lives.”
Jia co-wrote the paper with Appelbaum and MIT EECS Professor and CSAIL Principal Investigator Martin Rinard, both lead authors of the paper. Researchers on the paper were supported during their time at MIT CSAIL, in part, by the Defense Advanced Research Projects Agency, Boeing, the National Science Foundation, and Aarno Labs. TriNetX provided resources for the project, and the Prevent Cancer Foundation also supported the team.