Last year we presented results showing that a deep learning system (DLS) can be trained to analyze external eye photos and predict a person’s diabetic retinal disease status and glycated hemoglobin (HbA1c, a biomarker that reflects the three-month average blood glucose level). External eye photos were not previously known to contain signals for these conditions. This exciting finding suggested the potential to reduce the need for specialized equipment, since such photos can be captured with smartphones and other consumer devices. Encouraged by these findings, we set out to discover what other biomarkers can be found in this imaging modality.
In “A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study”, published in Lancet Digital Health, we show that a number of systemic biomarkers spanning several organ systems (e.g., kidney, blood, liver) can be predicted from external photographs of the eye with an accuracy exceeding that of a baseline logistic regression model that uses only clinicodemographic variables, such as age and years with diabetes. The comparison with a clinicodemographic baseline is useful because the risk of some diseases could also be assessed with a simple questionnaire, and we want to understand whether the model interpreting images does better. This work is in its early stages, but it has the potential to increase access to disease detection and monitoring through new, non-invasive pathways of care.
A model that generates predictions for a photo of the external eye.
Development and evaluation of models
To develop our model, we worked with partners at EyePACS and the Los Angeles County Department of Health Services to create a de-identified retrospective dataset of external eye photographs and measurements in the form of laboratory tests and vital signs (e.g., blood pressure). We filtered down to the 31 vital signs and laboratory tests that were most commonly available in this dataset, and then trained a multi-task DLS with a classification “head” for each lab test and vital sign to predict abnormalities in these measurements.
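As a rough sketch (not the paper’s actual architecture), a multi-task model with one binary classification head per measurement can be structured as follows; the `encoder` stand-in and all dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a shared backbone maps each photo to a
# 64-d feature vector; one binary "head" per lab test / vital sign.
N_TASKS, FEAT_DIM = 31, 64

# Fixed random weights stand in for a trained image backbone.
W_ENC = rng.standard_normal((32 * 32 * 3, FEAT_DIM)) * 0.01

def encoder(images):
    """Stand-in for the shared image backbone: photos -> features."""
    return images.reshape(images.shape[0], -1) @ W_ENC

# One independent logistic classification head per prediction task.
HEADS = rng.standard_normal((FEAT_DIM, N_TASKS)) * 0.01

def predict_abnormalities(images):
    """Return per-task abnormality probabilities, shape (batch, N_TASKS)."""
    logits = encoder(images) @ HEADS
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid, one output per head

photos = rng.standard_normal((8, 32, 32, 3))  # a toy batch of 8 "photos"
probs = predict_abnormalities(photos)         # shape (8, 31)
```

In training, each head would get its own binary cross-entropy loss on the subset of examples where that measurement is available, which is what lets one shared backbone serve all 31 tasks.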
It is important to note that evaluating predictions of many abnormalities in parallel can be problematic because of the higher probability of finding a spurious, erroneous result (i.e., the multiple comparisons problem). To mitigate this, we first evaluated the model on a portion of our development dataset. We then narrowed the list down to the nine most promising prediction tasks and evaluated the model on our test datasets while correcting for multiple comparisons. These nine tasks, their associated anatomy, and their significance for associated diseases are listed in the table below.
| Prediction task | Organ system | Significance for associated diseases |
| --- | --- | --- |
| Albumin < 3.5 g/dL | Liver, kidney | Indication of hypoalbuminemia, which may be due to decreased albumin production from liver disease or increased albumin loss from kidney disease. |
| AST > 36.0 U/L | Liver | Indication of liver disease (e.g., liver damage or biliary obstruction), commonly caused by viral infections, alcohol use, and obesity. |
| Calcium < 8.6 mg/dL | Bone / mineral | Indication of hypocalcemia, which is most commonly caused by vitamin D deficiency or parathyroid disorders. |
| eGFR < 60.0 mL/min/1.73 m² | Kidney | Indication of chronic kidney disease, most commonly due to diabetes and high blood pressure. |
| Hgb < 11.0 g/dL | Blood cell count | Indication of anemia, which may be due to blood loss, chronic medical conditions, or poor diet. |
| Platelets < 150.0 × 10³/µL | Blood cell count | Indication of thrombocytopenia, which may be due to decreased platelet production from bone marrow disorders, such as leukemia or lymphoma, or increased platelet destruction due to autoimmune disease or medication side effects. |
| TSH > 4.0 mU/L | Thyroid | Indication of hypothyroidism, which affects metabolism and can be caused by many different conditions. |
| Urine albumin/creatinine ratio (UACR) ≥ 300.0 mg/g | Kidney | Indication of chronic kidney disease, most commonly due to diabetes and high blood pressure. |
| WBC < 4.0 × 10³/µL | Blood cell count | Indication of leukopenia, which can affect the body’s ability to fight infection. |
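The winnow-then-correct procedure described above can be illustrated with a Bonferroni adjustment, one standard correction for multiple comparisons (the paper’s exact correction procedure may differ, and the p-values below are invented for illustration):

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag hypotheses that remain significant after dividing the
    significance threshold by the number of comparisons."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Nine made-up p-values, one per pre-specified prediction task.
p_vals = [0.001, 0.004, 0.020, 0.0005, 0.030, 0.002, 0.0001, 0.008, 0.060]
flags = bonferroni_significant(p_vals)  # threshold becomes 0.05 / 9 ≈ 0.0056
```

Restricting the final test-set evaluation to nine pre-specified tasks keeps the corrected threshold from becoming so strict that real signals are drowned out, which is exactly why the winnowing step on held-out development data comes first.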
Key results
As in our previous work, we compared our external eye model with the baseline model (a logistic regression model taking clinicodemographic variables as input) by computing the area under the receiver operating characteristic curve (AUC). The AUC ranges from 0 to 100%; 50% indicates random performance, and higher values indicate better performance. For all but one of the nine prediction tasks, our model statistically significantly outperformed the baseline model. In terms of absolute performance, the model’s AUCs ranged from 62% to 88%. While these levels of accuracy are likely insufficient for diagnostic applications, they are in line with other initial screening tools, such as mammography and pre-screening for diabetes, used to help identify people who may benefit from additional testing. And as an accessible, non-invasive modality, taking pictures of the external eye may offer the potential to help assess and triage patients for confirmatory blood work or other clinical follow-up.
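The AUC itself can be computed from ranks via the Mann–Whitney identity. This toy sketch (invented scores, not the study’s data) shows how model and baseline scores would be compared on the same labels:

```python
def auc(labels, scores):
    """AUC via the Mann–Whitney identity: the fraction of
    positive/negative pairs ordered correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: the model's scores separate the classes; the baseline's barely do.
labels          = [0, 0, 1, 0, 1, 1]
model_scores    = [0.1, 0.3, 0.8, 0.2, 0.9, 0.7]  # AUC = 1.0
baseline_scores = [0.4, 0.3, 0.5, 0.6, 0.7, 0.2]  # AUC ≈ 0.56
```

Because the AUC depends only on the ranking of scores, the two models can be compared fairly even if their raw outputs live on different scales.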
Results on the EyePACS test set, showing the AUC performance of our DLS compared to the baseline model. The variable “n” refers to the total number of data points, and “N” refers to the number of positives. Error bars show 95% confidence intervals calculated with the DeLong method. †Indicates that the target was pre-specified as a secondary analysis; all others were pre-specified as primary analyses.
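The paper reports analytic DeLong confidence intervals; as a simple stand-in, a percentile bootstrap gives a comparable 95% interval for an AUC (toy data, standard library only — this is an approximation, not the paper’s method):

```python
import random

def auc(labels, scores):
    """AUC via the Mann–Whitney pairwise-ordering identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% CI for the AUC (a simple approximation;
    the paper uses the analytic DeLong method instead)."""
    rng = random.Random(seed)
    n, aucs = len(labels), []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # the resample must contain both classes
            aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    return aucs[int(0.025 * len(aucs))], aucs[int(0.975 * len(aucs))]

labels = [0, 1, 0, 1, 1, 0, 1, 0, 1, 0]
scores = [0.2, 0.9, 0.4, 0.7, 0.6, 0.3, 0.8, 0.5, 0.55, 0.1]
lo, hi = bootstrap_auc_ci(labels, scores)
```

DeLong’s method gets the same interval analytically from the pairwise statistics, avoiding the resampling loop; the bootstrap is just the easiest version to write down.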
External eye photos used in both this and the previous study were collected with tabletop cameras that include a head rest to stabilize the patient and produce high-quality images in good lighting. Since image quality can be worse in other settings, we wanted to explore how resilient the DLS is to quality degradation, starting with image resolution. Specifically, we downscaled the images in the dataset to a range of sizes and measured the performance of the DLS when retrained to handle the downscaled images.
Below we show a selection of results from this experiment (see the paper for more complete results). These results show that the DLS is quite robust and, in most cases, outperforms the baseline model even when the images are scaled down to 150×150 pixels. This pixel count is under 0.1 megapixels, much lower than typical smartphone camera resolutions.
Effect of input image resolution. Top: sample images scaled to different sizes for this experiment. Bottom: comparison of the performance of the DLS (red), trained and evaluated at different image sizes, and the baseline model (blue). Shaded regions show 95% confidence intervals calculated using the DeLong method.
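The downscaling step of this experiment can be sketched with simple block averaging (a crude stand-in for the anti-aliased resizing a real pipeline would likely use; the 600-pixel source size here is invented for illustration):

```python
import numpy as np

def downscale(image, size):
    """Downscale a square image by averaging non-overlapping blocks."""
    h = image.shape[0]
    assert h % size == 0, "sketch assumes the target size divides evenly"
    f = h // size
    return image.reshape(size, f, size, f, -1).mean(axis=(1, 3))

# Shrink a simulated 600x600 capture to the 150x150 probe resolution.
full = np.random.default_rng(0).random((600, 600, 3))
small = downscale(full, 150)  # shape (150, 150, 3), under 0.1 megapixels
```

Retraining the model at each target size, rather than just feeding it shrunken inputs, is what makes the experiment a fair probe of how much signal survives at that resolution.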
Conclusion and future directions
Our previous research demonstrated the promise of the external eye modality. In this work, we conducted a more extensive search to identify potential systemic biomarkers that can be predicted from these photos. Although these results are promising, many steps remain to determine whether a technology like this can help patients in the real world. In particular, as mentioned above, the images in our studies were collected using large tabletop cameras in a setting that controlled for factors such as lighting and head positioning. Furthermore, the datasets used in this work consist primarily of patients with diabetes and under-represented a number of important subgroups; more focused data collection will be needed to refine and evaluate the DLS in a more general population and across subgroups before clinical use can be considered.
We are excited to explore how these models generalize to smartphone imaging, given the potential reach and scale this enables for the technology. To this end, we are continuing to work with our co-authors at partner institutions such as Chang Gung Memorial Hospital in Taiwan, Aravind Eye Hospital in India, and EyePACS in the United States to collect datasets of images captured on smartphones. Our early results are promising, and we look forward to sharing more in the future.
Acknowledgements
This work involved the efforts of a multidisciplinary team of software engineers, researchers, clinicians, and cross-functional collaborators. Key contributors to this project include: Boris Babenko, Ilana Traynis, Christina Chen, Preeti Singh, Akib Uddin, Jorge Cuadros, Lauren P. Daskivich, April Y. Maa, Ramasamy Kim, Eugene Yu-Chuan Kang, Yossi Matias, Greg S. Corrado, Lily Peng, Dale R. Webster, Christopher Semturs, Jonathan Krause, Avinash V. Varadarajan, Naama Hammel, and Yun Liu. We also thank Dave Steiner, Yuan Liu, and Michael Howell for comments on the manuscript; Amit Talreja for code review; Elvia Figueroa and the staff of the Los Angeles County Department of Health Services Teleretinal Diabetic Retinopathy Screening Program for data collection and program support; Andrea Limon and Nikhil Kookkiri for EyePACS data collection and support; Dr. Charles Demosthenes for extracting the data and Peter Kuzmak for obtaining images from the VA data. Last but not least, a special thanks to Tom Small for the animation used in this blog post.