Using large amounts of labeled data, supervised machine learning algorithms have outperformed human experts in various tasks, raising concerns about job displacement, particularly in diagnostic radiology. However, some argue that job displacement is unlikely in the short term, as many jobs involve a variety of tasks that go beyond mere prediction. Humans can still be essential in prediction tasks, since they can learn from fewer examples. In radiology, human experience is crucial to recognize rare diseases. Similarly, self-driving cars face challenges with rare scenarios, which humans can handle using broader knowledge that goes beyond driving-specific data.
Researchers from MIT and Harvard Medical School investigated whether zero-shot learning algorithms reduce the diagnostic advantage of human radiologists for rare diseases. They compared the performance of CheXzero, a zero-shot algorithm for chest X-rays, with that of human radiologists and CheXpert, a traditional supervised algorithm. CheXzero, trained on the MIMIC-CXR dataset, predicts multiple pathologies using contrastive learning, while CheXpert, trained on Stanford chest X-rays, diagnoses twelve pathologies from explicit labels. Data were collected from 227 radiologists who evaluated 324 Stanford cases, excluding any cases used in training, to assess how performance varies with disease prevalence.
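CheXzero's contrastive, zero-shot approach follows the CLIP recipe: an image embedding is compared against text embeddings of a positive and a negative prompt (e.g. "pneumonia" vs. "no pneumonia"), and a softmax over the two similarities yields a probability for the pathology. A minimal sketch of that scoring step, assuming the encoders exist elsewhere; the embeddings below are illustrative placeholders, not CheXzero's actual outputs:

```python
import numpy as np

def zero_shot_score(img_emb, pos_emb, neg_emb):
    """CLIP-style zero-shot probability that the pathology named in the
    positive prompt is present, given precomputed embeddings."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    sims = np.array([cos(img_emb, pos_emb), cos(img_emb, neg_emb)])
    # Softmax over the two similarities: a score > 0.5 favors the pathology.
    e = np.exp(sims)
    return e[0] / e.sum()

# Toy embeddings: the image aligns with the positive prompt.
img = np.array([1.0, 0.2, 0.0])
pos = np.array([0.9, 0.1, 0.0])   # e.g. text embedding of "pneumonia"
neg = np.array([0.0, 0.1, 1.0])   # e.g. text embedding of "no pneumonia"
print(zero_shot_score(img, pos, neg))
```

Because the labels live in the text prompts rather than in a fixed output layer, the same model can score any pathology that can be phrased as a prompt, which is how CheXzero covers far more conditions than a supervised classifier.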
AI and radiologist performance are compared using the concordance (C) statistic, an extension of AUROC to continuous outcomes. Agreement, C_rt, measures the proportion of concordant pairs, calculated separately for each radiologist r and pathology t, and then averaged across radiologists to obtain C_t. The AI's concordance is denoted C_At. Concordance is chosen because it is invariant to prevalence and does not depend on preferences, making it appropriate even when no case has a high probability of consensus. Although it is an ordinal measure, it is still informative. Another performance metric, deviation from the consensus probability, is less effective for low-prevalence pathologies, which influences some of the conclusions.
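For a binary consensus label, the concordance statistic reduces to the familiar probability that a positive case receives a higher score than a negative one. A minimal sketch (not the authors' code) that counts concordant pairs, with ties counted as half:

```python
import numpy as np

def concordance(scores, labels):
    """Proportion of concordant (positive, negative) pairs, ties counted
    as half; equals AUROC when labels are binary."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Pairwise score differences between every positive and negative case.
    diffs = pos[:, None] - neg[None, :]
    return ((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / diffs.size

# Perfect ranking: every positive outscores every negative.
print(concordance([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to chance-level ranking regardless of how rare the pathology is, which is why the metric remains comparable across the prevalence spectrum.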
The classification performance of human radiologists is compared with the CheXzero and CheXpert algorithms. The average prevalence of the pathologies is low, around 2.42%, although some exceed 15%. Radiologists have an average concordance of 0.58, lower than both AI algorithms, and CheXpert slightly outperforms CheXzero. However, CheXpert's predictions cover only 12 pathologies, while CheXzero covers 79. Human and CheXzero performance are only weakly correlated, suggesting that humans and the algorithm attend to different features of the X-rays. CheXzero's performance varies widely, with concordance ranging from 0.45 to 0.94, compared with the narrower range of 0.52 to 0.72 for human radiologists.
The study illustrates the importance of the long tail in the prevalence of pathologies, revealing that the most relevant pathologies are not covered by the supervised learning algorithm studied. While both human and ai performance improves with the prevalence of the pathology, CheXpert shows substantial improvement in higher prevalence cases. CheXzero's performance is less affected by prevalence, consistently outperforming humans across all prevalence ranges. In particular, CheXzero outperforms humans even in low-prevalence pathologies, challenging the notion of human superiority in such cases. However, evaluating overall algorithmic performance requires cautious interpretation due to the complexity of converting ordinal results into diagnostic decisions, especially for rare pathologies.
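The prevalence comparison above can be sketched on toy numbers by binning pathologies into low-, mid-, and high-prevalence groups and averaging concordance within each bin. All values and cut-offs below are illustrative, not the study's results:

```python
import numpy as np

# Illustrative per-pathology numbers (NOT the study's data).
prevalence = np.array([0.5, 1.2, 3.0, 8.0, 16.0])      # prevalence in %
c_human    = np.array([0.55, 0.57, 0.60, 0.63, 0.66])  # avg. radiologist concordance
c_ai       = np.array([0.62, 0.64, 0.70, 0.78, 0.85])  # zero-shot concordance

def mean_by_prevalence(values, prevalence, edges=(0, 2, 10, 100)):
    """Mean of `values` within prevalence bins delimited by `edges` (in %)."""
    idx = np.digitize(prevalence, edges) - 1   # bin index per pathology
    return {int(b): float(values[idx == b].mean()) for b in np.unique(idx)}

print(mean_by_prevalence(c_human, prevalence))
print(mean_by_prevalence(c_ai, prevalence))
```

Stratifying this way makes the paper's point visible: if the AI's bin means stay above the human bin means even in the lowest-prevalence bin, rarity alone does not restore the human advantage.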
Supervised machine learning algorithms have demonstrated superiority over humans in specific tasks. However, humans still add value through their ability to handle rare cases, known as the long tail. Zero-shot learning algorithms aim to address this challenge by avoiding the need for large amounts of labeled data. The study compared radiologists' assessments with two leading algorithms for diagnosing thoracic pathologies, indicating that self-supervised algorithms are quickly closing the gap with, or even outperforming, humans in predicting rare diseases. However, challenges in deploying these algorithms remain, since their outputs do not translate directly into actionable decisions, suggesting that they are more likely to complement humans than to replace them.
Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and artificial intelligence to address real-world challenges. With a strong interest in solving practical problems, she brings a fresh perspective to the intersection of AI and real-life solutions.