Medical imaging is a cornerstone of modern diagnosis, and Artificial Intelligence (AI) promises to revolutionize it. With the power to detect features and trends invisible to the human eye, AI could deliver faster and more accurate diagnoses.
But beneath this promise lies a troubling flaw: AI’s tendency to take shortcuts and jump to conclusions.
These shortcuts can lead to misleading and sometimes dangerous conclusions. Take, for instance, algorithms that appear able to “predict” from a knee X-ray whether someone drinks beer.
In a recent study, researchers trained convolutional neural networks (CNNs) — one of the most popular types of deep learning algorithms — to perform a bizarre task: predict whether a patient avoided eating refried beans or drinking beer simply by looking at their knee X-rays. The models did just that, achieving 63% accuracy for predicting bean avoidance and 73% for beer avoidance.
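To make the setup concrete, here is a minimal sketch of what such a training pipeline typically looks like: an off-the-shelf CNN fine-tuned on X-ray images with a binary label. It assumes PyTorch and torchvision, and the folder layout and "avoids beer" label it encodes are hypothetical stand-ins, not the study's actual code or data.

```python
# Hypothetical sketch: fine-tune a pretrained CNN to predict a binary
# "avoids beer" label from knee X-rays. Assumes images are sorted into
# data/train/avoids_beer/ and data/train/drinks_beer/ (made-up layout).
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # X-rays are single-channel; replicate for the pretrained net
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # two classes: avoids / drinks

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Nothing in a pipeline like this tells the network which image features are medically meaningful; it simply rewards whatever separates the two classes, which is exactly the opening that shortcut learning exploits.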
Obviously, this defies logic. There’s no connection between knee anatomy and dietary preferences. Yet the models produced statistically significant results. This strange outcome wasn’t due to some hidden medical insight; it was a textbook example of shortcut learning.
Shortcut learning and confounding variables
This study used the Osteoarthritis Initiative (OAI) dataset, a vast collection of over 25,000 knee X-rays. The dataset included various confounding factors — variables that could distort the model’s learning. The researchers found that AI models could predict patient sex, race, clinical site, and even the manufacturer of the X-ray machine with striking accuracy. For example:
- Sex Prediction: 98.7% accuracy
- Clinical Site Prediction: 98.2% accuracy
- Race Prediction: 92.1% accuracy
That might sound like impressive performance, but here’s the problem: the AI can use these confounding factors as shortcuts. For example, if a particular clinical site has more patients of a specific demographic, the AI might associate that demographic with certain diagnoses — a shortcut that reflects bias rather than medical reality.
Shortcut learning occurs when AI models exploit superficial patterns in data rather than learning meaningful relationships. In medical imaging, shortcut learning means the model isn’t recognizing medical conditions but instead latching onto irrelevant clues.
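A toy, self-contained example makes the failure mode concrete. In the synthetic data below (not from the OAI study), a "site marker" pixel in the image corner happens to correlate with the label during training; a simple classifier learns the marker instead of any real signal and collapses to chance once that correlation is broken.

```python
# Toy demonstration of shortcut learning on synthetic "images" (not real X-ray data).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_images(n, marker_matches_label):
    # 16x16 images of pure noise: there is NO true image signal for the label.
    X = rng.normal(size=(n, 16, 16))
    y = rng.integers(0, 2, size=n)
    # A bright corner pixel mimics a site label or annotation burned into the film.
    marker = y if marker_matches_label else rng.integers(0, 2, size=n)
    X[:, 0, 0] += 5.0 * marker
    return X.reshape(n, -1), y

# Training data: the marker is perfectly correlated with the label (a confounder).
X_train, y_train = make_images(2000, marker_matches_label=True)
# In-distribution test: the shortcut is still present.
X_indist, y_indist = make_images(2000, marker_matches_label=True)
# Shifted test: the spurious correlation is gone.
X_shift, y_shift = make_images(2000, marker_matches_label=False)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy with shortcut available:", clf.score(X_indist, y_indist))  # close to 1.0
print("accuracy once the shortcut breaks:", clf.score(X_shift, y_shift))   # about 0.5 (chance)
```

The headline accuracy looks excellent right up until the confounder stops lining up with the label — which is what happens when a model trained on images from one clinical site is applied at another.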
“While AI has the potential to transform medical imaging, we must be cautious,” says the study’s senior author, Dr. Peter Schilling, an orthopaedic surgeon at Dartmouth Health’s Dartmouth Hitchcock Medical Center and an assistant professor of orthopaedics in Dartmouth’s Geisel School of Medicine.
“These models can see patterns humans cannot, but not all patterns they identify are meaningful or reliable,” Schilling says. “It’s crucial to recognize these risks to prevent misleading conclusions and ensure scientific integrity.”
It could become a bigger problem
Society is still working out how AI can acceptably be used in healthcare. Practitioners agree that AI shouldn’t be left to interpret medical imaging on its own; at most, it should serve as an aid, with the results and interpretation still reviewed by an expert. But with AI usage becoming more widespread, and with large-scale workforce shortages, it may take on a more central role.
This is why the findings are so concerning.
For instance, the AI might identify a particular clinical site based on unique markers in the X-ray image, such as the placement of labels or blacked-out sections used to obscure patient information. These markers can correlate with patient demographics or other latent variables like age, race, or diet — factors that shouldn’t affect the diagnosis but can skew the AI’s predictions.
Imagine an AI trained to detect a disease in chest X-rays. If the AI learns to associate a particular hospital’s labeling style with disease prevalence, its predictions will be unreliable when applied to images from other hospitals. This kind of bias can result in misdiagnoses and flawed research findings.
Shortcut learning also undermines the credibility of AI-driven discoveries. Researchers and clinicians may be misled into thinking the AI has identified a groundbreaking medical insight when, in fact, it has merely exploited a meaningless pattern.
“This goes beyond bias from clues of race or gender,” says Brandon Hill, a co-author of the study and a machine learning scientist at Dartmouth Hitchcock. “We found the algorithm could even learn to predict the year an X-ray was taken. It’s pernicious — when you prevent it from learning one of these elements, it will instead learn another it previously ignored. This danger can lead to some really dodgy claims, and researchers need to be aware of how readily this happens when using this technique.”
Can we fix it?
It’s very difficult to eliminate shortcut learning. Even with extensive preprocessing and normalization of the images, the AI still identified patterns that humans couldn’t see and based its predictions on them. This ability to “cheat” by finding irrelevant but statistically significant correlations poses a serious risk for medical applications.
The challenge of shortcut learning has no easy fix. Researchers have proposed various methods to reduce bias, such as balancing datasets or removing confounding variables. But this study shows these solutions often fall short. Shortcut learning can involve multiple, intertwined factors, making it difficult to isolate and correct for each one.
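To give a sense of what “balancing datasets” means in practice, here is a hedged sketch that resamples a hypothetical metadata table (the column names are made up) so that every combination of diagnosis label and clinical site is equally represented before training. The study’s point is that even careful steps like this often fall short, because other confounders remain in the images.

```python
# Hypothetical sketch of one common mitigation: balance the training table so the
# diagnosis label is no longer correlated with a known confounder (clinical site).
import pandas as pd

df = pd.DataFrame({
    "image_path": [f"xray_{i}.png" for i in range(12)],
    "site":       ["A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B"],
    "diagnosis":  [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1],
})

# Downsample every (site, diagnosis) cell to the size of the smallest cell.
cell_size = df.groupby(["site", "diagnosis"]).size().min()
balanced = (
    df.groupby(["site", "diagnosis"], group_keys=False)
      .apply(lambda g: g.sample(cell_size, random_state=0))
)

print(pd.crosstab(balanced["site"], balanced["diagnosis"]))  # equal counts per cell
```

Balancing only neutralizes the confounders you have thought to measure; the study found that models then simply latch onto other cues, such as the X-ray machine’s manufacturer or the year the image was taken.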
The authors of the study argue that AI in medical imaging needs greater scrutiny. Deep learning algorithms are not hypothesis tests — they are powerful pattern recognition tools. When used for scientific discovery, their results must be rigorously validated to ensure they reflect true medical insights rather than statistical artifacts.
Essentially, we need to subject AIs to much greater scrutiny, especially in a medical context.
“The burden of proof just goes way up when it comes to using models for the discovery of new patterns in medicine,” Hill says. “Part of the problem is our own bias. It is incredibly easy to fall into the trap of presuming that the model ‘sees’ the same way we do. In the end, it doesn’t.”
Researchers also caution against treating AI like a fellow expert.
“AI is almost like dealing with an alien intelligence,” Hill continues. “You want to say the model is ‘cheating,’ but that anthropomorphizes the technology. It learned a way to solve the task given to it, but not necessarily how a person would. It doesn’t have logic or reasoning as we typically understand it.”
Journal Reference: Ravi Aggarwal et al., Diagnostic accuracy of deep learning in medical imaging: a systematic review and meta-analysis, npj Digital Medicine (2021). DOI: 10.1038/s41746-021-00438-z