Recent research from Harvard Medical School has uncovered a concerning issue with artificial intelligence (AI) tools used in cancer diagnosis. These advanced systems, which analyze tissue samples to detect cancer, have shown an unexpected ability to infer patient demographics. This can lead to biased diagnostic results for certain groups, highlighting a significant challenge in the integration of AI in healthcare.
The study, published on December 16, 2025, in the journal Cell Reports Medicine, indicates that the bias observed in AI models stems from their training data and the way they interpret that information, rather than merely from a lack of representative samples. Researchers discovered that diagnostic accuracy varied significantly based on factors such as race, gender, and age, raising critical questions about equity in healthcare delivery.
Understanding the Role of AI in Pathology
Pathology has long been a cornerstone of cancer diagnosis, with pathologists examining tissue samples under a microscope to identify cancerous changes. Traditionally, pathologists make these assessments without access to patient-specific information, which is assumed to keep the evaluation objective. The advent of AI, however, has complicated this picture. The new findings reveal that AI systems can inadvertently learn to associate tissue characteristics with demographic details, and that these associations can influence their diagnostic decisions.
Senior author Kun-Hsing Yu, an associate professor of biomedical informatics at the Blavatnik Institute and assistant professor of pathology at Brigham and Women’s Hospital, noted the unexpected nature of these findings. “Reading demographics from a pathology slide is thought of as a ‘mission impossible’ for a human pathologist, so the bias in pathology AI was a surprise to us,” he stated. The implications of these biases are profound, as they can directly affect diagnostic accuracy and patient outcomes.
Examining Diagnostic Disparities
Yu and his team conducted a thorough evaluation of four commonly used pathology AI models designed for cancer diagnosis. These models, which rely on large datasets of labeled pathology slides, were tested on a diverse dataset encompassing 20 different cancer types. The results revealed consistent performance gaps: the AI systems demonstrated reduced accuracy for demographic groups defined by race, gender, and age.
For instance, the models struggled with distinguishing lung cancer subtypes in African American patients and male patients. Similarly, they showed lower accuracy in classifying breast cancer subtypes among younger patients. Overall, approximately 29 percent of the diagnostic tasks analyzed exhibited disparities, prompting the researchers to investigate further.
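To make the kind of comparison the researchers describe more concrete, the sketch below computes per-group accuracy for a single diagnostic task and reports the gap between the best- and worst-served groups. The function name, the toy data, and the idea of summarizing a task by a simple accuracy gap are illustrative assumptions; the study's actual metrics and statistical tests are not reproduced here.

```python
import numpy as np

def per_group_accuracy_gap(y_true, y_pred, groups):
    """Illustrative only: accuracy per demographic group for one diagnostic
    task, plus the gap between the best- and worst-served groups."""
    accuracies = {}
    for g in np.unique(groups):
        mask = groups == g
        accuracies[str(g)] = float(np.mean(y_true[mask] == y_pred[mask]))
    gap = max(accuracies.values()) - min(accuracies.values())
    return accuracies, gap

# Toy example: two demographic groups, one binary diagnostic task.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0, 0, 1, 0, 0])
groups = np.array(["A", "A", "B", "B", "A", "B", "A", "B"])

accs, gap = per_group_accuracy_gap(y_true, y_pred, groups)
print(accs, gap)  # group A is served better than group B in this toy case
```

A real analysis would repeat this comparison across many tasks and apply significance testing before flagging a disparity; the sketch only shows the shape of the per-group comparison.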
The team identified three primary contributors to these biases. First, the uneven availability of training data meant that some demographic groups were underrepresented, making it harder for the models to learn from them; yet even when sample sizes were comparable, the models still performed worse for certain populations, so data imbalance alone does not explain the gaps. Second, differences in disease incidence played a role: certain cancers are more prevalent in specific demographic groups, leading to higher accuracy for those populations. Third, the AI systems could detect subtle molecular differences across demographic groups, and these signals could mislead their diagnostic decisions.
Yu emphasized that these findings illustrate a need for heightened awareness in how AI systems are developed and trained. “Because we would expect pathology evaluation to be objective… when evaluating images, we don’t necessarily need to know a patient’s demographics to make a diagnosis,” he explained.
Introducing FAIR-Path: A Framework for Equity
In response to these challenges, the research team developed a new framework called FAIR-Path, utilizing an existing machine-learning technique known as contrastive learning. This approach encourages AI models to focus more on essential distinctions—such as differences between cancer types—while minimizing attention to less relevant demographic differences.
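The published FAIR-Path objective is not reproduced here, but a minimal sketch of a fairness-aware contrastive loss in the spirit described above might look as follows. It pulls together embeddings of slides with the same cancer type, pushes apart different types, and penalizes similarity that tracks demographic group rather than diagnosis. The function name, the leakage penalty, and the weighting are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fairness_aware_contrastive_loss(embeddings, cancer_labels, demo_labels,
                                    temperature=0.1, demo_weight=0.5):
    """Hypothetical sketch: supervised contrastive loss on cancer type with an
    added penalty on similarity that aligns with demographic group only."""
    z = F.normalize(embeddings, dim=1)          # unit-norm slide embeddings
    sim = z @ z.t() / temperature               # pairwise similarity logits
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)

    same_cancer = cancer_labels.unsqueeze(0) == cancer_labels.unsqueeze(1)
    same_demo = demo_labels.unsqueeze(0) == demo_labels.unsqueeze(1)

    # Positives: other slides of the same cancer type (self-pairs excluded).
    pos_mask = same_cancer & ~eye
    logits = sim.masked_fill(eye, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Standard supervised contrastive term on cancer type.
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    contrastive = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts

    # Penalty: discourage clustering by demographic group when the cancer
    # type differs (a proxy for demographic information leaking in).
    leak_mask = same_demo & ~same_cancer & ~eye
    leak_counts = leak_mask.sum(dim=1).clamp(min=1)
    leakage = sim.masked_fill(~leak_mask, 0.0).sum(dim=1) / leak_counts

    return (contrastive + demo_weight * leakage).mean()

# Toy usage with random embeddings (for illustration only).
emb = torch.randn(8, 128)
cancer = torch.tensor([0, 0, 1, 1, 0, 1, 0, 1])
demo = torch.tensor([0, 1, 0, 1, 0, 1, 1, 0])
loss = fairness_aware_contrastive_loss(emb, cancer, demo)
```

The design intuition matches the article's description: the model is rewarded for features that separate cancer types and discouraged from features that merely separate demographic groups, without requiring a perfectly balanced training set.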
When implemented, FAIR-Path significantly reduced diagnostic disparities by approximately 88 percent. “We show that by making this small adjustment, the models can learn robust features that make them more generalizable and fairer across different populations,” said Yu. This is a promising development, indicating that meaningful bias reductions can be achieved without requiring perfectly balanced training datasets.
The research team is now collaborating with institutions worldwide to further study AI bias in pathology across varying demographics and clinical practices. They are also exploring how FAIR-Path could be adapted for scenarios with limited data, as well as the broader implications of AI-driven bias on healthcare disparities.
Ultimately, Yu and his colleagues aim to enhance pathology AI systems to support healthcare professionals. Their goal is to ensure that these systems provide fast, accurate, and equitable diagnoses for all patients. “There’s hope that if we are more aware of and careful about how we design AI systems, we can build models that perform well in every population,” he concluded.
This research highlights the critical intersection of technology and healthcare, underscoring the importance of fairness and accuracy in medical diagnostics as AI continues to play an increasingly significant role in patient care.