While artificial intelligence has the potential to make healthcare more accessible and efficient, it is also vulnerable to the social, economic and systemic biases that have been entrenched in society for generations.
The first step to keeping AI from amplifying existing inequalities is understanding the bias that can creep into algorithms and how to prevent it through careful design and implementation.
Healthcare IT News interviewed AI expert Henk van Houten, chief technology officer at global IT vendor Royal Philips, to get a better understanding of bias in AI and what the healthcare industry can do about it.
Q: What are the different ways bias can arise in healthcare AI?
A: Let me start by saying that I strongly believe AI has the potential to improve people’s health and wellbeing around the world. Every patient should benefit from that. The last thing we want is for AI to perpetuate or even exacerbate some of the health disparities that exist today.
So how can bias arise? What can be easy to forget is that AI’s output is shaped by the data that is fed into it. We tend to take computer-based recommendations at face value, assuming that whatever output an AI algorithm produces is objective and impartial. The truth is, humans choose the data that goes into an algorithm, which means these choices are still subject to unintentional biases that can negatively impact underrepresented groups.
These biases can occur at any phase of AI development and deployment, whether it’s using biased datasets to build an algorithm, or applying an algorithm in a different context than the one it was originally intended for.
The most common source of bias is data that doesn’t sufficiently represent the target population. This can have adverse implications for certain groups. For example, women and people of color are typically underrepresented in clinical trials. As others have pointed out, if algorithms analyzing skin images were trained on images of white patients, but are now applied more broadly, they could potentially miss malignant melanomas in people of color.
Take COVID-19 as another example. Let’s say you have an algorithm that is designed to prioritize care for COVID-19 patients. Such an algorithm could put populations lacking access to COVID-19 testing at a disadvantage: if those populations are underrepresented in the training data, the algorithm may fail to factor in their needs and characteristics.
Even if the data itself is free of bias, choices made during data processing and algorithm development can also contribute to bias. For example, differences between populations may be neglected in order to develop a one-size-fits-all model. We know that many diseases present differently in women and men, whether it’s cardiovascular disease, diabetes or mental health disorders such as depression and autism. If algorithms don’t take such differences into account, they could magnify existing gender inequalities.
Q: How can healthcare provider organization CIOs and related health IT leaders best embrace fairness as a guiding principle for the responsible use of AI?
A: Public and private industry players are recognizing the need to address these issues. In the U.S., we will likely see a new push for the Algorithmic Accountability Act, which would require companies to assess their AI systems for risks of unfair, biased or discriminatory outputs. We see similar regulation being developed in Europe and in China, with guidelines to ensure AI is trustworthy and to control the risk of bias.
Companies must work to further embed these ideals and guiding principles into their workforce, and the first step toward embracing fairness is awareness. When it comes to data science, more education and training are needed on how bias can arise in various stages of algorithm development and how it can be prevented or mitigated.
It’s up to CIOs and health IT leaders to prioritize these learnings for their staff and, furthermore, to make sure diversity is built into every aspect of AI development, and that sufficient processes are put in place to monitor algorithms’ purpose, data quality and performance.
Q: You suggest building three types of diversity into every aspect of AI development: diversity in people, diversity in data and diversity in validation. How do health IT leaders do this?
A: Let me expand on each.
First, diversity in people. We need to make sure that the people working on AI algorithms reflect the diversity of the world we live in. In a field that has historically been led by white male developers, we need to make every effort to encourage a more inclusive culture. It’s equally important that we strive for true multidisciplinary cooperation to harness the complementary strengths of different specialties.
This means fostering intense collaboration between AI developers and clinical experts to combine AI capabilities with a deep contextual understanding of patient care. For example, when there are known variations in disease manifestation between genders or different ethnic groups, clinicians can help verify that algorithmic recommendations are not inadvertently harming specific groups.
To complement that expertise, statisticians and methodologists who have a keen understanding of bias and mitigation strategies are critically valuable to AI development teams.
Second, diversity in data. Limited availability of high-quality data can be one of the biggest hurdles in developing AI that accurately represents a population. To promote the development of fair AI, we should aggregate robust, well-annotated and curated datasets across institutions in a way that protects patient privacy and captures diversity between and within demographic groups.
For example, the Philips eICU Research Institute was formed as a platform to combine de-identified data from more than 400 participating ICUs in the U.S., and the resulting data repository has been used to develop critical care AI tools, including an algorithm that helps decide whether a patient is ready to be discharged from the ICU.
In the face of COVID-19, researchers also have pushed for broader sharing of patient data across institutions and even countries to ensure that clinical decision support algorithms are developed from diverse and representative data sets, rather than from limited convenience samples.
And third, diversity in validation. Developed algorithms require thorough validation to ensure they perform as intended on the entire target population. This means they need to be assessed using not only traditional accuracy metrics, but also relevant fairness metrics.
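Editor's illustration: one simple way to put a fairness metric next to an accuracy metric is to compute error rates separately for each demographic group and compare them. The sketch below is not drawn from the interview or from any Philips tooling; the function names, the group labels "A" and "B", and the toy data are all hypothetical, and it assumes binary labels (1 = condition present) with a group label available for every patient.

```python
from collections import defaultdict


def per_group_rates(y_true, y_pred, groups):
    """Return {group: {"tpr", "fpr", "n"}} for binary labels and predictions."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            # Sensitivity: share of true cases the model catches in this group.
            "tpr": c["tp"] / positives if positives else None,
            # False-alarm rate among patients without the condition.
            "fpr": c["fp"] / negatives if negatives else None,
            "n": positives + negatives,
        }
    return rates


def tpr_gap(rates):
    """Largest difference in sensitivity between any two groups."""
    tprs = [r["tpr"] for r in rates.values() if r["tpr"] is not None]
    return max(tprs) - min(tprs) if tprs else None


# Hypothetical toy data: six patients in each of two groups.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6

rates = per_group_rates(y_true, y_pred, groups)
print(rates)
print("sensitivity gap:", tpr_gap(rates))
```

In this toy data the model catches three of four true cases in group A but only one of four in group B, a disparity that a single pooled accuracy figure would hide. Which metric and what size of gap is acceptable are clinical and ethical judgments that the code does not settle.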
Algorithms may need to be retrained and recalibrated when applied to patients from different countries or ethnicities, or even when they are used in different hospitals in the same country. For example, when validating our own research based on the eICU data repository, we noticed that algorithms derived from multi-hospital data performed well in other U.S. hospitals not included originally.
But when we tested some of our U.S.-developed eICU research algorithms in China and India, we found that local retraining was required. Careful scrutiny and monitoring are always needed.
Q: How can health IT leaders develop robust quality management systems for monitoring and documenting an algorithm’s purpose, data quality, development process and performance?
A: There are several elements worth considering here, and I don’t think any organization, Philips included, has mastered them all yet. One element is to ensure that every dataset and algorithm comes with proper documentation on its provenance, scope, limitations and potential biases.
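Editor's illustration: documentation of this kind can be kept as a small structured record that travels with the dataset, so that gaps are easy to spot. The sketch below is a minimal example under stated assumptions; the class name `DatasetRecord`, its fields and the example values are hypothetical and do not describe any existing Philips system or published standard.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Minimal documentation record for a dataset (hypothetical schema)."""
    name: str
    provenance: str          # where the data came from and how it was collected
    intended_scope: str      # population and use the data is meant to support
    limitations: list = field(default_factory=list)
    known_biases: list = field(default_factory=list)

    def missing_fields(self):
        """Names of documentation fields that are still empty."""
        return [key for key, value in vars(self).items() if not value]


# Hypothetical example values, for illustration only.
record = DatasetRecord(
    name="example_icu_dataset",
    provenance="De-identified records pooled from participating hospitals",
    intended_scope="Adult ICU patients in the contributing hospitals",
    limitations=["No data from hospitals outside the contributing set"],
)

print(record.missing_fields())
```

Here the record reports that `known_biases` has not been filled in, which a review step could treat as a reason to hold back release until someone has assessed it.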
In addition, I expect an increased emphasis on incorporating fairness metrics into AI development, as I already mentioned. Stimulating the use of statistical tools for bias analysis and mitigation can be another way of keeping fairness top of mind.
One area that deserves attention in particular is self-learning AI. An algorithm could go through all necessary validation and fairness checks before implementation, but as it learns from new data in hospitals where it is implemented, how do we ensure that bias does not unintentionally creep in?
As regulators have also recognized, continuous monitoring after market introduction will be necessary to ensure fair and bias-free performance. Ideally this would include finding a way to validate the representativeness of new learning data – in a way that respects ethical, legal and regulatory boundaries around the use of sensitive personal data.
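Editor's illustration: a basic representativeness check compares each group's share of newly collected data with its share of the intended target population and flags groups that fall short. The sketch below is one possible approach, not a method described in the interview; the function name, the 5-percentage-point tolerance and the reference shares are hypothetical, and it assumes group labels can be used lawfully for this purpose.

```python
from collections import Counter


def representation_gaps(groups, reference_shares, tolerance=0.05):
    """Flag groups whose share of the new data is below the reference share.

    Returns {group: (observed_share, reference_share)} for every group whose
    observed share falls short of the reference by more than `tolerance`.
    """
    total = len(groups)
    if total == 0:
        raise ValueError("no records to check")
    observed = Counter(groups)
    flagged = {}
    for group, reference in reference_shares.items():
        share = observed.get(group, 0) / total
        if reference - share > tolerance:
            flagged[group] = (share, reference)
    return flagged


# Hypothetical batch of 100 new records and hypothetical population shares.
new_batch = ["A"] * 77 + ["B"] * 18 + ["C"] * 5
population = {"A": 0.6, "B": 0.2, "C": 0.2}

print(representation_gaps(new_batch, population))
```

In this example only group C is flagged: it makes up 5% of the new batch against an assumed 20% of the target population. A flag like this is a prompt for human review, not proof of bias, and the right reference shares depend on the population the algorithm is actually meant to serve.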
Ultimately, much of it also comes down to creating a strong and open company culture where people feel comfortable challenging each other on questions of bias and fairness. AI has the potential to make healthcare more accessible, affordable and effective, and that’s something to be excited about. But we need to make sure that every patient gets to benefit from it in equal measure – whatever their gender, ethnicity or origin.