Artificial intelligence (AI), such as the technology behind ChatGPT, is increasingly used in medicine to improve the diagnosis and treatment of disease and to avoid unnecessary screening of patients. But AI medical devices could also harm patients and worsen health inequities if they are not carefully designed, tested, and used, according to an international working group that includes a bioethicist from the University of Rochester Medical Center.
Jonathan Herington, PhD, was a member of the Society of Nuclear Medicine and Molecular Imaging’s AI Working Group, which provided recommendations on how to ethically develop and use AI-based medical devices in two articles published in the Journal of Nuclear Medicine. In short, the group called for increased transparency about the accuracy and limitations of AI and outlined ways to ensure that everyone has access to AI medical devices that work for them, regardless of race, ethnicity, gender, or wealth.
Although the burden of appropriate design and testing falls on AI developers, healthcare providers are ultimately responsible for using AI appropriately and should not lean too heavily on AI predictions when making patient care decisions.
“There should always be a human in the loop,” said Herington, assistant professor of health humanities and bioethics at URMC and one of three bioethicists added to the task force in 2021. “Clinicians should use AI as an input to their own decision-making, rather than replacing their decision-making.”
That requires physicians to understand how a given AI medical device is intended to be used, how well it performs that task, and what its limitations are, and to pass that knowledge on to their patients. Physicians must also weigh the relative risks of false positives versus false negatives for a given situation, all while accounting for structural inequities.
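To make that trade-off concrete, here is a minimal sketch, with entirely synthetic scores and illustrative, assumed costs, of how weighing false negatives more heavily than false positives shifts the operating threshold at which a model flags a case:

```python
# Illustrative sketch only: synthetic scores and assumed costs, not any real device's logic.
import numpy as np

rng = np.random.default_rng(2)
scores = rng.uniform(0, 1, 1000)           # model risk scores on a validation set
labels = rng.uniform(0, 1, 1000) < scores  # synthetic ground-truth outcomes

COST_FALSE_NEGATIVE = 10.0   # assumption: a missed tumor is far costlier...
COST_FALSE_POSITIVE = 1.0    # ...than an unnecessary follow-up test

# Pick the flagging threshold that minimizes total expected cost on this data.
best = min(
    np.linspace(0.05, 0.95, 19),
    key=lambda t: COST_FALSE_POSITIVE * np.sum((scores >= t) & ~labels)
    + COST_FALSE_NEGATIVE * np.sum((scores < t) & labels),
)
print(f"threshold minimizing expected cost: {best:.2f}")
```

Changing the two cost constants moves the threshold, which is exactly the kind of context-dependent judgment the working group says clinicians, not the device, must make.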
When using an AI system to identify likely tumors in a PET scan, for example, healthcare providers need to know how well the system performs at identifying that specific type of tumor in patients of the same sex, race, and ethnicity as the patient in question.
“What that means for the developers of these systems is that they have to be very transparent,” Herington said.
According to the working group, it is up to AI developers to make accurate information about their medical device’s intended use, clinical performance, and limitations available to users. In particular, they recommend building alerts directly into the device or system that tell users how uncertain the AI’s predictions are. This might take the form of heat maps overlaid on cancer scans that show which areas are more or less likely to be cancerous.
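As a rough illustration of that idea, the sketch below overlays a per-pixel probability map on a stand-in scan; the scan, the probability map, and the 128×128 size are all synthetic assumptions, and a real system would work from DICOM images and an actual model's outputs:

```python
# Minimal sketch of an uncertainty "heat map" overlay, using synthetic data throughout.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Stand-in for a grayscale image slice (a real pipeline would load DICOM data).
scan = rng.normal(0.5, 0.1, size=(128, 128))

# Stand-in for a model's per-pixel probability that tissue is malignant.
yy, xx = np.mgrid[0:128, 0:128]
prob_map = np.exp(-(((xx - 80) ** 2 + (yy - 40) ** 2) / (2 * 12.0 ** 2)))

fig, ax = plt.subplots()
ax.imshow(scan, cmap="gray")
overlay = ax.imshow(prob_map, cmap="hot", alpha=0.4, vmin=0.0, vmax=1.0)
fig.colorbar(overlay, ax=ax, label="Predicted probability of malignancy")
ax.set_title("Illustrative uncertainty overlay (synthetic data)")
plt.show()
```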
To minimize this uncertainty, developers should carefully define the data they use to train and test their AI models, and should use clinically relevant criteria to evaluate the models’ performance. It is not enough to simply validate the algorithms a device or system uses. AI-based medical devices should also be tested in so-called “silent trials,” meaning their performance is evaluated by researchers on real patients in real time, but their predictions are not shown to the healthcare provider or applied to clinical decision-making.
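A minimal sketch of what the “silent” part of such a trial could look like in software, assuming hypothetical names (score_case, silent_trial_record) and a simple CSV log rather than any particular trial infrastructure:

```python
# Sketch of silent-trial logging: the model scores real cases in real time, the scores
# are recorded for later evaluation, and nothing is returned to the clinical workflow.
import csv
import datetime

def score_case(case_id: str, features: dict) -> float:
    """Placeholder for the deployed model's prediction; returns a risk score."""
    return 0.5  # a real model would compute this from the case features

def silent_trial_record(case_id: str, features: dict, log_path: str = "silent_trial.csv") -> None:
    score = score_case(case_id, features)
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([datetime.datetime.now().isoformat(), case_id, score])
    # Deliberately return nothing: predictions stay "silent" until researchers
    # compare the log against confirmed patient outcomes.

silent_trial_record("case-001", {"suv_max": 4.2})
```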
Developers must also design AI models that are useful and accurate in all contexts in which they will be deployed.
“One concern is that these high-tech and expensive systems would be deployed in highly resourced hospitals and improve outcomes for relatively well-advantaged patients, while patients in underfunded or rural hospitals would not have access to them – or would have access to systems that make their care worse because they were not designed for them,” Herington said.
Currently, AI medical devices are often trained on datasets in which Latino and Black patients are underrepresented, which makes the devices less likely to produce accurate predictions for patients in those groups. To avoid worsening health inequities, developers must ensure that their AI models are calibrated for all racial and gender groups by training them with datasets that represent all of the populations the device or system will ultimately serve.
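The kind of audit that recommendation implies can be sketched as a per-subgroup calibration check; the data below is synthetic and deliberately constructed so the model is calibrated for one group but overconfident for the other, whereas a real audit would use predictions and outcomes from a representative validation set:

```python
# Sketch of a per-subgroup calibration check on synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
group = rng.choice(["A", "B"], size=n)   # stand-in demographic subgroups
p_pred = rng.uniform(0, 1, size=n)       # model's predicted probabilities
# Simulate outcomes so the model is calibrated for group A but overconfident for B.
p_true = np.where(group == "A", p_pred, 0.7 * p_pred)
y = rng.uniform(0, 1, size=n) < p_true   # observed outcomes

for g in ["A", "B"]:
    mask = group == g
    bins = np.clip((p_pred[mask] * 10).astype(int), 0, 9)
    for b in range(10):
        in_bin = bins == b
        if in_bin.sum() < 50:
            continue
        # A calibrated model's observed rate should track the predicted probability.
        print(f"group {g}, predicted ~{(b + 0.5) / 10:.2f}: "
              f"observed rate {y[mask][in_bin].mean():.2f}")
```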
Although these recommendations were developed with a focus on nuclear medicine and medical imaging, Herington believes they can and should be applied broadly to AI medical devices.
“Systems are becoming more and more powerful and the landscape is changing very quickly,” Herington said. “We have a rapidly closing window to solidify our ethical and regulatory framework around these things.”